
# Chapter I Descriptive Statistics

Objectives
– Define variable and data
– Describe types of data and measurement scales
– Define and calculate ratio, rate and proportion
– Define and calculate measures of central tendency and measures of spread
– Organize and display data
– Extract useful information

Variable
• Any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and takes any value.
• There may be one variable in a study or many.
• E.g., a study of treatment outcome of TB


Nominal
E.g., Marital status: 1. Single 2. Married 3. Divorced 4. Widowed
♦ The numbers have NO meaning
♦ They are labels only

Ordinal
E.g., Pain level: 1. None 2. Mild 3. Moderate 4. Severe
♦ The numbers have LIMITED meaning: 4>3>2>1 is all we know, apart from their utility as labels


Interval
♦ It has no true zero point: "0" is arbitrarily chosen and doesn't reflect the absence of temperature.
♦ E.g., temperature in °C on 4 consecutive days
Days: A B C D
Temp. °C: 18 20 22 23
♦ For these data, not only is day A with 18° cooler than day D with 23°, but it is 5° cooler.

Ratio
♦ E.g., height, weight, age, BP, etc.
♦ Note on meaningfulness of "ratio": someone who weighs 80 kg is two times as heavy as someone else who weighs 40 kg.

Degree of precision in measuring (lowest to highest): Nominal, Ordinal, Interval, Ratio

Summary of Data
Variable
– Qualitative or categorical
  • Nominal (not ordered), e.g. ethnic group
  • Ordinal (ordered), e.g. response to treatment
– Quantitative measurement
  • Discrete (count data), e.g. number of admissions
  • Continuous (real-valued), e.g. height

Categorizing Data
♦ Can facilitate data analysis
♦ Must choose:
– Number of categories
– Category cut points
♦ Some options for cut points:
– Percentiles, natural breaks, established criteria
– Example: WHO body mass index classification
  • Underweight: <18.50 kg/m2
  • Normal: 18.50 – 24.99 kg/m2
  • Overweight: ≥ 25.00 kg/m2
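Turning a continuous variable into categories with fixed cut points can be sketched in a few lines. The example below is a minimal illustration that uses only the three WHO cut points quoted on this slide; the function name `who_bmi_category` and the sample values are made up for the example.

```python
def who_bmi_category(bmi: float) -> str:
    """Classify a BMI value (kg/m2) using the three cut points on the slide."""
    if bmi < 18.50:
        return "Underweight"
    if bmi < 25.00:
        return "Normal"
    return "Overweight"


# Illustrative values only, not study data.
for value in (17.0, 22.3, 25.0):
    print(value, who_bmi_category(value))
```

Writing the cut points as half-open intervals (lower bound included, upper bound excluded) ensures that every value falls into exactly one category.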

Categorizing Variables-Exercise
1. Year of birth
2. Marital status of women
3. Identification number of study participant
4. Class rank
5. Length of infants at ANC clinic

Categorizing Variables-Exercise
1. Year of birth: Quantitative/Discrete
2. Marital status: Categorical/Nominal
3. Identification number: Categorical/Nominal
4. Class rank: Categorical/Ordinal
5. Length: Quantitative/Continuous

Discrete or Continuous?
Identify whether the following data is discrete or continuous:
1. Distance from primary health center to reference lab
2. Number of times a child under 5 has experienced fever in the last month
3. Number of fatal accidents on a road over the past year
4. Weight gained or lost by a 9 month old in the past 3 months

Discrete or Continuous?
Identify whether the following data is discrete or continuous:
1. Distance from primary health center to reference lab: Continuous
2. Number of times a child under 5 has experienced fever in the last month: Discrete
3. Number of fatal accidents on a road over the past year: Discrete
4. Weight gained or lost by a 9 month old in the past 3 months: Continuous

Describing categorical data
A prerequisite for any research is the ability to quantify the occurrence of disease
• How many people are affected by a certain disease? (Count)
• What is the rate at which the disease is occurring through time? (Rate)
• How does the disease burden vary by location, by age, by sex, or various modes of exposure? (Ratio, Proportion)

Counts
♦ Most basic measure of disease frequency is a simple count of affected individuals.
Example:
♦ 350,000 cases of polio
♦ 350,000 cases of polio in 1988
♦ 350,000 cases of polio in 1988 in 125 countries

How is count data used?
♦ 1988: Polio, > 350 000 cases, 125 countries
♦ 2002: Polio, 1918 cases, 7 countries

Example of Counts
Number of Cases of Hemorrhagic Fever by Age and Sex, Zaire, 1976

| Age (years) | Male | Female | Total |
|---|---|---|---|
| <1 | 10 | 14 | 24 |
| 1 - 14 | 18 | 25 | 43 |
| 15 - 29 | 33 | 60 | 93 |
| 30 - 49 | 57 | 52 | 109 |
| 50+ | 23 | 26 | 49 |
| Total | 141 | 177 | 318 |

• Ratio
• Proportion
• Rate
What, who is in the denominator?

Ratio
♦ The quotient of 2 numbers
♦ Numerator NOT INCLUDED in the denominator
♦ No relationship necessary between numerator and denominator
♦ May be expressed as a/b or a:b

What is the sex ratio?
Sex ratio = males:females = # males / # females = 2 / 5 = 0.4
0.4 x 100 = 40 (males per 100 females)

When is a ratio used?
♦ Sex ratio: Male to female
♦ Number of health facilities per population
♦ Number of participants in the course per facilitator
♦ Number of inhabitants per latrine
♦ Odds ratio
♦ Relative risk
♦ Prevalence ratio
♦ Maternal mortality ratio

Ratio Example 1
♦ A university has 4000 male students and 2000 female students. The ratio of male to female students is:
♦ 4000/2000 = 2/1 or 2:1
♦ For every 2 male students there is one female student

Ratio Example 2
♦ A foodborne epidemic occurred in an elementary school canteen. The attack rate in the first grade was 24% while the attack rate in the second grade was 16%. Compare these two attack rates.
♦ 24/16 = 3/2 or 3:2
♦ For every 3 first graders who fell ill, there were 2 second graders who also fell ill.

Ratio Example 3
A city of 4 million people has 400 clinics. Calculate the ratio of clinics per person.
Ratio = 400 / 4,000,000 = 0.0001 clinics / person
Multiply by 10^4:
Ratio = 0.0001 x 10^4 = 1 clinic / 10,000 persons

Proportion
♦ The quotient of 2 numbers
♦ Numerator is a sub-group of the population in the denominator
♦ Numerator is always INCLUDED in the denominator
♦ Proportion ranges between 0 and 1
♦ Percentage = proportion x 100

What is the proportion of cases?
2 cases / 4 total = 0.5
0.5 x 100 = 50%

When is a proportion used?
♦ Proportion of samples positive for P. falciparum
– 1000 samples, 236 positive
– Proportion of positive samples = 236/1000 = 0.236
– Percentage of positive samples = 0.236 x 100 = 23.6%
♦ Proportion of malaria deaths
– 123 malaria cases, 7 deaths
– Proportion of malaria deaths = 7/123 = 0.057
– Percentage of malaria deaths = 0.057 x 100 = 5.7%

Proportion Example 1
♦ A university has 4000 male students and 2000 female students. Calculate the proportion of male and female students.
♦ Male: 4000/6000 x 100% = 66.7%
♦ Female: 2000/6000 x 100% = 33.3%

Proportion Example 2
40 children are currently ill with the measles. 80 children all together have had the measles
♦ 40 / 80 = 0.50 (proportion)
♦ 40 / 80 = 0.50 x 100 = 50% (percentage)
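The difference between a ratio and a proportion is only in what goes into the denominator. The short sketch below repeats the university and clinic examples from the slides; the variable names are chosen for illustration.

```python
# University example from the slides: 4000 male and 2000 female students.
males, females = 4000, 2000

# Ratio: the numerator (males) is NOT part of the denominator (females).
sex_ratio = males / females

# Proportion: the numerator (males) IS part of the denominator (all students).
proportion_male = males / (males + females)
percentage_male = proportion_male * 100

# Clinic example: 400 clinics for 4,000,000 people, rescaled per 10,000 persons.
clinics_per_10k = 400 * 10_000 / 4_000_000

print(f"male:female ratio = {sex_ratio:.0f}:1")
print(f"proportion male = {proportion_male:.3f} ({percentage_male:.1f}%)")
print(f"clinics per 10,000 persons = {clinics_per_10k:.0f}")
```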

Rate
♦ The quotient of 2 numbers
♦ Measures the probability of occurrence of an event over time
♦ Numerator: number of EVENTS
♦ Denominator: POPULATION at risk for event in numerator observed for a given TIME

What is the rate of death?
Observed in one year: 2 deaths / 100 population per year = 2 deaths per 100 population per year

When is a rate used?
♦ Morbidity rates
– Attack rates
– Prevalence rates
– Incidence rates
♦ Mortality rates
♦ Natality rates

Rate Example 1
♦ Mortality rate of tetanus in France in 1995
– Tetanus deaths: 17
– Population in 1995: 58 million
– Time period: 1 year
– Mortality rate = 0.029 per 100,000 population per year
♦ Rate may be expressed in any power of 10
– 100, 1,000, 10,000, 100,000
♦ Rate must include an aspect of time
– Per year, per month, per day
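The tetanus figure can be checked by dividing events by the population at risk and rescaling to a convenient power of 10. This is a minimal sketch; the helper name `rate_per` is hypothetical, and the observation period (one year) is carried in the label rather than in the arithmetic.

```python
def rate_per(events: int, population: int, per: int = 100_000) -> float:
    """Events per `per` population, for whatever period the events were counted over."""
    return events / population * per


# Tetanus in France, 1995 (figures from the slide): 17 deaths, 58 million people, 1 year.
tetanus_rate = rate_per(17, 58_000_000)
print(f"{tetanus_rate:.3f} per 100,000 population per year")
```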

Rate Example 2
Maternal Mortality for Various Continents (1995)

| Continent | Rate |
|---|---|
| Africa | 273000 |
| Asia | 217000 |
| Europe | 2000 |
| Latin America/Caribbean | 22000 |
| South America | 15000 |
| North America | 490 |
| Australia/New Zealand | 25 |

Summary
What is the Measure of Frequency?
♦ Is numerator included in denominator?
– No: Ratio
– Yes: Is time included in denominator?
  • Yes: Rate
  • No: Proportion

Describing Quantitative Variables
• Measures of Central Location
– Mean, Median, Mode
• Measures of Spread
– Range, IQR, Variance, Standard deviation

Measure of Central Location
♦ Central Location / Position / Tendency: a single value that represents (is a good summary of) an entire distribution of data
♦ Also known as:
– "Measure of central tendency"
– "Measure of central position"
♦ Common measures
– Arithmetic mean
– Median
– Mode

[Figure: histogram of number of people by age group (0-9 to 90-99), annotated "Central Location?" and "Spread?"]

Raw data set: Ages of students in a class (years)
Age: 27 30 28 31 28 36 29 37 29 34 30 30 27 30 28 31 32 30 29 29

Order the data set from the lowest value to the highest value. Add observation numbers.
Obs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37

Mode
Definition: Mode is the value that occurs most frequently
Method for identification
1. Arrange data into frequency distribution or histogram, showing the values of the variable and the frequency with which each value occurs
2. Identify the value that occurs most often

Mode

| Age | Frequency |
|---|---|
| 27 | 2 |
| 28 | 3 |
| 29 | 4 |
| 30 | 5 (Mode) |
| 31 | 2 |
| 32 | 1 |
| 33 | 0 |
| 34 | 1 |
| 35 | 0 |
| 36 | 1 |
| 37 | 1 |
| Total | 20 |

Mode: the most frequent value of the variable
Mode = 30
[Figure: histogram of frequency by age (years), 27 to 37]

Example: Finding Mode from Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Mode = 10
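The two-step method (build a frequency distribution, then pick the most frequent value) maps directly onto `collections.Counter`. The sketch below uses the 30 length-of-stay values from the slides and returns a list, so that data with several modes would also be handled.

```python
from collections import Counter

# Length of stay (nights) for the 30 patients in the slides.
length_of_stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
                  12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Step 1: frequency distribution (value -> number of occurrences).
frequency = Counter(length_of_stay)

# Step 2: keep every value that reaches the highest frequency.
highest = max(frequency.values())
modes = sorted(value for value, count in frequency.items() if count == highest)

print("modes:", modes, "each occurring", highest, "times")
```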

Finding Mode from Histogram
[Figure: histogram of number of patients by nights of stay, 0 to 50]

Mode – Properties / Uses
♦ Easiest measure to understand, explain, identify
♦ Always equals an original value
♦ Insensitive to extreme values (outliers)
♦ Good descriptive measure, but poor statistical properties
♦ May be more than one mode
♦ May be no mode
♦ Does not use all the data

Outliers
[Figure: histogram of number of patients by nights of stay, 0 to 150]

[Figure: two population histograms, one labeled "Unimodal Distribution" and one labeled "Bimodal Distribution"]

Median
Definition: Median is the middle value; also, the value that splits the distribution into two equal parts
– 50% of observations are below the median
– 50% of observations are above the median
Method for identification
1. Arrange observations in order
2. Find middle position as (n + 1) / 2
3. Identify the value at the middle

Median: Odd Number of Values
Obs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36
N = 19
Median = Observation (N+1)/2 = (19+1)/2 = 20/2 = 10
Median age = 30 years

Median: Even Number of Values
Obs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37
N = 20
Median = Observation (N+1)/2 = (20+1)/2 = 21/2 = 10.5
Median age = Average value between 10th and 11th observation = (30+30)/2 = 30 years

Examples: Find Median of Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Median at 50% = 10
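The (n + 1) / 2 rule for the middle position can be written out explicitly. In the sketch below, `median_by_position` is an illustrative helper name; it averages the two neighbouring observations when the position falls on a half, as in the even-n example.

```python
import math
import statistics


def median_by_position(values):
    """Median via the (n + 1) / 2 middle-position rule (1-based positions)."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2  # lower == upper when n is odd


length_of_stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
                  12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

print(median_by_position(length_of_stay))   # middle position is 15.5
print(statistics.median(length_of_stay))    # standard-library cross-check
```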

Median – Properties / Uses
♦ Does not use all the data available
♦ Insensitive to extreme values (outliers)
♦ Good descriptive measure but poor statistical properties
♦ Measure of choice for skewed data
♦ Equals an original value if n is odd

Quartiles
Definition: Quartiles are the values that split the distribution into four equal parts
– 25% of observations are below the first quartile (Q1)
– 25% of observations are between Q1 and Q2 (median)
– 25% of observations are between Q2 (median) and Q3
– 25% of observations are above Q3

Quartiles
Obs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37
Q1 observation = round((N+1)/4) = (20+1)/4 = 21/4 = 5.25 ~ 5th obs
Q2 observation = 10.5 (median)
Q3 observation = round(3(N+1)/4) = 3(20+1)/4 = 3(21)/4 = 15.75 ~ 16th obs
Q1 age = 28
Q2 age = 30
Q3 age = 31
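The rounding rule for quartile positions can be sketched as follows, using the 20 ordered ages from the slides. The helper name is illustrative. Note that Python's `round` sends exact halves to the nearest even integer, which does not affect positions such as 5.25 or 15.75.

```python
ages = sorted([27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
               30, 30, 27, 30, 28, 31, 32, 30, 29, 29])
n = len(ages)


def value_at_rounded_position(ordered, position):
    """Value at the nearest whole observation number (positions are 1-based)."""
    return ordered[round(position) - 1]


q1 = value_at_rounded_position(ages, (n + 1) / 4)       # position 5.25 -> 5th obs
q3 = value_at_rounded_position(ages, 3 * (n + 1) / 4)   # position 15.75 -> 16th obs
print("Q1 =", q1, "Q3 =", q3)
```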

Percentiles
Value of the variable that splits the distribution in 100 equal parts
• 35% of observations are below the 35th percentile
• 65% of observations are above the 35th percentile

Percentiles

| Values (Age) | Freq | Percent (Freq/Total) | Cumulative Percent |
|---|---|---|---|
| 27 | 2 | 10% | 10% |
| 28 | 3 | 15% | 25% |
| 29 | 4 | 20% | 45% |
| 30 | 5 | 25% | 70% |
| 31 | 2 | 10% | 80% |
| 32 | 1 | 5% | 85% |
| 34 | 1 | 5% | 90% |
| 36 | 1 | 5% | 95% |
| 37 | 1 | 5% | 100% |
| Total | 20 | 100% | |

25th Percentile: age 28 (cumulative percent reaches 25%)
90th Percentile: age 34 (cumulative percent reaches 90%)

Arithmetic Mean
Arithmetic mean = "average" value
Method for identification
1. Sum up all of the values
2. Divide the sum by the number of observations (n)

Arithmetic Mean
Obs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37

$$m = \frac{\sum x_i}{N}$$

N = 20
$\sum x_i$ = 605
m = 605 / 20 = 30.25
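The two steps (sum, then divide by the number of observations) can be checked on the class ages with a few lines of Python; the variable names are illustrative.

```python
# Raw ages of the 20 students (years), as listed in the slides.
ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

total = sum(ages)        # step 1: sum up all of the values
n = len(ages)            # number of observations
mean_age = total / n     # step 2: divide the sum by n

print(total, n, mean_age)
```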

Example: Finding the Mean, Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49
Sum = 360
n = 30
Mean = 360 / 30 = ?

Arithmetic Mean – Properties / Uses
♦ Probably best known measure of central location
♦ Uses all of the data
♦ Affected by extreme values (outliers)
♦ Best for normally distributed data
♦ Not usually equal to one of the original values
♦ Good statistical properties

Sensitive to Outliers
[Figure: two histograms of number of patients by nights of stay. Without the outlier (axis 0 to 50): Mean = 12.0. With the outlier (axis 0 to 150): Mean = 15.3]
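The effect of one extreme value on the mean, compared with the median, can be reproduced numerically. The sketch below assumes that the outlier panel corresponds to the longest stay being 149 nights instead of 49; that reading is an assumption based on the "0 to 149" range quoted for the same data.

```python
import statistics

length_of_stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
                  12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Assumption: replace the largest value (49) with 149 to mimic the outlier panel.
with_outlier = length_of_stay[:-1] + [149]

mean_original = sum(length_of_stay) / len(length_of_stay)
mean_outlier = sum(with_outlier) / len(with_outlier)
median_original = statistics.median(length_of_stay)
median_outlier = statistics.median(with_outlier)

print(f"mean:   {mean_original:.1f} -> {mean_outlier:.1f}")
print(f"median: {median_original} -> {median_outlier}")
```

The mean shifts by more than three nights while the median does not move, which is the behaviour the slide title describes.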

When to use the arithmetic mean?
♦ Centered distribution
♦ Approximately symmetrical
♦ Few extreme values (outliers)

Summary
♦ Measure of Central Location: single measure that represents an entire distribution
♦ Mode: most common value
♦ Median: central value
♦ Arithmetic mean: average value
♦ Mean uses all data, so sensitive to outliers
♦ Mean has best statistical properties
♦ Mean preferred for normally distributed data
♦ Median preferred for skewed data
♦ Geometric mean for dilutional titer

Other Measures of Central Location
Midrange
♦ = (Minimum + maximum values) / 2
♦ Quick and dirty
Geometric Mean
♦ Can use if log of data are normally distributed (e.g., lab titers)
♦ = nth root of (Obs1 x Obs2 x Obs3 x … x Obsn)
♦ = antilog (sum log xi / n)
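The two formulas for the geometric mean (nth root of the product, and antilog of the mean log) give the same result, as the sketch below illustrates. The titer values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not from the slides.

```python
import math
import statistics

# Hypothetical dilution titers (illustration only, not study data).
titers = [10, 20, 40, 80, 160]
n = len(titers)

# nth root of the product of the observations.
gm_root = math.prod(titers) ** (1 / n)

# Antilog of the mean of the (base-10) logs.
gm_antilog = 10 ** (sum(math.log10(x) for x in titers) / n)

# Standard-library cross-check.
gm_stdlib = statistics.geometric_mean(titers)

# Midrange: (minimum + maximum) / 2.
midrange = (min(titers) + max(titers)) / 2

print(gm_root, gm_antilog, gm_stdlib, midrange)
```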

Measures of Spread
Definition: Measures that quantify the variation or dispersion of a set of data from its central location
Also known as:
– "Measure of dispersion"
– "Measure of variation"
Common measures
– Range
– Interquartile range
– Variance / standard deviation
– Standard error
– 95% confidence interval

Same center but … different dispersions

Range
Definition: difference between largest and smallest values
Example: Finding the Range of Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Range – Sensitive to Outliers?
[Figure: two histograms of number of patients by nights of stay. Without the outlier: Range = 0 to 49. With the outlier: Range = 0 to 149]

Interquartile Range
Definition: the central 50% of a distribution
Properties / Uses
• Used with median
• Used to show the "most typical" 50% of the values

Example IQR: Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49
Q1 = 25th percentile, at position (30+1) / 4 = 7¾, value 6¾
Median = 50th percentile, at position 15.5, value 10
Q3 = 75th percentile, at position 3 (30+1) / 4 = 23¼, value 14½

IQR: Length of Stay Data
IQR = Q3 − Q1 = 14½ − 6¾ = 7.75
[Figure: histogram of number of patients by nights of stay, 0 to 50, with Q1, M and Q3 marked]
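The quartile values above come from interpolating between neighbouring observations at positions based on (n + 1). Python's `statistics.quantiles` with `method="exclusive"` uses the same (n + 1) positions with linear interpolation, so it can serve as a cross-check. This is a sketch; other software may use different quartile conventions and give slightly different values.

```python
import statistics

length_of_stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
                  12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Three cut points that divide the data into four parts: Q1, median, Q3.
q1, q2, q3 = statistics.quantiles(length_of_stay, n=4, method="exclusive")
iqr = q3 - q1

print("Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", iqr)
```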

Variance and Standard Deviation
♦ Definition: measures of variation that quantify how closely clustered the observed values are to the mean
♦ Variance = average of squared deviations from mean = Sum (x – mean)² / (n – 1)
♦ Standard deviation = square root of variance

Variance and Standard Deviation
[Figure: two distributions with their means marked]

Equations for Variance and Standard Deviation
$\bar{x}$: Mean
$x_i$: Data value
$n$: No. of observations
$s^2$: Variance
$s$: Standard deviation

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
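The sample variance and standard deviation formulas (with n − 1 in the denominator) can be applied step by step to the class ages and compared against the standard library. The sketch below uses illustrative variable names.

```python
import math
import statistics

ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

n = len(ages)
mean = sum(ages) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in ages)

variance = sum_sq_dev / (n - 1)   # s squared, with n - 1 in the denominator
std_dev = math.sqrt(variance)     # s

print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")

# statistics.variance and statistics.stdev use the same n - 1 definition.
print(statistics.variance(ages), statistics.stdev(ages))
```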

Standard Deviation – Properties / Uses
Standard deviation usually calculated only when data are more or less normally distributed (bell shaped curve)
For normally distributed data,
♦ 68.3% of the data fall within plus/minus 1 SD
♦ 95.5% of the data fall within plus/minus 2 SD
♦ 95.0% of the data fall within plus/minus 1.96 SD
♦ 99.7% of the data fall within plus/minus 3 SD
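These coverage percentages follow from the normal distribution: the share of values within plus/minus k standard deviations of the mean equals erf(k / √2). The sketch below evaluates that expression with the standard library; the function name `coverage` is illustrative, and the results agree with the slide's figures up to rounding.

```python
import math


def coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"within +/- {k} SD: {coverage(k) * 100:.2f}%")
```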

Normal Distribution
[Figure: normal curve with the mean and standard deviation marked; central areas of 68% and 95%, with 2.5% in each tail]

Comparison of Mode, Median and Mean
Symmetrical: Mode = Median = Mean
Skewed right: Mode < Median < Mean
Skewed left: Mean < Median < Mode

Match the Measures of Central Location & Spread
♦ Mode
♦ Median
♦ Standard deviation
♦ Range
♦ Arithmetic mean
♦ Interquartile range


Name the Appropriate Measures of Central Location and Spread

| Distribution | Central Location | Spread |
|---|---|---|
| Single peak, symmetrical | | |
| Skewed or Data with outliers | | |

Name the Appropriate Measures of Central Location and Spread

| Distribution | Central Location | Spread |
|---|---|---|
| Single peak, symmetrical | Mean* | Standard deviation |
| Skewed or Data with outliers | Median | Range or Interquartile range |

* Median and mode will be similar

Properties of Measures of Central Location & Spread
♦ Arithmetic mean: best for normally distributed data
♦ Median: best for skewed data
♦ Mode: simple, descriptive, not always useful
♦ Standard deviation: use with mean
♦ Range/Interquartile Range: use with median

[Figure: distribution of population by age, annotated with minimum, 1st quartile, median, mode, 3rd quartile, maximum, interquartile interval and range]

Displaying categorical variables
♦ Table of frequency distributions
– Frequency
– Relative frequency
– Cumulative frequencies
♦ Charts
– Bar charts
– Pie charts

Frequency distributions
♦ A simple and effective way of summarizing categorical data is to construct a frequency distribution table.
♦ First column: Levels of the variable.
♦ Second column: Count of the number of observations
♦ E.g. Table below shows the frequency distribution of birth weight for 9975 newborns between 1976-1996.

| BWT | Freq. | Rel. Freq (%) | Cum. Freq | Cum. rel. freq. (%) |
|---|---|---|---|---|
| Very low | 43 | 0.4 | 43 | 0.4 |
| Low | 793 | 8.0 | 836 | 8.4 |
| Normal | 8870 | 88.9 | 9706 | 97.3 |
| Big | 268 | 2.7 | 9974 | 100 |
| Total | 9974 | 100 | | |

Relative Frequency
♦ Useful to compute the proportions, or percentages, of observations in each level.
♦ The distribution of proportions is called the relative frequency distribution of the variable
♦ Given a total number of observations, the relative frequency distribution is easily derived from the frequency distribution.
♦ Conversion in the opposite direction is also possible, but the conversion is often inaccurate because of rounding
♦ The third column of Table below shows the relative frequency distribution of birth weight for 9975 newborns between 1976-1996

Table 1. Frequency Distribution of birth weight of newborns between 1976-1996 at TAH.

| BWT | Freq. | Rel. Freq (%) | Cum. Freq | Cum. rel. freq. (%) |
|---|---|---|---|---|
| Very low | 43 | 0.4 | 43 | 0.4 |
| Low | 793 | 8.0 | 836 | 8.4 |
| Normal | 8870 | 88.9 | 9706 | 97.3 |
| Big | 268 | 2.7 | 9974 | 100 |
| Total | 9974 | 100 | | |

Cumulative frequency
♦ The cumulative frequency of a category is the number of observations in the category plus observations in all categories smaller than it.

| BWT | Freq. | Rel. Freq (%) | Cum. Freq | Cum. rel. freq. (%) |
|---|---|---|---|---|
| Very low | 43 | 0.4 | 43 | 0.4 |
| Low | 793 | 8.0 | 836 | 8.4 |
| Normal | 8870 | 88.9 | 9706 | 97.3 |
| Big | 268 | 2.7 | 9974 | 100 |
| Total | 9974 | 100 | | |
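Relative and cumulative frequencies can all be derived from the raw counts. The sketch below rebuilds the birth weight table from the four category counts; the variable names are illustrative and percentages are rounded to one decimal place, as in the table.

```python
from itertools import accumulate

categories = ["Very low", "Low", "Normal", "Big"]
freq = [43, 793, 8870, 268]   # counts per birth weight category

total = sum(freq)
rel_freq = [round(100 * f / total, 1) for f in freq]       # relative frequency (%)
cum_freq = list(accumulate(freq))                          # running total of counts
cum_rel_freq = [round(100 * c / total, 1) for c in cum_freq]

for row in zip(categories, freq, rel_freq, cum_freq, cum_rel_freq):
    print(*row, sep="\t")
print("Total", total, sep="\t")
```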

Table 2. Frequencies of serum cholesterol levels for 1067 US males of ages 25-34, 1976-1980

| Cholesterol level (mg/100ml) | Freq | Relative freq | Cum freq | Cum. rel. freq |
|---|---|---|---|---|
| 80-119 | 13 | 1.2 | 13 | 1.2 |
| 120-159 | 150 | 14.1 | 163 | 15.3 |
| 160-199 | 442 | 41.4 | 605 | 56.7 |
| 200-239 | 299 | 28.0 | 904 | 84.7 |
| 240-279 | 115 | 10.8 | 1019 | 95.5 |
| 280-319 | 34 | 3.2 | 1053 | 98.7 |
| 320-359 | 9 | 0.8 | 1062 | 99.5 |
| 360-399 | 5 | 0.5 | 1067 | 100 |
| Total | 1067 | 100 | | |

Charts
♦ The frequency distribution of a categorical variable is often presented graphically as a bar chart or pie chart.
♦ Bar charts: display the frequency distribution for nominal or ordinal data.
♦ Horizontal axis: Labels of the variable
♦ Vertical bar: Frequency or the relative frequency
♦ The bars should be of equal width and should be separated from one another so as not to imply continuity

Bar charts showing frequency distribution of the variable 'BWT' described in Table
[Figure: two bar charts by BWT category (Very low, Low, Normal, Big), one of frequency and one of relative frequency]

Bar charts for comparison
♦ In order to compare the distribution of a variable for two or more groups, bars are often drawn alongside each other for groups being compared in a single bar chart
[Figure: bar chart of percent by birth weight category (Low, Normal, Big) for mothers with (Yes) and without (No) antenatal follow-up]
Bar chart indicating categories of birth weight of 9975 newborns grouped by antenatal follow-up of the mothers

Pie chart
♦ Pie Chart: displays the frequency distribution for nominal or ordinal data.
♦ In a pie chart the various categories into which the observations fall are represented as sectors of a circle, such that each sector represents either the frequency or the relative frequency of observations within the class, the angles of which are proportional to the frequency or the relative frequency.
Fig 3(a) Pie chart indicating frequency of categories of birth weight (Very low 43, Low 793, Normal 8870, Big 268)
Fig 3(b) Pie chart indicating relative frequency of categories of birth weight (Very low 0.4, Low 8, Normal 88.9, Big 2.7)

Displaying numerical variables ♦ Graphs – Histograms – Frequency polygons – Cumulative frequency polygons – Box Plots .

Histograms
♦ Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
♦ Given a set of numerical data, we can obtain an impression of the shape of its distribution by constructing a histogram.
♦ Horizontal axis: Labels of the variable
♦ Vertical bar: Frequency or the relative frequency
♦ Except for the two boundaries, class intervals are usually chosen to be of equal width.
♦ If this is not the case, the histogram could give a misleading impression of the shape of the data

Example
Consider the following table and the histogram showing distribution of the age of women at the time of marriage

| Age group | No. of women |
|---|---|
| 15-19 | 11 |
| 20-24 | 36 |
| 25-29 | 28 |
| 30-34 | 13 |
| 35-39 | 7 |
| 40-44 | 3 |
| 45-49 | 2 |

[Figure: histogram "Age of women at the time of marriage", number of women by age group with class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5]

A histogram displaying frequency distribution of birth weight of newborns at Tikur Anbessa Hospital
[Figure: histogram of frequency by birth weight; Std. Dev = 502.34, Mean = 3126, N = 9975]

Frequency polygons
♦ Instead of drawing bars for each class interval, sometimes a single point is drawn at the midpoint of each class interval and consecutive points are joined by straight lines.
♦ A graph drawn in this way is called a frequency polygon (line graph).
♦ Frequency polygons are superior to histograms for comparing two or more sets of data.

Frequency polygon of birth weight of 9975 newborns at Tikur Anbessa Hospital for males and females
[Figure: frequency polygons of percent (%) by birth weight, 500 to 5000, with one line for males and one for females]

Cumulative frequency polygons
♦ Horizontal axis: Labels of the variable
♦ Vertical axis: cumulative relative frequency.
♦ The points are then connected by straight lines.
♦ Like frequency polygons, cumulative frequency polygons may be used to compare sets of data.
♦ Cumulative frequency polygons can also be used to obtain percentiles of a set of data.
♦ Roughly, the 50th percentile is the value that is greater than or equal to 50% of the observations.

Table 2. Frequencies of serum cholesterol levels for 1067 US males of ages 25-34, 1976-1980

| Cholesterol level (mg/100ml) | Freq | Relative freq | Cum freq | Cum. rel. freq |
|---|---|---|---|---|
| 80-119 | 13 | 1.2 | 13 | 1.2 |
| 120-159 | 150 | 14.1 | 163 | 15.3 |
| 160-199 | 442 | 41.4 | 605 | 56.7 |
| 200-239 | 299 | 28.0 | 904 | 84.7 |
| 240-279 | 115 | 10.8 | 1019 | 95.5 |
| 280-319 | 34 | 3.2 | 1053 | 98.7 |
| 320-359 | 9 | 0.8 | 1062 | 99.5 |
| 360-399 | 5 | 0.5 | 1067 | 100 |
| Total | 1067 | 100 | | |
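Reading a percentile off a cumulative relative frequency column amounts to finding the first class whose cumulative percentage reaches the target. The sketch below does this for the ages 25-34 cholesterol data; `percentile_class` is an illustrative helper that returns the class interval containing the percentile rather than an interpolated value.

```python
# Cholesterol classes (mg/100ml) and cumulative relative frequencies (%) from Table 2.
classes = ["80-119", "120-159", "160-199", "200-239",
           "240-279", "280-319", "320-359", "360-399"]
cum_rel_freq = [1.2, 15.3, 56.7, 84.7, 95.5, 98.7, 99.5, 100.0]


def percentile_class(p: float) -> str:
    """First class whose cumulative relative frequency is at least p percent."""
    for label, cumulative in zip(classes, cum_rel_freq):
        if cumulative >= p:
            return label
    raise ValueError("p must be between 0 and 100")


for p in (25, 50, 90):
    print(f"{p}th percentile lies in class {percentile_class(p)}")
```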

Table 3. Frequencies of serum cholesterol levels for 1227 US males of ages 55-64, 1976-1980

| Cholesterol level (mg/100ml) | Freq | Relative freq | Cum freq | Cum. rel. freq |
|---|---|---|---|---|
| 80-119 | 5 | 0.4 | 5 | 0.4 |
| 120-159 | 48 | 3.9 | 53 | 4.3 |
| 160-199 | 265 | 21.6 | 318 | 25.9 |
| 200-239 | 458 | 37.3 | 776 | 63.2 |
| 240-279 | 281 | 22.9 | 1057 | 86.1 |
| 280-319 | 128 | 10.4 | 1185 | 96.5 |
| 320-359 | 35 | 2.9 | 1220 | 99.4 |
| 360-399 | 7 | 0.5 | 1227 | 100 |
| Total | 1227 | 100 | | |

Frequency polygon and cumulative frequency polygons of serum cholesterol levels for 2294 males aged 25-34 and 55-64 years, 1976-1980
[Figure: two panels by serum cholesterol level (mg/100ml), classes 80-119 to 360-399, each with one line for ages 25-34 and one for ages 55-64: relative frequency (%) and cumulative relative frequency (%)]

Box Plots
♦ A visual picture called a box plot can be used to convey a fair amount of information about the distribution of a set of data.
♦ The box shows the distance between the first and the third quartiles.
♦ The median is marked as a line within the box and
♦ The end lines show the minimum and maximum values respectively

Illustration of Box-plot
[Figure: box plot over a numeric axis from 18 to 36]

A box-plot indicating birth weight of 5092 newborns by gestational age at Tikur Anbessa Hospital
[Figure: box plots of birth weight (grams), 500 to 5000, for gestational age Pre, Term and Post]

Summary
Tables
– Although certain information is lost when data are summarized using tables and graphs, a great deal is gained
– Tables are effective ways of summarizing categorical data
– Tables are more informative when they are not overly complex
– Tables and the columns within them should always be clearly labeled and units of measurement be specified
Diagrams
– Diagrams have greater attraction than mere figures.
– They give delight to the eye, add a spark of interest and as such catch the attention as much as the figures dispel it.
– They help in deriving the required information in less time and without any mental strain.
– They have greater memorizing value than mere figures. This is so because the impression left by the diagram is of a lasting nature.
– They facilitate comparison