Information
- Result of data processing
- data only becomes information that you can use to make decisions after it has been processed
Summarizing data
- increases ability to interpret data
- raw data → experiment → results → summarize
Descriptive statistics
- mathematical summaries of data
Organizing data
- one common method is the line list, or line listing
o one type of epidemiologic database
o organized like a spreadsheet, with rows and columns
- each row → observation or record
o represents one person or case of disease
- each column → variable
o contains information about one characteristic of the individual, such as race or date of birth
- 1st column/variable is usually the person's name, initials, or ID no.
- Other columns: demographic information, clinical details, exposures related to illness
I. VARIABLE
- can be any characteristic that differs from person to person
- e.g., height, sex, smallpox vaccination status, or physical activity
- The value of a variable – the number or descriptor that applies to a particular person
o e.g., 5'6" (168 cm), female, and never vaccinated
A. TYPES OF VARIABLE
- The type of values influences the way in which the variables can be summarized.
- Classified into one of four types depending on its scale
Qualitative/Categorical variables
- Nominal-scale variable
o Categories without any numerical ranking, such as county of residence
o Dichotomous variable – two categories are very common: alive or dead, ill or well, vaccinated or unvaccinated, or did or did not eat
- Ordinal-scale
o values that can be ranked
o not necessarily evenly spaced
o e.g., stage of cancer
Quantitative/continuous variables
- Interval-scale
o measured on a scale of equally spaced units, but without a true zero point, such as date of birth
- Ratio-scale
o interval variable with a true zero point
o e.g., duration of illness
II. FREQUENCY DISTRIBUTIONS
- displays the values a variable can take and the number of persons or records with each value
- For example, data from a study of women with ovarian cancer and the number of times each woman has given birth (parity) – ratio-scale
To construct a frequency table
1. list all the values that the variable can take, from lowest value to highest
2. for each value, record the number of persons who had that value (twins and other multiple-birth pregnancies count only once)
Table 2. Methods of summarizing data
III. DESCRIPTIVE STATISTICS
- 2 types that are generally most useful
A. MEASURES OF CENTRAL TENDENCY
- summaries that calculate the "middle" or "average" of data
1. MEAN
- average
- calculate the mean by adding up all of the measurements in a group and then dividing by the number of measurements
2. MEDIAN
- value at the midpoint of the group
- exactly half of the values in the group are smaller than the median
- the other half of the values in the group are greater than the median
odd number of measurements
- median = middle value when values are arranged in ascending order
even number of measurements
- median = mean of the two middle values when values are arranged in ascending order
3. MODE
- the value that appears most frequently in the group of measurements
- It is entirely possible for a group of data to have no mode at all, or for it to have more than one mode.
- If all values occur with the same frequency (for example, if all values occur only once), then the group has no mode.
- If more than one value occurs at the highest frequency, then each of those values is a mode
o Bimodal data set
o Multimodal data set
WHICH MEASURE TO USE
- mean, median, and mode are all clustered towards the center in a graph
- Each is a slightly different measure of what happened "on average" in the experiment
Mean
- most often used to describe the central tendency
- most sensitive measurement, because it reflects the contributions of each of the data values in the group
Median and mode
- less sensitive to "outliers" (data values at the extremes)
- sometimes it is an advantage to have a measure of central tendency (MCT) that is less sensitive to changes in the extremes of the data
- e.g., small number of outliers at one extreme → median is a better MCT than mean
Categorical variables
- best MCT is the most frequent outcome (the mode)
- e.g., a survey on the most effective way to quit smoking → reasonable MCT of results – the method that works most frequently
If data contains more than one mode
- summarizing them with mean or median will obscure this fact
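A minimal Python sketch of the three measures of central tendency defined above, using the standard-library `statistics` module. The data values and the helper name `modes` are made up for illustration. The helper applies the notes' rule that a group has no mode when all values occur with the same frequency; treating a group with only one distinct value as having that value as its mode is an assumption.

```python
from statistics import mean, median, multimode

def modes(values):
    """Return all modes of `values`, or [] when the group has no mode."""
    most_frequent = multimode(values)  # every value tied for the highest frequency
    distinct = set(values)
    # Rule from the notes: if all values occur with the same frequency
    # (and there is more than one distinct value), there is no mode.
    if len(distinct) > 1 and len(most_frequent) == len(distinct):
        return []
    return most_frequent

odd_group = [3, 4, 4, 5, 9]       # made-up values, odd number of measurements
even_group = [3, 4, 4, 5, 6, 8]   # made-up values, even number of measurements

print(mean(even_group))        # sum of 30 divided by 6 measurements
print(median(odd_group))       # middle value of the sorted group
print(median(even_group))      # mean of the two middle values, 4 and 5
print(modes(even_group))       # 4 appears most frequently
print(modes([1, 2, 3]))        # every value occurs once: no mode
print(modes([1, 1, 2, 2, 3]))  # two values tie at the highest frequency: bimodal
```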
Data: Groups, or classes of things. Survey results often fall in this category, such as "What is the most effective way to quit smoking?" or "Gender Differences in After-School Activities"
- best measure of central tendency: Mode. In these made-up survey results, 'cold turkey' is the most frequent response.
Data: Position on a ranking scale, such as 1-5 stars for movies, books, or restaurants
- Median. The median movie ranking in this survey was 2.3 stars.
- Median. Notice how the data in this graph is non-symmetrical. The peak of the data is not centered, and the body mass values fall off more sharply on the left of the peak than on the right. When the peak is shifted like this to one side or the other, we call it skewed data. For skewed data, the median is the best choice to measure central tendency. The median body mass for this skewed population is 185 grams.
- Notice how this graph has two peaks. We call data with two prominent peaks bimodal data. In the case of a bimodal distribution, you may have two populations, each with its own separate central tendency. Here one group has a mean body mass of 147 grams and the other has a mean body mass of 178 grams.
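The effect of skew and of two peaks on the summary measures can be checked numerically. The sketch below uses made-up values, not the body-mass data from the graphs, and only standard-library functions. In the skewed group a long right-hand tail pulls the mean above the median. In the bimodal group the mean and median both land between the two peaks, where no values lie.

```python
from statistics import mean, median, multimode

# Made-up right-skewed group: most values near 10-15, with a long tail.
skewed = [10, 11, 12, 12, 13, 14, 15, 25, 50]
print(mean(skewed))     # pulled upward by the two extreme values
print(median(skewed))   # stays near the bulk of the data

# Made-up bimodal group: one cluster around 6 and another around 13.
bimodal = [5, 6, 6, 6, 7, 12, 13, 13, 13, 14]
print(multimode(bimodal))  # both peaks are reported
print(mean(bimodal))       # falls in the gap between the clusters
print(median(bimodal))     # also falls in the gap
```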
1. RANGE
- simplest of the three measures
- defined by the smallest and largest data values in the set
- The range of data set 1 is 3–8
- gives only minimal information about the spread of the data, by defining the two extremes
- It says nothing about how the data are distributed between those two endpoints
2. VARIANCE, σ²
- measure of how far each value in the data set is from the mean
- defined by:
   1. Subtract the mean from each value in the data → measure of the distance of each value from the mean
   2. Square each of these distances (so that they are all positive values), and add all the squares
   3. Divide the sum of squares by the number of values in the data set
3. STANDARD DEVIATION, σ
- simply the (positive) square root of the variance
Variance and standard deviation
- provide a numerical summary of how much the data are scattered
3. Chart result
- tables, line graphs, or bar charts to get a look at the big picture
- It depends on the kind of questions the assessments are needed to answer
- Tips
o AVOID complex statistics
o Use round numbers
o simple charts – easier to read and understand
o Sort results from highest to lowest [optional]
o Percentages – more meaningful than averages
o Show trend data if assessing over time
Example 1: Table with percentages added, column with total % students successful (Exemplary + Good + Minimally Acceptable). N=18, Target=78%
Example 2: Line chart using data from the tally above, with the target the program hopes to achieve.
- Focus on most important findings
- Use data and results to justify conclusions
- Be careful how you describe your results
- Did you really prove your hypothesis, or did you just find evidence supporting it?
- Ask audience for questions or comments. They may have a different and equally valid interpretation of your results
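Returning to the range, variance, and standard deviation defined earlier in these notes, the three variance steps can be sketched in Python as follows. The function name `spread` and the data values are made up for illustration. The sketch divides by the number of values, as the notes describe (the population variance). A sample variance would divide by one less than the number of values.

```python
from math import sqrt

def spread(values):
    """Return (range, variance, standard deviation) following the steps in the notes."""
    data_range = (min(values), max(values))  # smallest and largest values
    m = sum(values) / len(values)            # the mean
    # Step 1: subtract the mean from each value.
    distances = [x - m for x in values]
    # Step 2: square each distance and add all the squares.
    sum_of_squares = sum(d ** 2 for d in distances)
    # Step 3: divide the sum of squares by the number of values.
    variance = sum_of_squares / len(values)
    # Standard deviation: positive square root of the variance.
    return data_range, variance, sqrt(variance)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
data_range, variance, std_dev = spread(values)
print(data_range)  # the two extremes of the data
print(variance)    # sum of squares 32 divided by 8 values
print(std_dev)     # square root of the variance
```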