LECTURE 1: LEVELS OF MEASUREMENT
Statistics - is a branch of mathematics dealing with data collection, organization, analysis, interpretation, and presentation (Muhrey, 2008).
- is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data (Bluman, 2012).
- common definition: the process of data analysis; not just analyzing data but the whole process of applying the scientific method, including research design, data collection, organization, interpretation, and presentation of data.
- main goal: to use data in answering questions and making decisions
Two Areas of Statistics:
• Mathematical Statistics - the area of mathematics involved in developing the theory of statistical inference.
• Applied Statistics - the application of that statistical inference to actual problems.
Biostatistics - is the application of statistics to problems in the biological sciences, health, and medicine.
- is a branch of applied statistics covering statistical methods, whether existing or new, applied to the medical and biological sciences, health, and medicine
• A tool for decision making
VARIABLE AND DATA
Example:
In your research problem, you want to know if there is an association between gender and intellectual level of secondary students studying in a public school.
- What/who is the subject of your study? Secondary students in a public school
- The next thing you must do is gather information about your subjects; this information will serve as your data.
• Research subjects - the ones who provide you with data
- What data would you want to obtain or measure? These must be in conjunction with your research problem.
- To obtain the data, you must first identify the variables.
Variable - a characteristic or attribute that can assume different values; e.g. Gender, Intelligence Quotient (IQ)
Once you identify your variables, you can gather data.
Subject   Gender   IQ
1         Male     9
2         Female   8
3         Male     8
4         Female   7
5         Female   8
• The numbers represent the subjects (Subjects 1-5).
• Each row represents the information you gathered from each of your subjects. This information is your data.
• This set of data is your data set.
• A single data value is known as a datum.
• Data - values (measurements or observations) that a variable can assume
• Data set - a collection of data
USE OF DATA
1. Descriptive Statistics
2. Inferential Statistics
DESCRIPTIVE STATISTICS VS INFERENTIAL STATISTICS
DESCRIPTIVE - consists of the collection, organization, summarization, and presentation of data.
- describes the data
* Before you analyze the data, you first need to describe the data.
- main objective: make the data presentable and easy to understand
INFERENTIAL - consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
- analytical type of statistics
- goal: to test hypotheses in order to evaluate a claim
- utilizes tests for relationship, effect, and difference (RED)
- uses probability
Population - consists of all subjects (human or otherwise) that are being studied, e.g. 15 million Filipinos; children ages 6-10 who attend public school
Sample - a group of subjects selected from a population; a subset of the population, e.g. 450 Filipino men
Hypothesis testing - a decision-making process for evaluating claims about a population, based on information obtained from samples.
For example, a researcher may wish to know if a new drug will reduce the number of heart attacks in men over 70 years of age. For this study, two groups of men over 70 would be selected. One group would be given the drug, and the other would be given a placebo (a substance with no medical benefits or harm). Later, the number of heart attacks occurring in each group of men would be counted, a statistical test would be run, and a decision would be made about the effectiveness of the drug.
Variables and Types of Data
Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute.
- mostly nominal data, to which arithmetic operations cannot be applied
- e.g. "men" vs "women"; "Patricia" vs "Louie"
- names, gender, eye color, flavors, brands, etc.
Quantitative Data
• Discrete variables assume values that can be counted.
- usually whole numbers
- e.g. number of children in the family, number of students in a classroom, and number of pizza slices
• Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.
- e.g. BMI, glucose level, hemoglobin level
LEVELS OR SCALES OF MEASUREMENT
• The nominal level of measurement classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.
- includes types (types of cars, cellphones, gender)
- no order or rank; mutually exclusive categories (no category is higher than another)
- yes or no, positive or negative
- pass or fail
• The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
- the order or ranking matters
- e.g. satisfied, moderately satisfied, very satisfied
- cancer stages
- high, moderate, low
- A-grade, B-grade, C-grade
- no precise measurement of the differences between ranks
• The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
- has order and equal differences based on a fixed unit of measurement
- no true zero; zero is arbitrary
- e.g. temperature in °C or °F: 0° does not mean the absence of temperature
• The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
- there is a true zero
- e.g. weight, BMI, hemoglobin level, distance (250 kilometers)
- continuous data
Categories of Data
• According to source
✓ Primary: data collected first-hand
- collected through interviews, FGD (Focus Group Discussion), self-administered questionnaires, or observation
✓ Secondary: data previously collected or gathered, which may have been published for some other purpose
- e.g. RRL (review of related literature)
• According to relationship
✓ Independent
✓ Dependent
• Use
✓ Nominal - classification, no order
- e.g. blood groups, patient ID number
✓ Ordinal - ranking; no absolute value, only order; discrete
✓ Interval - score/mark; no absolute zero
✓ Ratio - has an absolute zero; continuous
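Because each level keeps all the properties of the level below it (ordinal adds order, interval adds equal differences, ratio adds a true zero), the scale of a variable limits which summaries make sense. A minimal Python sketch of that idea, assuming a hypothetical table of the lowest level at which each summary is usually considered meaningful (mode: nominal; median and percentiles: ordinal; mean and SD: interval; coefficient of variation: ratio). The names `Level`, `MIN_LEVEL` and `allowed_statistics` are illustrative only, not from any library.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered so a higher level has every property of the lower ones."""
    NOMINAL = 1   # categories only
    ORDINAL = 2   # categories that can be ranked
    INTERVAL = 3  # equal differences, arbitrary zero
    RATIO = 4     # equal differences and a true zero


# Hypothetical table: lowest level at which each summary is usually meaningful.
MIN_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "percentiles": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "standard deviation": Level.INTERVAL,
    "coefficient of variation": Level.RATIO,
}


def allowed_statistics(level: Level) -> list[str]:
    """Return (alphabetically) the summaries that are meaningful at `level`."""
    return sorted(name for name, lowest in MIN_LEVEL.items() if level >= lowest)


# Example variables taken from the notes above.
variables = {
    "blood group": Level.NOMINAL,
    "cancer stage": Level.ORDINAL,
    "temperature (degrees C)": Level.INTERVAL,
    "hemoglobin level": Level.RATIO,
}

for name, level in variables.items():
    print(f"{name} ({level.name.lower()}): {', '.join(allowed_statistics(level))}")
```

Using `IntEnum` lets the comparison `level >= lowest` express the "has all the characteristics of the level below" rule directly.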
Examples of Secondary Data Sources
• Census - complete enumeration of the population; best source of data on population size and distribution according to age
• Vital events - civil status and deaths; morbidity and mortality rates
• Reports of occurrence of notifiable diseases - surveillance/monitoring
• Logbooks - recorded files from which secondary data can be acquired
Methods of Data Collection
• Documented sources
• Sample Survey
• Census Survey
• Physical Observation
• Interview
Qualities of Good Statistical Data
• Timeliness
• Completeness
- Completeness of coverage - geography and inclusion of the target population
- Completeness of accomplishment
• Accuracy - reflects the true situation
• Precision - repeatability
• Relevance
• Adequacy
Sampling Human Populations
• Act of studying or examining only a segment of the population representing
the whole
Advantages of Sampling
• Cheaper
• Faster
• Better quality
• More comprehensive data may be obtained
Uses of Sampling in Public Health
• Prevalence Survey - evaluating the health status of a population
- e.g. how many patients have anemia?
• Risk Factors Investigation - identify risk factors
- e.g. the Framingham Study: relationships such as how high blood pressure and high blood cholesterol are major risk factors for CVD
• Evaluating effectiveness of health measures - health programs
- e.g. is a contraceptive still effective in preventing pregnancy?
• Evaluating reliability and completeness of record systems
- current or past research becomes a baseline for new or other research
LECTURE 2: DATA COLLECTION AND SAMPLING
Research and Statistics
• Research is a problem-solving activity.
• Research follows the scientific method of inquiry.
• Research involves collecting data to answer a scientific inquiry, leading to informed decisions or discoveries.
Sampling
- studying only a segment of a population to represent the whole
Population: the entire pool from which a statistical sample is drawn
Target Population: the total group of individuals from which the sample might be drawn
- a group from which representative information is desired and about which inferences will be made
Sampling Units: units which are chosen in selecting the sample
- e.g. the Parilla family as a sample to get glucose levels
Sampling Frame: collection or list of all the sampling units
Elementary Unit or Element: the object or person from which a measurement is taken or an observation is made
- e.g. the member of the Parilla family whose blood will be extracted
2. Accidental/Haphazard Sampling - the sample is made up of those who come at hand or who are available
3. Quota Sampling - samples of a fixed size are obtained from predetermined subdivisions of the population
- the non-probability counterpart of stratified sampling
4. Convenience Sampling - study units that are easily accessible are selected as the sample
5. Snowball Technique - for "hidden populations"
- the sample is obtained by a process whereby an individual to be included is identified by a member who was previously included (referrals)
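Quota sampling and the snowball technique are step-by-step selection rules, so they can be sketched in code. This is a minimal illustration under stated assumptions, not a standard implementation: the function names `quota_sample` and `snowball_sample`, the person IDs, group labels and referral lists are all hypothetical. Quota sampling here takes people in the order they become available (as in accidental sampling) until each subgroup's fixed quota is filled; the snowball sketch adds referred individuals, in the order they were referred, until the target size is reached.

```python
from collections import deque


def quota_sample(arrivals, group_of, quotas):
    """Take people in the order they become available (non-random), keeping
    each one only while the quota for their subgroup is still open."""
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of[person]
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota is full
    return filled


def snowball_sample(seeds, referrals, target_size):
    """Start from known members of a hidden population, then add the people
    they refer, then the people those referrals refer, and so on."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # do not recruit the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample


# Hypothetical data for illustration only.
arrivals = ["P1", "P2", "P3", "P4", "P5", "P6"]
group_of = {"P1": "male", "P2": "male", "P3": "female",
            "P4": "male", "P5": "female", "P6": "female"}
print(quota_sample(arrivals, group_of, {"male": 2, "female": 2}))
# {'male': ['P1', 'P2'], 'female': ['P3', 'P5']}

referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
print(snowball_sample(["A"], referrals, target_size=4))
# ['A', 'B', 'C', 'D']
```

Neither procedure uses random selection, which is what makes both of them non-probability methods.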
Criteria for Selecting Good Sampling Design
• Sample obtained should be representative
• Adequate
• Practical and feasible
• Economic and efficient
Sample Size Estimation
Slovin's Formula - used to calculate the sample size given the population size and a margin of error:
n = N / (1 + Ne²), where n = sample size, N = population size, and e = margin of error
If a sample is taken from a population, a formula must be used to take into account confidence levels and margin of error.
Margin of error - a statistic expressing the amount of random sampling error in the result
- the larger the margin of error, the less confidence one should have that the result would reflect the result of a survey of the entire population
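Slovin's formula, n = N / (1 + Ne²), is simple arithmetic, so a short Python sketch shows how the sample size responds to the population size N and margin of error e. This is a minimal illustration, assuming the common practice of rounding the result up to the next whole subject; the function name `slovin_sample_size` is hypothetical and the population figures are made-up examples.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size from Slovin's formula n = N / (1 + N * e**2),
    rounded up so the sample is never smaller than the formula requires."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin of error must be a proportion, e.g. 0.05 for 5%")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical example: a population of 1,000 with a 5% margin of error.
print(slovin_sample_size(1000, 0.05))  # 1000 / 3.5 = 285.7..., rounded up to 286
```

A smaller margin of error gives a larger required sample, e.g. `slovin_sample_size(1000, 0.03)` is larger than the 5% result.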
Tables and Graphs - limited
Numbers - used for continuous data
MEASURES OF LOCATION
• A Measure of Location summarizes a data set by giving a "typical value" within the range of the data values that describes its location relative to the entire data set.
- e.g. given age data ranging from 1 to 10, a typical value must lie within this range
• Some Common Measures:
• Minimum, Maximum
• Central Tendency (MEAN, MEDIAN, MODE)
• Percentiles, Deciles, Quartiles
MINIMUM AND MAXIMUM
MEDIAN
• Divides the observations into two equal parts
- to find it, arrange the data from the lowest to the highest value, then select the middle value
- If n is odd, the median is the middle number.
- If n is even, the median is the average of the 2 middle numbers.
MEDIAN (MD)
• may not be an actual observation in the data set
• can be applied to data measured at least at the ordinal level
• a positional measure; not affected by extreme values
MODE
- the value that occurs most often in a data set
• also called the nominal average
• computation of the mode for ungrouped or raw data
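The median and mode rules above can be sketched in Python for ungrouped (raw) data and applied to the IQ and gender columns of the five-subject table in Lecture 1. This is a minimal illustration, not a library API: `median` and `modes` are hypothetical helper names. The sketch assumes the common textbook convention that a data set in which no value occurs more than once has no mode, so an empty list is returned. Python's built-in `statistics.median` gives the same median; `statistics.multimode` differs in that it returns every value when none repeats.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data; for an even count, the average of
    the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:  # n odd: the middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average of the 2 middle numbers


def modes(values):
    """All values tied for the highest frequency (works for qualitative data too).
    Returns [] when no value occurs more than once ("no mode")."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


# IQ and gender columns from the five-subject table.
iq = [9, 8, 8, 7, 8]
gender = ["Male", "Female", "Male", "Female", "Female"]

print(median(iq))              # 8
print(modes(iq))               # [8]
print(modes(gender))           # ['Female']
print(median([4, 1, 3, 2]))    # 2.5
print(modes([2, 3, 3, 5, 5]))  # [3, 5] (bimodal)
```

Because `modes` only counts occurrences, it works on nominal data such as gender, where a median cannot be computed.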
MODE
• can be used for qualitative as well as quantitative data
• may not be unique
• not affected by extreme values
• may not exist
- unimodal, bimodal, multimodal, no mode
- modal class (for grouped data)
• Use the mean when:
- sampling stability is desired
- other measures are to be computed
• Use the median when:
- the exact midpoint of the distribution is desired
- there are extreme observations
• Use the mode when:
- the "typical" value is desired
- the data set is measured on a nominal scale
PERCENTILES
- the most common way to report the relative standing of a number
- a percentile is the percentage of individuals in the data set who are below where a particular number is located
EXAMPLE
• Suppose LJ was told that, relative to the other scores on a certain test, his score was at the 90th percentile.
• How to find the percentile, if K (number of scores) = 25:
- arrange the data in order
43, 54, 56, 61, 62, 66, 68, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99
- 0.90 x 25 = 22.5, rounded up to 23
- count from left to right until you reach the 23rd number, which is 98
- LJ's score is 98 (out of 100)
• This means that 90% of those who took the test had scores less than or equal to LJ's score, while 10% had scores higher than LJ's.
DECILE
• Divides an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.
- e.g. with 100 values arranged in a data array, dividing it into 10 parts makes every tenth value a decile
• The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile; and so on.
QUARTILES
• Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
• The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile (the median); and the 3rd quartile is the 75th percentile.
- together with the minimum and maximum values, the quartiles make up the box-and-whisker plot (box plot)
MEASURES OF VARIATION
• A measure of variation is a single value that is used to describe the spread of the distribution.
• A measure of central tendency alone does not uniquely describe a distribution.
- two data sets can have the same mean but different distributions
• Absolute Measures of Dispersion:
• Range
• Inter-quartile Range (difference between the third and first quartiles)
• Variance (σ² or SD²)
• Standard Deviation
• Relative Measure of Dispersion:
• Coefficient of Variation (divide the SD by the mean)
- reported as a percentage (x100)
RANGE
• The difference between the maximum and minimum value in a data set, i.e. R = MAX - MIN
• Example: Pulse rates of 15 male residents of a certain village
PROPERTIES OF RANGE
• The larger the value of the range, the more dispersed the observations are.
• It is quick and easy to understand.
• A rough measure of dispersion.
STANDARD DEVIATION (SD)
• most important measure of variation
• square root of the variance
• has the same units as the original data
- s (the sample standard deviation) is an absolute measure of dispersion
- the closer the values are to the mean, the smaller the SD and the less variation in the data
- the farther the values are from the mean, the larger the SD
REMARKS ON STANDARD DEVIATION
• If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
• If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.
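The percentile counting rule from the LJ example and the measures of location and variation above can be sketched together in Python. This is a minimal illustration under stated assumptions: `percentile_by_counting` follows the rule in the notes (multiply the proportion by the number of scores, round up, count from the left) and uses however many scores are in the list; the quartiles for the box plot come from Python's `statistics.quantiles` (Python 3.8+), which interpolates between values and may differ slightly from other textbook methods; the SD uses the sample (n - 1) formula of `statistics.stdev`. The data sets `a` and `b` are hypothetical, chosen to have the same mean but different spread, and all function names are illustrative.

```python
import math
import statistics

# Scores from the LJ example, as listed in the notes (already sorted).
scores = [43, 54, 56, 61, 62, 66, 68, 69, 70, 71, 72, 77, 78, 79, 85,
          87, 88, 89, 93, 95, 96, 98, 99, 99]


def percentile_by_counting(data, p):
    """Value at the p-th percentile: multiply p/100 by the number of scores,
    round up, and count that many values from the left of the sorted data."""
    ordered = sorted(data)
    position = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[position - 1]


def five_number_summary(data):
    """Minimum, Q1, median (Q2), Q3 and maximum: the values behind a box plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)


def data_range(data):
    return max(data) - min(data)  # R = MAX - MIN


def interquartile_range(data):
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1  # third quartile minus first quartile


def coefficient_of_variation(data):
    """Sample SD divided by the mean, reported as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100


print(percentile_by_counting(scores, 90))  # 98, LJ's score
print(five_number_summary(scores))

# Hypothetical data sets with the same mean (10) but different spread.
a = [8, 9, 10, 11, 12]
b = [2, 6, 10, 14, 18]
for name, data in (("a", a), ("b", b)):
    print(name,
          "range:", data_range(data),
          "IQR:", interquartile_range(data),
          "SD:", round(statistics.stdev(data), 2),
          "CV%:", round(coefficient_of_variation(data), 1))
```

Both `a` and `b` have a mean of 10, yet every spread measure is larger for `b`, which is why a measure of central tendency alone does not describe a distribution.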
MEASURES OF KURTOSIS
• Describes the extent of peakedness or flatness of the distribution of the data.
• Measured by the coefficient of kurtosis (K), computed as:
BOX PLOT