
SPREAD & VARIABILITY: VARIABILITY refers to how “spread out” a group of scores is.

To see what we mean by spread out, let’s consider the graphs below, which represent the scores of students in 2 quizzes.

Based on the graphs, which of the two sets of scores is more spread out?

The mean score for each quiz is 7. Despite the equality of the means, you can see that the distributions are quite
different. Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The
differences among students were much greater on Quiz 2 than on Quiz 1.

The terms variability, spread and dispersion are synonymous, and refer to how spread out a distribution is. The most
frequently used measures of variability are the RANGE, VARIANCE & STANDARD DEVIATION. Variability describes how
far apart data points lie from each other and from the center of a distribution. Along with measures of central
tendency, measures of variability give you descriptive statistics that summarize your data. While the central
tendency, or average, tells you where most of your points lie, variability summarizes how far apart they are. This is
important because it tells you whether the points tend to be clustered around the center or more widely spread out.

Low variability is ideal because it means that you can better predict information about the population based on
sample data. High variability means that the values are less consistent, so it’s harder to make predictions. Data sets
can have the same central tendency but different levels of variability or vice versa. If you know only the central
tendency or the variability, you can’t say anything about the other aspect. Both of them together give you a complete
picture of your data. Example: variability in normal distributions

You are investigating the amounts of time spent on phones daily by different groups of people.
Using simple random samples, you collect data from 3 groups:

Sample A: high school students


Sample B: college students
Sample C: adult full-time employees
All three of your samples have the same average phone use, at 195 minutes or 3 hours and 15 minutes. This is the
x-axis value at which the peaks of the curves occur. Although the data follow a normal distribution, each sample has
a different spread. Sample A has the largest variability while sample C has the smallest variability.
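As a rough illustration (not part of the original handout; the specific spread values are hypothetical), three normally distributed samples with the same mean of 195 minutes but different standard deviations can be simulated in Python to show how variability differs even when the centers agree:

import numpy as np

rng = np.random.default_rng(0)
mean = 195  # average daily phone use in minutes, the same for all three groups

# Hypothetical spreads: Sample A most variable, Sample C least variable
samples = {
    "A (high school students)": rng.normal(mean, 30, 1000),
    "B (college students)": rng.normal(mean, 20, 1000),
    "C (adult full-time employees)": rng.normal(mean, 10, 1000),
}

for name, data in samples.items():
    print(f"Sample {name}: mean = {data.mean():.1f}, SD = {data.std():.1f}")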

1. RANGE is the distance covered by the scores in a distribution (from smallest value to highest value). It is also
defined as the simplest measure of variability.

• The problem with using the range is that it is extremely sensitive to outliers: one number far away from
the rest of the data will greatly alter the value of the range. It is therefore considered an imprecise, unreliable
measure of variability.

Variance vs. Standard Deviation

The standard deviation is derived from variance and tells you, on average, how far each value lies from the
mean. It’s the square root of variance.

Both measures reflect variability in a distribution, but their units differ:

Standard deviation is expressed in the same units as the original values (e.g., meters).

Variance is expressed in much larger units (e.g., meters squared).

Since the units of variance are much larger than those of a typical value of a data set, it’s harder to interpret the
variance number intuitively. That’s why standard deviation is often preferred as a main measure of variability.
However, the variance is more informative about variability than the standard deviation, and it’s used in making
statistical inferences.

You and your friends have just measured the heights of your dogs (in millimeters).

The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Find the Range, the Mean, the Variance, and the Standard Deviation.

So, the mean (average) height of the dogs is 394 mm. Plotting it on the chart, let's get the difference of each height
from the mean and present it through a graph.

X̄ = (600 + 470 + 170 + 430 + 300) ÷ 5 = 1 970 ÷ 5

X̄ = 394 mm

Finding the variance:

Variance = σ² = Σ(X – X̄)² ÷ N

Σ(X – X̄)² = 206² + 76² + (–224)² + 36² + (–94)²

Σ(X – X̄)² = 42 436 + 5 776 + 50 176 + 1 296 + 8 836 = 108 520

Variance = σ² = 108 520 ÷ 5 = 21 704

Standard Deviation:

σ = √21 704

σ = 147.32

σ ≈ 147 mm

Now we can show which heights are within one Standard Deviation (147mm) of the Mean.

So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or
extra small. Rottweilers are tall dogs. And Dachshunds are a bit short, right?
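As a quick check, the range, mean, variance, and standard deviation above can be reproduced with a short Python sketch (illustrative only, not part of the original solution; it uses the population formulas with N in the denominator, as in this example):

import math

heights = [600, 470, 170, 430, 300]  # dog heights in mm

n = len(heights)
rng = max(heights) - min(heights)                      # Range = HS - LS = 430
mean = sum(heights) / n                                # X-bar = 394
variance = sum((x - mean) ** 2 for x in heights) / n   # population variance = 21 704
std_dev = math.sqrt(variance)                          # standard deviation ~ 147.32

print(rng, mean, variance, round(std_dev, 2))          # 430 394.0 21704.0 147.32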

Using the table:

X (heights of the dogs in mm)    |X – X̄|             (X – X̄)²

170                              224                 224² = 50 176

300                              94                  94² = 8 836

430                              36                  36² = 1 296

470                              76                  76² = 5 776

600                              206                 206² = 42 436

ΣX = 1 970                       Σ|X – X̄| = 636      Σ(X – X̄)² = 108 520

Find the mean:

X̄ = 1 970 ÷ 5

X̄ = 394

Solutions:

1. Range = HS – LS = 600 – 170 = 430 mm

2. Variance = σ² = Σ(X – X̄)² ÷ N = 108 520 ÷ 5 = 21 704

3. Standard deviation = σ = √(Σ(X – X̄)² ÷ N)

σ = √(108 520 ÷ 5) = √21 704

σ = 147.32 ≈ 147 mm
Given the data below, find the following: X = 9, 8, 6, 4, 2, 1

a. Range

b. Mean Absolute Deviation

c. What does the computed value of the Standard Deviation mean?

Solutions:

a. Range = HS – LS = 9 – 1 = 8

b. Mean Absolute Deviation: with X̄ = 30 ÷ 6 = 5, Σ|X – X̄| = 4 + 3 + 1 + 1 + 3 + 4 = 16

MAD = Σ|X – X̄| ÷ n = 16 ÷ 6 = 2.67

c. Standard Deviation: σ = √(Σ(X – X̄)² ÷ n) = √(52 ÷ 6) = 2.94

Standardly, the scores deviate from the mean by 2.94 (or 3.22 if the sample formula with n – 1 is used).
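The arithmetic above can be verified with a short Python sketch (illustrative only; it assumes the data set is 9, 8, 6, 4, 2, 1 as reconstructed above):

import math

data = [9, 8, 6, 4, 2, 1]
n = len(data)
mean = sum(data) / n                                                 # 5.0

rng = max(data) - min(data)                                          # 9 - 1 = 8
mad = sum(abs(x - mean) for x in data) / n                           # 16 / 6 = 2.67
sd_pop = math.sqrt(sum((x - mean) ** 2 for x in data) / n)           # sqrt(52/6) = 2.94
sd_sample = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sqrt(52/5) = 3.22

print(rng, round(mad, 2), round(sd_pop, 2), round(sd_sample, 2))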

The coefficient of variation (CV) is a measure of relative variability. It is the ratio of the standard deviation to
the mean (average). For example, the expression “The standard deviation is 15% of the mean” is a CV.

The CV is particularly useful when you want to compare results from two different surveys or tests that have
different measures or values. For example, if you are comparing the results from two tests that have different scoring
mechanisms. If sample A has a CV of 12% and sample B has a CV of 25%, you would say that sample B has more
variation, relative to its mean.

COEFFICIENT OF VARIATION (CV)


 it is the ratio of the standard deviation to the mean

this is used to compare the variability of two or more sets of data even when they are expressed in different units of
measurement

Formula: CV = (s ÷ X̄)(100%)

The higher the coefficient of variation, the greater the level of dispersion around the mean. It is generally
expressed as a percentage. Without units, it allows for comparison between distributions of values whose scales of
measurement are not comparable.

When we are presented with estimated values, the CV relates the standard deviation of the estimate to the
value of this estimate. The lower the value of the coefficient of variation, the more precise the estimate.

Distributions with a CV of less than 1 are considered to be low-variance, whereas those with CV higher than 1
are considered to be high-variance.

A researcher is comparing two multiple-choice tests with different conditions. In the first test, a typical
multiple-choice test is administered. In the second test, alternative choices (i.e. incorrect answers) are randomly
assigned to test takers. The results from the two tests are:

            Regular Test    Randomized Answers

Mean        59.9            44.8

SD          10.2            12.7

a. Which test has a larger variation from the mean? RA

b. If you were asked to take an exam, which is better for you to take?

CV = (s ÷ X̄)(100%)

Regular Test: CV = (10.2 ÷ 59.9)(100%) = 17.03% = 0.1703

Randomized Test: CV = (12.7 ÷ 44.8)(100%) = 28.35% = 0.2835

a. The randomized test has a larger variation from the mean since its CV is greater.

b. If I were asked to choose which exam to take, I would choose the regular test since it has a lower CV.
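A minimal Python sketch of the same comparison (illustrative only; the function name is my own):

def cv(sd, mean):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return sd / mean * 100

print(round(cv(10.2, 59.9), 2))   # Regular test:     17.03
print(round(cv(12.7, 44.8), 2))   # Randomized test:  28.35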

2. The mean score of a Statistics test of Class A is 80 with a standard deviation of 12 while Class B has a mean
score of 88 with a standard deviation of 15.

a. Which class has a larger variation from the mean? Class B

b. Who has a better performance? Class A

CV(A) = (12 ÷ 80)(100%) = 15%

CV(B) = (15 ÷ 88)(100%) = 17.05%

3. Workers X and Y are assigned to the same job. The table below shows the results of their work performance over a
long period of time.

             Mean time of completing the job (in hours)    Standard deviation (in hours)

Worker X     7                                              2

Worker Y     6.5                                            2

a. Who is more efficient from an overall point of view? Both

b. Who appears to be more consistent in performance? Worker X

CV(X) = (2 ÷ 7)(100%) = 28.57%

CV(Y) = (2 ÷ 6.5)(100%) = 30.77%

A measure of position is a method by which the position that a particular data value has within a given data set
can be identified. As with other types of measures, there is more than one approach to defining such a measure.

Although quantiles have a fairly simple conceptual interpretation, computation is entirely another matter. Many
different rules and formulas are in use, and none of them have become the overwhelming standard. If every data set
was infinite and every variable continuous, then all of the rules and formulas in use would give the same result.

Finite data sets and discrete variables generate issues usually not found in the infinite continuous case.

• Should a percentile give the percentage of data "at or below" rather than just "below"?

• Should the zeroth and hundredth percentiles be defined or undefined?

• In small data sets, how should the selection of an intermediate location be made (by rounding, or by
interpolation)?

• Should the first quartile be equal to the 25th percentile?


FRACTILES or QUANTILES divide the distribution into several equal parts

1. QUARTILES divide the distribution into 4 equal parts

2. DECILES divide the distribution into 10 equal parts

3. PERCENTILES divide the distribution into 100 equal parts

NOTE: The Q2 = D5 = P50. These measurements are also equal to the value of the median.

Below is the frequency distribution of the results of the entrance examination scores of 60 students.

a. Compute for the standard deviation.

b. What is the score just above the median?

c. What is the value of the 6th decile?

d. Compute for the quartile deviation.

CI         f
18 – 26    6
27 – 35    11
36 – 44    17
45 – 53    14
54 – 62    8
63 – 71    3
72 – 80    1
i = 9      n = 60

Extending the table with the class marks Xm:

CI         f         Xm     f(Xm)
18 – 26    6         22     132
27 – 35    11        31     341
36 – 44    17        40     680
45 – 53    14        49     686
54 – 62    8         58     464
63 – 71    3         67     201
72 – 80    1         76     76
i = 9      n = 60            ∑f(Xm) = 2580


SOLUTION EXAMPLE 1:

X̄ = ∑f(Xm) ÷ n = 2580 ÷ 60 = 43

CI         f         Xm     f(Xm)    |Xm – X̄|
18 – 26    6         22     132      21
27 – 35    11        31     341      12
36 – 44    17        40     680      3
45 – 53    14        49     686      6
54 – 62    8         58     464      15
63 – 71    3         67     201      24
72 – 80    1         76     76       33
i = 9      n = 60            ∑f(Xm) = 2580

CI         f         Xm     f(Xm)    (Xm – X̄)    (Xm – X̄)²    f(Xm – X̄)²
18 – 26    6         22     132      –21          441           2 646
27 – 35    11        31     341      –12          144           1 584
36 – 44    17        40     680      –3           9             153
45 – 53    14        49     686      6            36            504
54 – 62    8         58     464      15           225           1 800
63 – 71    3         67     201      24           576           1 728
72 – 80    1         76     76       33           1 089         1 089
i = 9      n = 60            ∑f(Xm) = 2580                       ∑f(Xm – X̄)² = 9 504

MEASURES OF VARIABILITY: GROUPED DATA – SOLUTION EXAMPLE 1:

a. s = √(∑f(Xm – X̄)² ÷ (n – 1)) = √(9 504 ÷ 59) = 12.69

with ∑f(Xm – X̄)² = 9 504 and n = 60.

Standardly, the scores deviate from the mean by 12.69.
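The grouped-data mean and standard deviation above can be reproduced from the class marks with a short Python sketch (illustrative only; the (lower, upper, frequency) layout is my own):

import math

# (lower limit, upper limit, frequency) for each class interval
classes = [(18, 26, 6), (27, 35, 11), (36, 44, 17), (45, 53, 14),
           (54, 62, 8), (63, 71, 3), (72, 80, 1)]

n = sum(f for _, _, f in classes)                                        # 60
marks = [(lo + hi) / 2 for lo, hi, _ in classes]                         # class marks Xm

mean = sum(f * xm for (_, _, f), xm in zip(classes, marks)) / n          # 2580 / 60 = 43
ss = sum(f * (xm - mean) ** 2 for (_, _, f), xm in zip(classes, marks))  # 9504
s = math.sqrt(ss / (n - 1))                                              # sqrt(9504 / 59) ~ 12.69

print(mean, ss, round(s, 2))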

b. Value just above the median

51n ÷ 100 = 51(60) ÷ 100 = 30.6 (locate 30.6 in the <cf column)

P51 = LB + [(51n ÷ 100 – <cf) ÷ f] i

P51 = 35.5 + [(30.6 – 17) ÷ 17] 9 = 42.7

c. 6n ÷ 10 = 6(60) ÷ 10 = 36 (locate 36 in the <cf column)

D6 = LB + [(6n ÷ 10 – <cf) ÷ f] i

D6 = 44.5 + [(36 – 34) ÷ 14] 9 = 45.79

d. n ÷ 4 = 60 ÷ 4 = 15 (locate 15 in the <cf column)

Q1 = LB + [(n ÷ 4 – <cf) ÷ f] i

Q1 = 26.5 + [(15 – 6) ÷ 11] 9 = 33.86

3n ÷ 4 = 3(60) ÷ 4 = 45 (locate 45 in the <cf column)

Q3 = LB + [(3n ÷ 4 – <cf) ÷ f] i

Q3 = 44.5 + [(45 – 34) ÷ 14] 9 = 51.57

QD = (Q3 – Q1) ÷ 2

QD = (51.57 – 33.86) ÷ 2 = 8.86
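Since the percentile, decile, and quartile computations above all use the same interpolation formula LB + [(kn ÷ 100 – <cf) ÷ f] i, they can be wrapped in one small Python helper (a sketch under that formula; the function name and data layout are my own):

def grouped_percentile(classes, k):
    """k-th percentile from grouped data given as (lower limit, upper limit, frequency) rows."""
    n = sum(f for _, _, f in classes)
    target = k * n / 100                     # kn/100
    cf = 0                                   # cumulative frequency below the current class
    for lo, hi, f in classes:
        if cf + f >= target:                 # this class contains the k-th percentile
            lb = lo - 0.5                    # lower class boundary
            i = hi - lo + 1                  # class size
            return lb + (target - cf) / f * i
        cf += f

classes = [(18, 26, 6), (27, 35, 11), (36, 44, 17), (45, 53, 14),
           (54, 62, 8), (63, 71, 3), (72, 80, 1)]

print(round(grouped_percentile(classes, 51), 2))  # P51 = 42.7
print(round(grouped_percentile(classes, 60), 2))  # D6  = 45.79
print(round(grouped_percentile(classes, 25), 2))  # Q1  = 33.86
print(round(grouped_percentile(classes, 75), 2))  # Q3  = 51.57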

CI         f      <cf    d'    fd'           d'²    fd'²
18 – 26    6      6      0     0             0      0
27 – 35    11     17     1     11            1      11
36 – 44    17     34     2     34            4      68
45 – 53    14     48     3     42            9      126
54 – 62    8      56     4     32            16     128
63 – 71    3      59     5     15            25     75
72 – 80    1      60     6     6             36     36
i = 9      n = 60               ∑fd' = 140           ∑fd'² = 444
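The handout does not spell out here how this coded (step-deviation) table is used, but the usual shortcut s = i·√[(n·∑fd'² – (∑fd')²) ÷ (n(n – 1))] reproduces the standard deviation found earlier. A minimal Python sketch, assuming that formula:

import math

i, n = 9, 60
sum_fd = 140    # ∑fd'
sum_fd2 = 444   # ∑fd'²

# Coded-deviation (step-deviation) shortcut for the sample standard deviation
s = i * math.sqrt((n * sum_fd2 - sum_fd ** 2) / (n * (n - 1)))
print(round(s, 2))  # ~ 12.69, matching the direct computation above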

The table below shows the frequency distribution of the wages per day of the laborers in a certain eatery.

CI (Wages)    f     d'     fd'          d'²    f(d'²)       <cf
180 – 184     8     2      16           4      32           50
175 – 179     12    1      12           1      12           42
170 – 174     10    0      0            0      0            30
165 – 169     13    –1     –13          1      13           20
160 – 164     7     –2     –14          4      28           7
i = 5         n = 50       ∑fd' = 1            ∑fd'² = 85

STATISTICS is a scientific body of knowledge/branch of Mathematics that deals with the theory and method of
collecting, tabulating or presenting (summarizing or organizing), analyzing and interpreting numerical data
Collection of Data is the process of gathering and obtaining numerical data
Tabulation or Presentation of Data involves summarizing data or information in textual, graphical or tabular
form
Analysis of Data involves describing the data by using statistical methods & procedures
Analysis of data is also the process of extracting from the given data relevant information from which numerical
descriptions can be formulated
Interpretation of Data refers to the process of making or drawing conclusions based on the analyzed data
DESCRIPTIVE STATISTICS
 is a statistical procedure concerned with describing the characteristics and properties of a group of persons,
places, or things
 it involves gathering, organizing, presenting and describing the data
 it is used to summarize data from a sample using measures such as the mean or standard deviation
INFERENTIAL STATISTICS
 it uses sample data to make inferences about a population
 it is calculated with the purpose of generalizing the findings from samples to populations it represents,
performing hypothesis testing, determining relationships among variables and making decisions
 this kind of statistics uses the concept of probability --- the chance of an event to happen
PARAMETRIC TESTS assume a normal distribution of values. Parametric statistics are the most common type of
inferential statistics
PARAMETRIC TESTS make assumptions about a population or a data set.
NONPARAMETRIC TESTS make fewer assumptions about the data set.
NON-PARAMETRIC TESTS are used in cases where parametric tests are not appropriate and are used when the
distribution is skewed, the distribution is not known or the sample size is too small (n < 30)
Parametric Test for Independent Measures between Two Groups: A T-test is used to compare between the means of
two data sets, when the data is normally distributed.
 Parametric Correlation Test: Pearson Product-Moment Correlation is a common parametric method of measuring
correlation between two variables.

The Spearman Rank Correlation is similar to the Pearson coefficient, but it is used when data are ordinal (usually
categorical data set into ranks of some kind).
 The Mann-Whitney Test is used to compare the differences between two groups of ordinal data.
 
POPULATION {N} refers to a large collection of objects, persons, places or things or a group of phenomena that have
something in common
The mean income of the subscribers of ABS-CBN TV Plus
The daily maximum temperatures in February for major Isabela towns
The number of registered voters in Region 02

SAMPLE {n} is a small portion or a subset of a population. A SAMPLE is a representative group drawn from a population.
The mean income of 250 subscribers of ABS-CBN TV Plus
The GPA of a freshmen class

PARAMETER is any numerical or nominal measurement describing some characteristics of a population. PARAMETER is
any summary number, like an average or percentage, that describes the entire population
The average weight of all middle-aged female Filipinos
The proportion of likely Filipino voters approving the president’s job

STATISTIC is an estimate of a parameter; it is a value/measurement obtained from a sample


The mean grade of a random sample of 60 Grade 12 students
A group of 100 students from a school with 4,500 students

DATA are facts or a set of information or observation under study. DATA are individual pieces of factual information
recorded and used for the purpose of analysis

VARIABLE
 is a characteristic or property of a population or sample which makes the numbers different from each other
 it also refers to the observable phenomena of a person or object whereby the members of the group or set vary
from one another
 this is considered to be the raw data or materials gathered by a researcher or investigator for statistical analysis

OBSERVATIONAL DATA are captured through observation of a behavior or activity.


Data are collected using human observation, open-ended surveys, or the use of an instrument to monitor and record
information.
Because observational data are captured in real time, it would be very difficult or impossible to re-create if lost.

EXPERIMENTAL DATA are collected through active intervention by the researcher to produce and measure change or to
create a difference when the variable is altered. These types of data are often reproducible, but reproducing them
can be expensive.

SIMULATION DATA are generated by imitating the operation of a real-world process or system over time using computer
test models.
For example, to predict weather conditions, economic models, chemical reactions, or seismic activity.
This method is used to try to determine what would or could happen under certain conditions.
DERIVED/COMPILED DATA involve using existing data points, often from different data sources, to create new data through
some sort of transformation, such as an arithmetic formula or aggregation.
For example, combining the twin cities metro area to create population density data.

QUALITATIVE DATA assume values that manifest the concept of attributes or categories, thus, they are sometimes
called categorical data
The notes taken during a focus group on the quality of a certain fast food chain
Responses from an open-ended questionnaire
Strands/courses of senior high school students
QUANTITATIVE DATA are data that are numerical in nature which are obtained through measuring or counting.
QUANTITATIVE DATA are used when a researcher is trying to address the “WHAT” or “HOW MANY” aspects of a
research question.
DISCRETE DATA take on whole-number values only, assume exact values, and can be obtained
through counting
CONTINUOUS DATA can take any value on an interval of real numbers and can assume all values between any two
specific values through measuring
An INDEPENDENT VARIABLE, sometimes called an EXPERIMENTAL or PREDICTOR variable, is a variable that is being
manipulated in an experiment in order to observe the effect on a DEPENDENT VARIABLE, sometimes called an
OUTCOME variable.
DEPENDENT VARIABLE
Test Mark (measured from 0-100)
INDEPENDENT VARIABLE
Revision Time (measured in hours)
Intelligence (measured using IQ score)

Independent Variable                                     Dependent Variable

Type of treatment: different types of                    Behavioral variables: measures of adjustment, activity
drug/psychological treatments                            levels, eating behavior, smoking behavior

Treatment factors: brief vs. long-term treatment,        Physiological variables: measures of physiological
in-patient vs. out-patient treatment                     responses such as heart rate, blood pressure and brain
                                                         wave activity

Experimental manipulations: types of beverage            Self-report variables: measures of anxiety, mood, or
consumed (alcoholic vs. non-alcoholic)                   marital/life satisfaction
PRIMARY SOURCE

 a source of data from which firsthand information is obtained usually by means of personal interview and actual
observation

 data gathered from this are called PRIMARY DATA

SECONDARY SOURCE

 a source of data where information is taken from other people’s works

 SECONDARY DATA may be gathered from magazines, newspapers, television, radios, internet, etc.

DIRECT METHOD or INTERVIEW METHOD

– is one of the most effective methods of collecting original data or accurate responses

• Gathering of data may be done through a personal encounter between the interviewer and the interviewee.

Example: Interviewing teacher applicants

INDIRECT METHOD or QUESTIONNAIRE METHOD

– it is one of the easiest methods of gathering data

• Data gathering by means of getting information with the use of written questionnaires.

Example: Identifying the study habits of 1,500 students of ISU.

REGISTRATION METHOD

– it is enforced by private organizations and government agencies

• Data gathering may be done by asking for compiled files from different offices or organizations.

Example: Getting the number of LET passers in CoEd for the past 2 years

OBSERVATION METHOD

– is a method of investigation that makes use of all the senses to measure or obtain outcomes/responses
from the object of study
Example: Determining the stimuli that cause a mentally ill patient to go wild all of a
sudden.

EXPERIMENTATION METHOD

– used when the objective is to determine the cause and effect of a certain phenomenon under some controlled
conditions

Example: Use of technology in teaching Statistics.

TEXTUAL FORM – data are presented in paragraph form

TABULAR FORM – it is a more effective way of presenting relationships or comparisons of numerical data

 it provides a more precise, systematic and orderly presentation of data in rows and columns through the use
of tables

 It is brief; it reduces the matter to the minimum.

 It provides the reader a good grasp of the meaning of the quantitative relationship indicated in the report.

 It tells the whole story without the necessity of mixing textual matter with figure.

 The systematic arrangement of columns and rows makes them easily read and readily understood.

 The columns and rows make comparison easier.

 GRAPHICAL FORM – is the most effective form of presenting data for it uses visual form where important
relationships are brought out more clearly in pictorial form

 Types: Line Graph, Bar Graph, Pie Chart, Scatter-Point Diagram, Pictogram,

LINE GRAPH – it shows relationships between two sets of quantities using a straight line

BAR GRAPH – it consists of rectangles of equal widths, either drawn vertically or horizontally, segmented or non –
segmented

PICTURE GRAPH or PICTOGRAM – it is a visual presentation of statistical quantities by means of drawing pictures or
symbols related to the subject under study

– a legend is sometimes used to represent the magnitude of a single unit

MAP GRAPH or CARTOGRAPH – it is one of the best ways to present geographical data

-this kind of graph is always accompanied by a legend which tells us the meaning of the lines, colors or other symbols
used and positioned in a map

SCATTER POINT DIAGRAM – it is a graphical device to show the degree of relationship between two quantitative
variables

CIRCLE GRAPH or PIE CHART – it represents relationships of the different components of a single total as revealed in the
sectors of a circle where the angles or size of the sectors should be proportional to the percentage components of the
data which gives a total of 100%

Scale of Measurement

 It is a classification that describes the nature of information within the values assigned to variables.

 It gives the importance of the data under observation


NOMINAL SCALE

 this is the most primitive level of measurement; the nominal level of measurement is used when we want to
distinguish one object from another for identification purposes

 it is the most limited and simplest type of measurement

EXAMPLES: Gender of a patient

Course of students in ISU

Religion of Teachers

ORDINAL SCALE

 it does not only classify items but also gives the order or ranks of classes, items or objects; however, it does not
say anything about the size of the differences between two positions in the ranking

EXAMPLES: Stages of Diseases

Rank in the class of a student

Views about some political issues

(Totally Agree, Mostly Agree, Disagree)

INTERVAL SCALE

 it is the same as the ordinal level, with an additional property that we can determine meaningful amounts of
differences between the data

 data at this level may lack an inherent starting point

EXAMPLES: The Score of IQ Test

Salary of an Employee

Temperature measured in ⁰C

RATIO SCALE

 it can differentiate between any two classes, but it always starts from an absolute or true zero point (a true
zero exists)

 it is also the highest level of measurement

EXAMPLES: Driving Speed

Time taken for completing a task

Weights of students

Distance travelled by a tourist

SAMPLING TECHNIQUES

 are utilized to test the validity of conclusions or inferences from the sample to the population

A. RANDOM SAMPLING

 most basic method of drawing a probability sample


 it is a method of selecting a sample of size {n} from a population such that each member of the population has
an equal chance of being included in the sample

SYSTEMATIC RANDOM SAMPLING

 drawing every nth/kth element in a population

STRATIFIED RANDOM SAMPLING

 is done by dividing the population into categories or strata and getting the members at random proportionate
to each stratum or sub–group

COMPUTATION OF SAMPLE SIZE

In computing a sample size, Slovin's formula below is usually used:

n = N ÷ (1 + Ne²), where N is the population size and e is the margin of error

Slovin's formula is used to calculate an appropriate sample size from a population.

1. A professor of a certain college institution was commissioned by his dean to conduct an inquiry with regard to the
efficiency of all faculty members. If there are 350 faculty members, determine the sample size he should
consider if he wants a margin of error of only 1%.

2. Mr. Andrews is conducting an inquiry regarding the Study Habits of Nursing students. If there are 975 students and
he wants to allow a margin of error of 5%, what sample size should he take?

N = 975, e = 5% = 0.05

n = 975 ÷ (1 + 975 × 0.05²) = 975 ÷ 3.4375

n = 283.64 or 284
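Both Slovin problems can be checked with a small Python sketch (illustrative only; rounding up to the next whole respondent follows the worked example above):

import math

def slovin(N, e):
    """Slovin's formula: n = N / (1 + N * e^2)."""
    return N / (1 + N * e ** 2)

print(math.ceil(slovin(975, 0.05)))  # Problem 2: 283.64 -> 284
print(math.ceil(slovin(350, 0.01)))  # Problem 1: 350 faculty, 1% margin of error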

3. The table below shows the population of 4 different courses of a particular college.

Course        Number of students

Psychology    150

BSE           370

BTVTED        500

BTLED         380

a. How large a sample must be drawn?

b. How many students must be considered as respondents for Psychology?

c. What is the total number of respondents that must be used for BTVTED?

a. How large a sample must be drawn?

N = 150 + 370 + 500 + 380 = 1400, e = 5% = 0.05

n = 1400 ÷ (1 + 1400 × 0.05²) = 1400 ÷ 4.5 = 311.11 or 311

b. How many students must be considered as respondents for Psychology?

N = 1400, n = 311, f(Psychology) = 150

fi = (f ÷ N)(n)

f(Psychology) = (150 ÷ 1400)(311) = 33.32 or 33

c. What is the total number of respondents that must be used for BTVTED?

N = 1400, n = 311, f(BTVTED) = 500

f(BTVTED) = (500 ÷ 1400)(311) = 111.07 or 111
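The proportional allocation step generalizes to all four courses; here is a small Python sketch (illustrative only; the function name is my own, and simple rounding is used, so the allocations may not sum exactly to n):

def proportional_allocation(strata, n):
    """Allocate a total sample size n across strata in proportion to their sizes: fi = (f / N) * n."""
    N = sum(strata.values())
    return {name: round(f / N * n) for name, f in strata.items()}

courses = {"Psychology": 150, "BSE": 370, "BTVTED": 500, "BTLED": 380}
print(proportional_allocation(courses, 311))
# {'Psychology': 33, 'BSE': 82, 'BTVTED': 111, 'BTLED': 84}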

 LOTTERY/FISHBOWL SAMPLING

 this is done by simply writing the names or numbers of all the members of the population on small rolled
pieces of paper, which are later placed in a container

SAMPLING with the USE of TABLE of RANDOM NUMBERS

 it contains rows and columns of digits randomly ordered by a computer

CLUSTER SAMPLING

 an advantageous procedure when the population, with common characteristics, is spread out over a wide
geographical area

 it is also a practical sampling technique used when the complete list of the members of the population is
not available

MULTI-STAGE SAMPLING

 it is the most complex sampling technique

 it is an extension or a multiple application of the stratified random sampling technique

B. NON–RANDOM SAMPLING TECHNIQUE

 in this method, not all elements are given equal opportunities to be selected as sample

PURPOSIVE or JUDGMENTAL SAMPLING

 this method is also referred to as non–probability sampling

 it plays a major role in the selection of a particular item and/or in making decision in cases of incomplete
responses or observation

 this is usually based on the criteria laid down by the researcher

CONVENIENCE SAMPLING

 this method has been widely used in television and radio programs to find out the opinions of TV viewers and
listeners regarding controversial issues

 is a non–probability sampling technique where the subjects are selected because of their convenient
accessibility and proximity to the researcher
QUOTA SAMPLING

 this is a relatively quick and inexpensive method to operate since the choice of the number of persons or
elements to be included in a sample is done at the researcher’s own convenience or preference and is not
predetermined by some carefully operated randomizing plan

INCIDENTAL SAMPLING

 this design is applied to those samples which are taken because they are the most available

 the investigator simply takes the nearest individuals as subjects of the study until the desired sample size is
reached
Example: The researcher can simply choose to ask those people around him or in a coffee shop where he is taking a
break.

FREQUENCY DISTRIBUTION

 is a tabular arrangement of data by classes, showing the number of observations falling under a class interval

PARTS OF A FREQUENCY DISTRIBUTION

1. CLASS INTERVAL (CI) – is a grouping or category defined by a lower limit and an upper limit

2. CLASS SIZE (i) – the class width of the distribution, which gives the distance between the lower and the upper limit of a
class interval

3. CLASS LIMITS – the end numbers of a class interval

4. CLASS BOUNDARIES – are more precise expressions of the class limits, extended by at least 0.5

 they are considered to be the true class limits for they leave no gap

5. CLASS FREQUENCY – the number of observations falling within a class interval

6. CLASS MARKS – are the midpoints of the class intervals

 these are obtained by getting the average of the lower limit and the upper limit

CUMULATIVE FREQUENCY DISTRIBUTION – is a tabular arrangement of data by class interval whose frequency is
cumulated

A cumulative frequency table shows the number of cases falling below or above a particular value

RELATIVE FREQUENCY DISTRIBUTION – it is also known as the percentage distribution

Relative Frequency – is obtained by dividing the frequency of each interval by the total frequency, and is always
expressed in percent form
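To tie these parts together, here is a small Python sketch (illustrative only) that rebuilds the earlier entrance-exam distribution with class marks, class boundaries, cumulative frequencies, and relative frequencies:

classes = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62), (63, 71), (72, 80)]
freqs = [6, 11, 17, 14, 8, 3, 1]

n = sum(freqs)
cum = 0
print("CI       f   Xm    boundaries   <cf   rel f")
for (lo, hi), f in zip(classes, freqs):
    cum += f                          # cumulative frequency (<cf)
    mark = (lo + hi) / 2              # class mark: average of the class limits
    lb, ub = lo - 0.5, hi + 0.5       # class boundaries: limits extended by 0.5
    rel = f / n * 100                 # relative frequency, in percent
    print(f"{lo}-{hi}   {f:2d}  {mark:4.1f}  {lb}-{ub}   {cum:3d}   {rel:5.2f}%")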

