
LESSON 1 DATA MANAGEMENT

• Data management is an administrative process that includes
acquiring, validating, storing, protecting, and processing
required data to ensure the accessibility, reliability, and
timeliness of the data for its users.
• Data management is also the development, execution and
supervision of plans, policies, programs and practices that
control, protect, deliver and enhance the value of data and
information assets.

STATISTICS

• The practice or science of collecting and analyzing numerical
data in large quantities, especially for the purpose of inferring
proportions in a whole from those in a representative sample.
• A branch of mathematics dealing with the collection,
organization, presentation, analysis and interpretation of data.

DATA COLLECTION/DATA GATHERING

• The process of gathering and measuring information on
variables of interest, in an established systematic fashion that
enables one to answer stated research questions, test
hypotheses, and evaluate outcomes.

METHODS USED IN GATHERING OR COLLECTING DATA

-DIRECT OR INTERVIEW METHOD
Under this method of collecting data, face-to-face contact is made
with the informants (the persons from whom the information is to be
obtained). The interviewer asks them questions pertaining to the
survey and collects the desired information.

-INDIRECT OR QUESTIONNAIRE METHOD
The researcher makes use of a written questionnaire. The researcher
gives or distributes the questionnaire to the respondents either by
personal delivery or by mail. Clarifications cannot be made if the
respondent does not understand the question.

-REGISTRATION METHOD
The registration method refers to the continuous, permanent,
compulsory recording of the occurrence of vital events together with
certain identifying or descriptive characteristics concerning them, as
provided through the civil code, laws or regulations of each country.

-OBSERVATION METHOD
The observation method is a way of collecting data through observing.
Observation as a data collection method can be structured or
unstructured. In structured or systematic observation, data collection
is conducted using specific variables and according to a pre-defined
schedule.

-EXPERIMENTAL METHOD
The experimental method involves the manipulation of variables to
establish a cause-and-effect relationship. In an experiment, an
independent variable (the cause) is manipulated and the dependent
variable (the effect) is measured; any extraneous variables are
controlled.

• DATA ORGANIZATION AND PRESENTATION
Organized data are data arranged into categories or intervals, usually
presented in a table or diagram. Grouped data refer to tables,
diagrams, or other presentations of data where values or sets of
values are grouped together into categories or intervals.
Data collected or obtained in whatever manner are called
RAW DATA. Data collected can be classified according to the scale of
measurement used.

• FOUR LEVELS OF MEASUREMENT
-NOMINAL DATA
Nominal data is defined as data that is used for naming or
labelling variables, without any quantitative value. It is sometimes
called “named” data, a meaning coined from the word nominal. There
is usually no intrinsic ordering to nominal data.

EXAMPLE:
Fruits (apple – 3; orange – 4; guava – 5)
Vegetables (carrots – 1; potato – 2)

-ORDINAL DATA
Ordinal data are data whose values have a meaningful order or rank.
An ordinal number is a number that tells the position of
something in a list, such as 1st, 2nd, 3rd, 4th, 5th etc. Most ordinal
numbers end in "th" except for: one ⇒ first (1st), two ⇒ second (2nd),
three ⇒ third (3rd).

EXAMPLE:
Educational attainment (elementary-1; highschool-2; college-3)

-INTERVAL DATA
Interval data is measured along a numerical scale that has equal
distances between adjacent values. These distances are called
“intervals.” There is no true zero on an interval scale, which is what
distinguishes it from a ratio scale.

-RATIO DATA
Ratio data is defined as quantitative data having the same
properties as interval data, with an equal and definitive ratio between
each data value and an absolute “zero” treated as the point of origin.
In other words, there can be no negative numerical value in ratio data.

EXAMPLE:
(weight; number of members in a family)

• DIFFERENT WAYS OR FORMS TO PRESENT DATA

-TEXTUAL FORM
Makes use of words, sentences and paragraphs in presentation.
-TABULAR FORM
A systematic presentation of data in rows and columns.
-GRAPHICAL FORM
Shows numerical values and relationships in a pictorial
form. It makes use of graphs, symbols or visual aids.

• TABULAR PRESENTATION
- should be simple
- should focus the reader’s attention on data rather than form.
- should make the meaning and significance of the
information being presented clear.

• GRAPHICAL PRESENTATION
- ACCURATE – should not be deceptive, distorted
or misleading or in any way susceptible to wrong
interpretation.
- SIMPLE – the basic design should be simple and
straightforward, not loaded with irrelevant or trivial
symbols and ornamentation.
- CLEAR – should be easily read and understood.
- ATTRACTIVE – designed and constructed to attract
and hold the attention by having a neat, dignified
and professional appearance.

• ORGANIZING COLLECTED NUMERICAL DATA CAN
BE DONE IN TWO WAYS:

ARRAY - arrangement of numerical data/values according to order of
magnitude, either ascending or descending.
FREQUENCY DISTRIBUTION TABLE – condensed version of an
array. It categorizes the numerical data into classes or intervals.

• FREQUENCY DISTRIBUTION

A frequency distribution is a representation, either in a graphical or
tabular format, that displays the number of observations within a given
interval. The interval size depends on the data being analyzed and the
goals of the analyst. The intervals must be mutually exclusive and
exhaustive.

TWO TYPES OF FREQUENCY DISTRIBUTION

• CATEGORICAL FREQUENCY DISTRIBUTION - The
categorical frequency distribution is used for data that can be
placed in specific categories, such as nominal- or ordinal-level
data. For example, data such as political affiliation, religious
affiliation, or major field of study would use categorical
frequency distributions.

• GROUPED FREQUENCY DISTRIBUTION - Another way
of arranging data is by grouping the observations into intervals
and tabulating the frequencies for each interval. The result is
called a grouped frequency table or grouped frequency
distribution. In this distribution the intervals are called classes.

Example for Frequency Distribution

Twenty-five army inductees were given a blood test to
determine their blood type. The data are as follows:

A    B    B    AB   O
O    O    B    AB   B
B    B    O    A    O
A    O    O    O    AB
AB   A    O    B    A

There are four blood types: A, B, O and AB. These types will
be used as the classes for the distribution.

Make a table:

Class    Tally    Frequency    Percent
A
B
O
AB

Tally the data and place the result in the proper column.

Find the percentage of values in each class by using the
formula

$\% = \frac{f}{n} \times 100$

Where f = frequency of the class
      n = total number of values

Percentages are not normally a part of a frequency distribution
but they can be added since they are used in certain types of
graphic presentation, such as pie graphs.

The frequency distribution for the data:

Class    Frequency    Percent
A        5            20
B        7            28
O        9            36
AB       4            16
Total    25           100

For the sample, more people have type O blood than any other.
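The blood-type example can be reproduced with a short sketch. This is only an illustration, not part of the original lesson: the function name `categorical_distribution` is an assumption, and the percentage follows the lesson's formula, percent = f / n × 100.

```python
from collections import Counter

# Blood types of the 25 inductees, copied row by row from the example.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()


def categorical_distribution(values):
    """Return {class: (frequency, percent)} with percent = f / n * 100."""
    n = len(values)
    counts = Counter(values)
    # f * 100 / n keeps the arithmetic exact for these whole-number results.
    return {cls: (f, f * 100 / n) for cls, f in counts.items()}


table = categorical_distribution(blood_types)
for cls in ("A", "B", "O", "AB"):
    f, pct = table[cls]
    print(f"{cls:<4}{f:>3}{pct:>6.0f}")
```

Tallying with `Counter` plays the role of the Tally column; the Percent column should add up to 100.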
LESSON 2 FREQUENCY DISTRIBUTION

GROUPED FREQUENCY DISTRIBUTION – When the range of data
is large, the data must be grouped into classes that are more than one
unit in width.

STEPS in Constructing a Grouped Frequency Distribution

1. Determine the classes
• Find the lowest and highest value
• Find the range
• Select the number of classes desired
• Find the width by dividing the range by the number of
classes and rounding up
• Select a starting point (usually the lowest value or any
convenient number less than the lowest value); add the
width to get the lower limits
• Find the upper class limits
• Find the boundaries
2. Tally the data
3. Find the numerical frequencies
4. Find the cumulative frequencies

SOME IMPORTANT FORMULAS

To determine the range R of the numerical data
• $R = |\text{highest value} - \text{lowest value}|$

To determine the number of classes K into which the data are to be
grouped, using Sturges' Approximation
• $K = 1 + 3.322 \log N$
where N = total number of values to be grouped

To determine the class size C
• $C = R / K$

Class Mark = $\frac{\text{lower limit (left)} + \text{upper limit (right)}}{2}$

Relative frequency (percentage) =
$\frac{\text{frequency of each interval}}{\text{total number of frequencies}} \times 100$

Class Boundary = lower limit (left) – 0.5; upper limit (right) + 0.5

<cf = add the frequency of each interval to the <cf of the previous
interval. The <cf of the last interval must be equal to the total
number of frequencies.
>cf = in the 1st interval, it must be the total number of
frequencies. Then, from the 2nd up to the last interval, subtract
the frequency of the previous interval from the >cf of the previous
interval. The >cf of the last interval must be equal to the
frequency of the last interval.

Example

Make a frequency distribution table of the scores below

3, 3, 1, 5, 4, 6, 4, 6, 5, 7, 8, 7, 8, 9, 8, 9, 7, 8, 9, 12, 12, 11, 10, 10, 14,
15, 13, 14, 13, 16, 16, 17, 17, 18, 18, 16, 17, 18

Arrange from lowest to highest value

1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 12, 12, 13,
13, 14, 14, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18

R = highest value – lowest value
R = 18 – 1
R = 17

K = 1 + 3.322 log N
K = 1 + 3.322 log 38
K = 6.25
K = 7 (rounded up)

C = R / K
C = 17 / 7
C = 2.43
C = 3 (rounded up)

Class      f     Class   Class          Relative        <cf   >cf
Interval         Mark    Boundary       Frequency (%)
1-3        3     2       0.5 – 3.5      8               3     38
4-6        6     5       3.5 – 6.5      16              9     35
7-9        10    8       6.5 – 9.5      26              19    29
10-12      5     11      9.5 – 12.5     13              24    19
13-15      5     14      12.5 – 15.5    13              29    14
16-18      9     17      15.5 – 18.5    24              38    9
Total = 38

MEASURES OF CENTRAL TENDENCY – Measures of average
are also called measures of central tendency and include the mean,
median and mode.

MEAN – also known as the arithmetic average, is found by adding the
values of the data and dividing by the total number of values.

FORMULA OF MEAN

$\bar{X} = \frac{\sum X}{N}$
where X = values of the data and N = total number of values

MEDIAN – the median is the midpoint of the data array. Arrange the
data in order and select the middle point.
Arrange the numbers from least to greatest.
EXAMPLE OF MEDIAN

MODE – the mode is the value that occurs most often in the data set.
A data set can have more than one mode or no mode at all.
Arrange the numbers from least to greatest.
EXAMPLE OF MODE

3 4 4 7 10 12 12 12 12 27 27 27 34 35 36 40 40 41 42 49
• Mode – 12

5 8 13 15 17 21 24 27 30 36 39 42
• No mode
Do not say that the mode is zero. That would be incorrect, because in
some data, such as temperature, zero can be an actual value.

3 4 4 7 12 12 12 12 27 27 34 35 36 36 36 36 40 41 42 49
• Bimodal – because 12 and 36 both occur four times
DISTRIBUTION SHAPES – frequency distributions can assume
many shapes. The three most important shapes are:
 positively skewed
 symmetrical distribution
 negatively skewed.
POSITIVELY SKEWED – in a positively skewed or right-skewed
distribution, the majority of the data values fall to the left of the
mean and cluster at the lower end of the distribution; the “tail” is to
the right. Also, the mean is to the right of the median, and the mode
is to the left of the median.
SYMMETRICAL DISTRIBUTION – the data values are evenly
distributed on both sides of the mean. In addition, when the
distribution is unimodal, the mean, median and mode are the same and
are at the center of the distribution.
NEGATIVELY SKEWED – when the majority of the data values fall to
the right of the mean and cluster at the upper end of the distribution,
with the tail to the left, the distribution is said to be negatively
skewed. Also, the mean is to the left of the median, and the mode is
to the right of the median.
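The grouped frequency distribution built earlier in this lesson (the 38 scores) can be checked with a minimal sketch. It assumes whole-number data, so class boundaries sit 0.5 below and above the class limits, and it starts the first class at the lowest value. The function name `grouped_distribution` and the dictionary keys are hypothetical.

```python
import math

scores = [3, 3, 1, 5, 4, 6, 4, 6, 5, 7, 8, 7, 8, 9, 8, 9, 7, 8, 9, 12, 12,
          11, 10, 10, 14, 15, 13, 14, 13, 16, 16, 17, 17, 18, 18, 16, 17, 18]


def grouped_distribution(values):
    """Build the rows of a grouped frequency table for whole-number data."""
    n = len(values)
    low, high = min(values), max(values)
    r = high - low                               # range R
    k = math.ceil(1 + 3.322 * math.log10(n))     # Sturges' K, rounded up
    c = math.ceil(r / k)                         # class size C, rounded up
    rows, cum, lower = [], 0, low
    while lower <= high:
        upper = lower + c - 1
        f = sum(lower <= v <= upper for v in values)
        cum += f
        rows.append({
            "interval": (lower, upper),
            "f": f,
            "mark": (lower + upper) / 2,
            "boundary": (lower - 0.5, upper + 0.5),
            "rel_freq": round(f * 100 / n),      # percent, whole number
            "less_cf": cum,                      # <cf
            "greater_cf": n - (cum - f),         # >cf
        })
        lower += c
    return rows


table = grouped_distribution(scores)
for row in table:
    print(row)
```

With these scores the sketch gives K = 7 and C = 3, yet only six classes are needed to cover the values 1 to 18, which matches the table in the example.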
LESSON 3 MEASURES OF CENTRAL TENDENCY

Measures of average are also called measures of central tendency,
which include the mean, median, and mode.

• The Mean

Also known as the arithmetic average, the mean is found by adding the
values of the data and dividing by the total number of values.

$\bar{X} = \frac{\sum X}{N}$

Where:
X = Values of the data
N = Total number of values

1. One computes the mean by using all the values of the data.
2. The mean varies less than the median or mode when
samples are taken from the same population and all three
measures are computed for these samples.
3. The mean is used in computing other statistics, such as
variance.
4. The mean for the data set is unique and not necessarily one
of the data values.
5. The mean cannot be computed for an open-ended
frequency distribution.
6. The mean is affected by extremely high or low values and
may not be the appropriate average to use in these
situations.

Example:
The data represent the number of different plans 10 HMO
systems offer their enrollees. Find the mean:

84, 12, 27, 15, 40, 18, 33, 33, 14, 4

$\bar{X} = \frac{84 + 12 + 27 + 15 + 40 + 18 + 33 + 33 + 14 + 4}{10} = \frac{280}{10} = 28$

The mean is 28 plans.

• The Median (MD)

The median is the midpoint of the data array. Arrange the data in
order and select the middle point.
1. The median is used when one must find the center or the
middle value of a data set.
2. The median is used when one must determine whether the
data values fall into the upper half or lower half of the
distribution.
3. The median is used to find the average of an open-ended
distribution.
4. The median is affected less than the mean by extremely
high or extremely low values.

Examples:
The weights (in pounds) of seven army recruits are 180, 201,
220, 191, 219, 209 and 186. Find the median:

Arrange the data in order.

180, 186, 191, 201, 209, 219, 220

Select the middle value, which is the fourth of the seven values.
Hence, the median weight is 201 pounds.

Note: When the median falls between two data values, find the median
by adding the two values and dividing by 2.
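The mean and median examples above can be verified with a small sketch. The helper names `mean` and `median` are illustrative, not from the lesson; the even-count branch follows the note about averaging the two middle values.

```python
def mean(values):
    """Arithmetic average: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; average the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


plans = [84, 12, 27, 15, 40, 18, 33, 33, 14, 4]
weights = [180, 201, 220, 191, 219, 209, 186]

print(mean(plans))      # 28.0
print(median(weights))  # 201
```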
• The Mode

The mode is the value that occurs most often in the data set. A data
set can have more than one mode or no mode at all.

1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal, such as
religious preference, gender or political affiliation.
4. The mode is not always unique. A data set can have more
than one mode or the mode may not exist for a data set.

Examples:
The following data represent the duration (in days) of US space
shuttle voyages for the years 1992-94. Find the mode:

8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11

It is helpful to arrange the data in order, although it is not necessary.

6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14

Since 8-day voyages occurred five times, a frequency larger
than that of any other number, the mode for the data set is 8.

Note: When a data set has no mode, do not say that the mode is
zero (0). That would be incorrect, because in some data, such as
temperature, zero can be an actual value.

Eleven different automobiles were tested at a speed of 15 miles
per hour for stopping distances. The data, in feet, are shown below.
Find the mode:

15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26

Since 18 and 24 both occur three times, the modes are 18 and
24 feet. This data set is said to be “bimodal”.

• The Weighted Mean

A mean that considers an additional factor is called the
weighted mean, and it is used when the values are not all equally
represented. Find the weighted mean of a variable X by multiplying
each value by its corresponding weight and dividing the sum of the
products by the sum of the weights.

$\bar{X} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

Where:
w1, w2, …, wn = Weights
x1, x2, …, xn = Values

Example:
The final grades of a student in the six subjects in which she was
enrolled are shown below:

Subject       No. of Units    Final Grade
Math          3               2.25
Science       5               2.00
English       3               1.50
Computer      2               2.75
Filipino      3               2.00
Accounting    6               1.75

Determine the weighted mean grade.

w             x               wx
3             2.25            6.75
5             2.00            10
3             1.50            4.5
2             2.75            5.5
3             2.00            6
6             1.75            10.5
Total = 22                    Total = 43.25
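The mode and weighted mean examples above can be reproduced with the following sketch. The function names are hypothetical, and treating a data set in which every value occurs exactly once as having no mode is an assumption taken from the note about not reporting zero.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no mode; do not report 0
    return sorted(v for v, c in counts.items() if c == top)


def weighted_mean(weights, values):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


voyages = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
stopping = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]
units = [3, 5, 3, 2, 3, 6]
grades = [2.25, 2.00, 1.50, 2.75, 2.00, 1.75]

print(modes(voyages))                          # [8]
print(modes(stopping))                         # [18, 24]
print(round(weighted_mean(units, grades), 2))  # 1.97
```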
$\bar{X} = \frac{43.25}{22} \approx 1.97$

Use the following formulas when dealing with ungrouped data:

Mean = $\frac{\text{sum of all values}}{\text{total number of values}}$

Median = middle value (when the data are arranged in order)

Mode = most common value

Use the following formulas when dealing with grouped data:

Mean = $\frac{\sum fX}{N}$

Where:
X = Values of the data (the class mark of each class)
f = Frequency of each class
N = Total number of values

Median = $l + \left(\frac{\frac{N}{2} - c}{f}\right) \times h$

Where:
l = lower class boundary of the median class
h = Size of the median class interval
f = Frequency corresponding to the median class
N = Total number of observations, i.e. sum of the
frequencies
c = Cumulative frequency preceding the median class

Mode = $l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$

Where:
l = lower limit of the modal class
h = size of the class interval (assuming all class sizes to
be equal)
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
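As a rough illustration of the grouped-data formulas, the sketch below applies them to the frequency table from Lesson 2 (classes 1-3 through 16-18). It is not the lesson's own worked example. It assumes whole-number class limits, uses the lower class boundary (limit minus 0.5) as l for both the median and the mode, and uses the class mark as X in the mean; the function names are hypothetical.

```python
# (lower limit, upper limit, frequency) for each class of the Lesson 2 table
classes = [(1, 3, 3), (4, 6, 6), (7, 9, 10), (10, 12, 5), (13, 15, 5), (16, 18, 9)]


def grouped_mean(classes):
    """Sum of frequency * class mark, divided by N."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_median(classes):
    """l + ((N/2 - c) / f) * h, using the first class whose <cf reaches N/2."""
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency preceding the current class (c)
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            h = hi - lo + 1
            return (lo - 0.5) + (n / 2 - cum) / f * h
        cum += f


def grouped_mode(classes):
    """l + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the class with the highest frequency."""
    i = max(range(len(classes)), key=lambda j: classes[j][2])
    lo, hi, f1 = classes[i]
    f0 = classes[i - 1][2] if i > 0 else 0
    f2 = classes[i + 1][2] if i < len(classes) - 1 else 0
    h = hi - lo + 1
    return (lo - 0.5) + (f1 - f0) / (2 * f1 - f0 - f2) * h


print(round(grouped_mean(classes), 2))   # 10.37
print(grouped_median(classes))           # 9.5
print(round(grouped_mode(classes), 2))   # 7.83
```

Here the median class and the modal class are both 7-9, so l = 6.5 and h = 3 in both formulas.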
LESSON 4 MEASURES OF VARIATION AND
MEASURES OF POSITION

• MEASURES OF VARIATION - For the spread or variability of a
data set, three measures are commonly used:

• Range
• Variance
• Standard deviation

Range – is the highest value minus the lowest value. The symbol ‘R’
is used for the range.

• Formula: Range (R) = highest value – lowest value

Variance – is the average of the squares of the distance each value is
from the mean. The symbol for the population variance is σ² (σ is the
Greek lowercase letter ‘sigma’).

• Formula: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
• Where: x – is the individual value
µ – is the population mean
N – is the population size

Standard Deviation – is the square root of the variance.

• Formula: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
• Where: σ – is the population standard deviation (sigma)
x – is the individual value
µ – is the population mean
N – is the population size

STEPS FOR MEASURES OF VARIATION

Find the variance and the standard deviation for the scores in a
subject. The scores were 20, 15, 19, 10, 16, 18, 11, 13, 20 and 15.

Step 1. Arrange the data from lowest to highest. Make sure that the
number of values you arranged is the same as in the given
problem.

Answer: 10, 11, 13, 15, 15, 16, 18, 19, 20, 20

Step 2. Find the mean.

Formula: $\bar{x} \text{ or } \mu = \frac{\sum x}{N}$

Answer: $\mu = \frac{10 + 11 + 13 + 15 + 15 + 16 + 18 + 19 + 20 + 20}{10} = \frac{157}{10} = 15.7$

Step 3. Make a table and complete each column.

x      x - µ      (x - µ)²

where ‘x’ (first column) is the individual value and it should be
arranged from lowest to highest value. The second column is the
individual value minus the mean. The third column is the square of the
difference from the second column.
x      x - µ      (x - µ)²
10     -5.7       32.49
11     -4.7       22.09
13     -2.7       7.29
15     -0.7       0.49
15     -0.7       0.49
16     0.3        0.09
18     2.3        5.29
19     3.3        10.89
20     4.3        18.49
20     4.3        18.49

Note: the values should be given to two decimal places.

Step 4. Find the sum of the squares in the third column.

Solution: Σ(x - µ)² = 32.49 + 22.09 + 7.29 + 0.49 + 0.49 + 0.09
+ 5.29 + 10.89 + 18.49 + 18.49 = 116.1

Step 5. Find the variance.

Solution: $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{116.1}{10} = 11.61$

Step 6. Find the standard deviation.

Solution: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{11.61} \approx 3.41$

• MEASURES OF POSITION

• In addition to measures of central tendency and measures of
variation, there are also measures of position or location.
These measures include standard scores, percentiles, deciles
and quartiles. They are used to locate the relative position of a
data value in a data set. For example, if a value is located at the
80th percentile, it means that 80% of the values fall below it in the
distribution and 20% of the values fall above it. The median is
the value that corresponds to the 50th percentile, since half of
the values fall below it and half of the values fall above it.

Standard score or z score

• A standard score or z score for a value is obtained by
subtracting the mean from the value and dividing the result by
the standard deviation. The symbol for a standard score is ‘z’.

Formulas:

$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$

for samples: $z = \frac{x - \bar{x}}{s}$

for populations: $z = \frac{x - \mu}{\sigma}$

The z score represents the number of standard deviations a data value
falls above or below the mean.

Example:
A student scored 65 on a calculus test that has a mean of 50
and a standard deviation of 10; she scored 30 on a history test with a
mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.

Solution:

$z = \frac{x - \bar{x}}{s}$

for calculus: $z = \frac{65 - 50}{10} = 1.5$

for history: $z = \frac{30 - 25}{5} = 1.0$

Since the z score for calculus is larger, her relative position in
the calculus class is higher than her relative position in the history
class.
NOTE:
 If the z score is positive, the score is above the mean.
 If the z score is zero, the score is the same as the mean.
 If the z score is negative, the score is below the mean.

Example:
Find the z score for each test and state which is higher.
Test A: x = 40, $\bar{x}$ = 45, s = 10
Test B: x = 63, $\bar{x}$ = 60, s = 10
Solution:

$z = \frac{x - \bar{x}}{s}$

For Test A: $z = \frac{40 - 45}{10} = -0.5$

For Test B: $z = \frac{63 - 60}{10} = 0.3$

Conclusion: Since the z score for Test B is larger than the z score for
Test A, the relative position in Test B is higher than the relative
position in Test A.
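The variance, standard deviation and z score computations in this lesson can be sketched in a few lines. The function names are illustrative only, and the variance is the population form (dividing by N), as in the lesson's formula.

```python
import math


def population_variance(values):
    """Average of the squared distances of each value from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd


scores = [20, 15, 19, 10, 16, 18, 11, 13, 20, 15]
variance = population_variance(scores)
sd = math.sqrt(variance)

print(round(variance, 2), round(sd, 2))          # 11.61 3.41
print(z_score(65, 50, 10), z_score(30, 25, 5))   # 1.5 1.0
print(z_score(40, 45, 10), z_score(63, 60, 10))  # -0.5 0.3
```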
LESSON 5 MEASURES OF RELATIVE POSITION

Percentiles

• Percentiles are position measures used in educational and
health-related fields to indicate the position of an individual or
group.
• A percentile P is an integer (1 ≤ P ≤ 99) such that the Pth
percentile is a value where P% of the data values are less than
or equal to the value and (100 – P)% of the data values are
greater than or equal to the value.
• Percentiles are also used to compare an individual’s test score
with the national norm.
• Percentiles are not the same as percentages. That is, if a student
gets 75 correct answers out of a possible 100, she obtains a
percentage score of 75. There is no indication of her position
with respect to the rest of the class. She could have scored the
highest or the lowest, or somewhere in between. On the other
hand, if a raw score of 75 corresponds to the 67th percentile,
then she did better than 67% of the students in her class.
• Percentiles are symbolized by P1, P2, P3, … P99
• Formula:
$\text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100\%$
Where x is the given value

Example of Finding the Percentile

A teacher gives a 20-point test to 5 students. The scores are
shown below. Find the percentile rank of a score of 15.

20, 16, 11, 19, 15

Step 1. Arrange the given data from lowest to highest value.

Answer: 11, 15, 16, 19, 20

Step 2. Count the numbers that are below the given value.

Answer: There is 1 value below a score of 15.

Step 3. Use the formula of percentile to find the percentile rank of a
score of 15.

Solution:
$\text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100\% = \frac{1 + 0.5}{5} \times 100\%$
= 30th percentile

Thus, a student whose score was 15 did better than 30
percent of the class.

Example of Finding the Corresponding Value of a Percentile

Find the 65th percentile of the data: 10, 16, 17, 11, 20, 12, 14

Step 1. Arrange the data in order from lowest to highest value.

Answer: 10, 11, 12, 14, 16, 17, 20

Step 2. Find the corresponding value using the formula
$c = \frac{n \times p}{100}$, where ‘n’ is the total number of values and
‘p’ is the given percentile.

Solution: $c = \frac{7 \times 65}{100} = 4.55$; round up to the next whole
number, in this case, c = 5.

Step 3. Using the value of c, count the arranged data from lowest to
highest value.

Answer: Since c = 5, start at the lowest value and count over to
the fifth value. The answer is 16.

Hence the value 16 corresponds to the 65th percentile.

Another example, if c is a whole number.

Find the value corresponding to the 60th percentile: 5, 3, 2, 20,
15, 18, 6, 10, 12, 8

Step 1. Arrange the data in order from lowest to highest value.

Answer: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20

Step 2. Find the corresponding value using the formula
$c = \frac{n \times p}{100}$, where ‘n’ is the total number of values and
‘p’ is the given percentile.

Solution: $c = \frac{10 \times 60}{100} = 6$

Step 3. If c is a whole number, use the value halfway between the cth
and (c + 1)th values when counting up from the lowest value: add the
two values and divide by two.

Answer: In this case, the 6th and 7th values. In the data, 10 is the
6th and 12 is the 7th value. The sum of these two values is 22 and,
dividing it by 2, the quotient is 11.

$\frac{10 + 12}{2} = 11$

Hence, 11 corresponds to the 60th percentile.

Quartiles

• Quartiles divide the distribution into four groups, separated by
Q1, Q2, and Q3.
• Note that Q1 is the same as the 25th percentile; Q2 is the same
as the 50th percentile or the median; Q3 corresponds to the 75th
percentile.
• Quartiles can be computed using the formulas given for
percentiles; however, it is much easier to arrange the data in
order from smallest to largest and find the median. This is Q2.
To find Q1, find the median of the data values less than the
median. To find Q3, find the median of the data values that are
larger than the median.

Example 1:

Find Q1, Q2 and Q3 for the data set 15, 13, 6, 5, 22, 18, 12

Step 1. Arrange the data in order from lowest to highest value.

Answer: 5, 6, 12, 13, 15, 18, 22

Step 2. Find the median or Q2. Use the formula $c = \frac{n \times q}{100}$,
where the value of q is equal to the number of the percentile. Since
Q2 = 50th percentile, q is 50.

Solution: $c = \frac{7 \times 50}{100} = 3.5$; round it up to the next whole
number. So, the value of c is 4. Start at the lowest value and count
over to the fourth value. The answer is 13.

Step 3. Find Q1 using the data values less than the median or Q2.

Answer: The values less than 13 are 5, 6 and 12. There are 3
values less than 13. In this case, the number of values is divided by 2
to get the median. So, 3 divided by 2 = 1.5; round it up to the next
whole number, in this case, 2. It means that Q1 is the
second value of the data values less than 13. The answer is 6.

Step 4. Find Q3 using the data values greater than the median or Q2.

Answer: The values greater than 13 are 15, 18 and 22. There
are 3 values greater than 13. In this case, the number of values is
divided by 2 to get the median. So, 3 divided by 2 = 1.5; round it up to
the next whole number, in this case, 2. It means that
Q3 is the second value of the data values greater than 13. The
answer is 18.

Hence, Q1 = 6, Q2 = 13, Q3 = 18

Example 2

Find Q1, Q2 and Q3 for the data set 15, 13, 6, 5, 22, 18, 12, 25

Step 1. Arrange the data in order from lowest to highest value.

Answer: 5, 6, 12, 13, 15, 18, 22, 25
Step 2. Find the median or Q2. Use the formula $c = \frac{n \times q}{100}$,
where the value of q is equal to the number of the percentile. Since
Q2 = 50th percentile, q is 50.

Solution: $c = \frac{8 \times 50}{100} = 4$

If c is a whole number, use the value halfway between the cth and
(c + 1)th values when counting up from the lowest value: add the two
values and divide by two.

In this case, the 4th and 5th values. In the data, the 4th value is 13
and the 5th value is 15. Adding the two values gives 28, and dividing
by 2 gives 14. So, the median or Q2 is 14.

Step 3. Find Q1 using the data values less than the median or Q2.

Answer: The values less than 14 are 5, 6, 12, 13. There are 4
values less than 14. In this case, the number of values is divided by 2
to get the median. So, 4 divided by 2 = 2, a whole number; use the
value halfway between the 2nd and 3rd values when counting up from the
lowest value. In the data, the 2nd value is 6 and the 3rd value is 12.
Adding the two values gives 18, and dividing by 2 gives 9. So, Q1 is 9.

Step 4. Find Q3 using the data values greater than the median or Q2.

Answer: The values greater than 14 are 15, 18, 22, 25. There are 4
values greater than 14. In this case, the number of values is divided
by 2 to get the median. So, 4 divided by 2 = 2, a whole number; use the
value halfway between the 2nd and 3rd values when counting up from the
lowest value. In the data, the 2nd value is 18 and the 3rd value is 22.
Adding the two values gives 40, and dividing by 2 gives 20. So, Q3
is 20.

Hence, Q1 = 9, Q2 = 14, Q3 = 20

Example 3

Find Q3 for the data set 9, 11, 6, 8, 20, 18, 12, 24

Step 1. Arrange the data in order from lowest to highest value.

Answer: 6, 8, 9, 11, 12, 18, 20, 24

Step 2. Find Q3. Use the formula $c = \frac{n \times q}{100}$, where the value
of q is equal to the number of the percentile. Since Q3 = 75th
percentile, q is 75.

Solution: $c = \frac{8 \times 75}{100} = 6$

Step 3. If c is a whole number, use the value halfway between the cth
and (c + 1)th values when counting up from the lowest value: add the
two values and divide by two.

Answer: In this case, the 6th and 7th values. In the data, the 6th
value is 18 and the 7th value is 20. Adding the two values gives 38,
and dividing by 2 gives 19. So, Q3 is 19.

Example 4

Find Q1 for the data set 3, 11, 6, 8, 19, 18, 12

Step 1. Arrange the data in order from lowest to highest value.

Answer: 3, 6, 8, 11, 12, 18, 19

Step 2. Find Q1. Use the formula $c = \frac{n \times q}{100}$, where the value
of q is equal to the number of the percentile. Since Q1 = 25th
percentile, q is 25.

Solution: $c = \frac{7 \times 25}{100} = 1.75$; round it up to the next whole
number, in this case, c = 2.

Step 3. Using the value of c, count the arranged data from lowest to
highest value.

Answer: Since c = 2, start at the lowest value and count over to
the second value. The answer is 6.
Hence, the value 6 corresponds to the 1st quartile.

Deciles

• Deciles divide the distribution into ten groups. They are
denoted by D1, D2, D3 … D9.
• Note that D1 corresponds to the 10th percentile; D2 corresponds
to the 20th percentile, etc.
• Deciles can be found using the formulas given for percentiles.
Taken altogether then, these are the relationships among
percentiles, deciles, and quartiles.

NOTE:
P10 = D1          P60 = D6
P20 = D2          P70 = D7
P25 = Q1          P75 = Q3
P30 = D3          P80 = D8
P40 = D4          P90 = D9
P50 = Q2 = D5

Example:
Find D7 for the data set 3, 11, 6, 8, 19, 18, 12, 16, 2, 5, 20, 26, 22, 30,
28

Step 1. Arrange the data in order from lowest to highest value.

Answer: 2, 3, 5, 6, 8, 11, 12, 16, 18, 19, 20, 22, 26, 28, 30

Step 2. Find D7. Use the formula $c = \frac{n \times d}{100}$, where the value
of d is equal to the number of the percentile. Since D7 = 70th
percentile, d is 70.

Solution: $c = \frac{15 \times 70}{100} = 10.5$; round it up to the next whole
number, in this case, c = 11.

Step 3. Using the value of c, count the arranged data from lowest to
highest value.

Answer: Since c = 11, start at the lowest value and count over
to the eleventh value. The answer is 20.

Hence, the value 20 corresponds to the 7th decile.
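The percentile procedures of this lesson can be summarized in a short sketch. It follows the lesson's own rules (percentile rank from the count of values below x; position c = n × p / 100, rounded up, or the average of the cth and (c + 1)th values when c is a whole number), which differ from the interpolation methods used by many statistics libraries. The function names are hypothetical, and p is assumed to be between 1 and 99.

```python
def percentile_rank(values, x):
    """(number of values below x + 0.5) / total number of values * 100."""
    below = sum(v < x for v in values)
    return (below + 0.5) * 100 / len(values)


def value_at_percentile(values, p):
    """Value at the p-th percentile (1 <= p <= 99) using c = n * p / 100."""
    ordered = sorted(values)
    c, remainder = divmod(len(ordered) * p, 100)
    if remainder == 0:
        # c is a whole number: halfway between the c-th and (c + 1)-th values
        return (ordered[c - 1] + ordered[c]) / 2
    # otherwise round c up to the next whole number and take that value
    return ordered[c]


print(percentile_rank([20, 16, 11, 19, 15], 15))                    # 30.0
print(value_at_percentile([10, 16, 17, 11, 20, 12, 14], 65))        # 16
print(value_at_percentile([5, 3, 2, 20, 15, 18, 6, 10, 12, 8], 60)) # 11.0
```

Quartiles and deciles reuse the same function: Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles, and D7 is the 70th percentile.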
LESSON 6 NORMAL DISTRIBUTION

• Medical researchers have determined so-called normal intervals
for a person’s blood pressure, cholesterol, triglycerides and the
like. By measuring these variables, a physician can determine if a
patient’s vital statistics are within the normal interval or if
some type of treatment is needed to correct a condition and
avoid future illnesses.
• Data can be ‘distributed’ (spread out) in different ways. It can
be spread out more on the left. It can be spread out more on the
right. It can all be jumbled up.
• When the data values are evenly distributed about the mean,
the distribution is said to be symmetrical. When the majority of
data values fall to the left or right of the mean, the distribution
is said to be skewed.
• When the majority of the data values fall to the right of the
mean, the distribution is said to be negatively or left skewed.
The mean is to the left of the median, and the mean and the
median are to the left of the mode.
• When the majority of the data values fall to the left of the mean,
the distribution is said to be positively or right skewed. The
mean falls to the right of the median, and both the mean and
the median fall to the right of the mode.

Kurtosis

• Another measure that describes the shape of the distribution
• The distribution is said to be normal when the value of
kurtosis is equal to 0.265 (mesokurtic)
• When its value is less than 0.265 (leptokurtic), the distribution
is said to be abnormal.
• When its value is greater than 0.265 (platykurtic), the
distribution is also said to be abnormal.

• But there are many cases where the data tends to be around a
central value with no bias left or right, and it gets close to a
‘Normal Distribution’.
• The normal distribution is called the ‘Bell Curve’ because it
looks like a bell.
Properties of Normal Distribution


 The normal distribution curve is bell-shaped.
 The mean, median and mode are equal and located at the center
of distribution.
 The normal distribution is unimodal.
 The curve is symmetrical about the mean, which is equivalent
to saying that its shape is the same on both sides of a vertical
line passing through the center.
 The curve is continuous – there are no gaps or holes. For each
value of X, there is a corresponding value of Y.
 The curve never touches the x-axis. Theoretically, no matter how far in either direction the curve extends, it never meets the x-axis; it only gets increasingly closer.
 The total area under the normal distribution curve is equal to 1 or 100%.
 The area under the normal distribution curve that lies within 1 standard deviation of the mean is approximately 0.68 or 68%; within 2 standard deviations, about 0.95 or 95%; and within 3 standard deviations, about 0.997 or 99.7%.
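The 68%, 95% and 99.7% areas quoted above can be checked numerically. This is a minimal sketch using the error function from Python's standard library; the helper name `area_within` is an assumption made for illustration.

```python
import math

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```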

Standard Normal Distribution


 The standard normal distribution is a normal distribution with a
mean of 0 and a standard deviation of 1.
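Any normally distributed value can be placed on the standard normal scale by converting it to a z score. The sketch below assumes the usual formula z = (x − mean) / standard deviation, which this lesson does not spell out, and uses the two test scores from Test D as input; the function name `z_score` is hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

calculus_z = z_score(100, 85, 10)   # calculus test: score 100, mean 85, SD 10
history_z = z_score(85, 80, 10)     # history test: score 85, mean 80, SD 10
print(calculus_z, history_z)
```

The larger z score marks the test on which the student stands higher relative to the class.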
LESSON 7 CORRELATION ANALYSIS
 It is used to measure the degree of linear relationship or association between two variables.

Correlation coefficient
 May be positive or negative
 Positive correlation is present when high values in one variable are associated with high values of the other variable or vice versa.
 Negative correlation is present when high values in one variable are associated with low values of the other variable or vice versa.
 Perfect positive correlation is represented by a value of +1.00 while perfect negative correlation is represented by a value of -1.00.

Pearson Product Moment Correlation Coefficient or simply Pearson r
 The most widely used measure of correlation
 There are 2 basic assumptions underlying the use of the Pearson r: a linear relationship is present, and the level of measurement of the data for the two variables is either interval or ratio scale.
 Formula:
$$r = \frac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$$
Where:
r – degree of relationship between x and y
x – the observed data for the independent variable
y – the observed data for the dependent variable
n – sample size

 The degree of linear relationship can be interpreted through the use of range of values (Bermudo, 2005) for the Pearson Product Moment Correlation Coefficient

Range of Value      Decision
±1.0                Perfect Relationship
±0.80 to ±0.99      Very Strong Relationship
±0.60 to ±0.79      Strong Relationship
±0.40 to ±0.59      Moderate Relationship
±0.20 to ±0.39      Weak Relationship
±0.01 to ±0.19      Very Weak Relationship
0.00                No Relationship

 T-test for the significance of the Pearson r
Formula:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
Where:
t – computed t value
r – Pearson r value
n – number of respondents
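The range-of-values table and the t-test formula can be turned into two small helpers. This is a sketch under stated assumptions: the function names `interpret_r` and `t_statistic` are hypothetical, r is rounded to 2 decimal places before the lookup so that it falls into one of the table's ranges, and the inputs 0.65 and n = 10 are illustrative values.

```python
import math

def interpret_r(r):
    """Map a Pearson r value to the decision in the range-of-values table."""
    size = round(abs(r), 2)              # the table is written to 2 decimal places
    if size == 1.0:
        return "Perfect Relationship"
    if size >= 0.80:
        return "Very Strong Relationship"
    if size >= 0.60:
        return "Strong Relationship"
    if size >= 0.40:
        return "Moderate Relationship"
    if size >= 0.20:
        return "Weak Relationship"
    if size >= 0.01:
        return "Very Weak Relationship"
    return "No Relationship"

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); undefined when r is exactly +1 or -1."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

print(interpret_r(0.65))
print(round(t_statistic(0.65, 10), 2))
```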
Regression
 If the value of the correlation coefficient is significant, the next step is to determine the regression line, which is the data’s line of best fit.
 Determining the regression line when r is not significant and then making predictions using the regression line is meaningless.
 The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data.
 Formulas for the Regression Line:
y’ = a + bx
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2}$$
$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}$$
where:
a is the y’ intercept and b is the slope of the line

NOTE: you need to identify the regression line only if the value of the correlation coefficient is significant!

Example

Given the following data for 10 students, determine if there is a significant relationship between their performance test in Mathematics and Science. If there is a significant relationship, a) determine the equation of the regression line and b) predict y’ when x = 6.

Respondents   Mathematics (x)   Science (y)   x²    y²    xy
1             9                 8
2             8                 10
3             10                10
4             5                 7
5             7                 6
6             4                 6
7             9                 7
8             9                 9
9             10                8
10            6                 4
              Σx                Σy            Σx²   Σy²   Σxy

Step 1. Analyze the problem, the data and what is asked in the problem. The null hypothesis and the alternative hypothesis are stated in this step. The null hypothesis (Ho) says that there is no statistically significant relationship between the two variables in a given problem. The alternative hypothesis (Ha) is the hypothesis accepted in place of the null hypothesis if the null hypothesis is rejected.

Answer: Ho: There is no significant relationship between the students’ performance test in Mathematics and Science.
Ha: There is a significant relationship between the students’ performance test in Mathematics and Science.

Step 2. Complete the table. The 1st column is the respondents. The 2nd column is the x-values; the values of the first variable are listed here. The 3rd column is the y-values. The 4th column is the square of the x-values. The 5th column is the square of the y-values. The 6th column is the product of the x- and y-values. Do not forget to get the sum of each column except the first column.
Answer:

Respondents   Mathematics (x)   Science (y)   x²          y²          xy
1             9                 8             81          64          72
2             8                 10            64          100         80
3             10                10            100         100         100
4             5                 7             25          49          35
5             7                 6             49          36          42
6             4                 6             16          36          24
7             9                 7             81          49          63
8             9                 9             81          81          81
9             10                8             100         64          80
10            6                 4             36          16          24
              Σx = 77           Σy = 75       Σx² = 633   Σy² = 595   Σxy = 601

Step 3. Find r using the Pearson Product Moment Correlation Coefficient formula. Tip: List all the values needed in the formula (so you don't get confused).

Solution:
n = 10        Σy² = 595
Σx = 77       Σxy = 601
Σy = 75       r = ?
Σx² = 633

$$r = \frac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$$
$$r = \frac{10(601) - 77(75)}{\sqrt{[10(633) - (77)^2][10(595) - (75)^2]}}$$
$$r = \frac{235}{\sqrt{130325}}$$
r = 0.65 (the answer is rounded to 2 decimal places)

Step 4. After you compute ‘r’, look in the table of range of values for the Pearson Product Moment Correlation Coefficient for the decision on the value of r. Also, determine if ‘r’ is direct or indirect. If the variables change in the same direction, the correlation is called a direct correlation or a positive correlation. If the variables change in opposite directions, the correlation is called an indirect correlation or a negative correlation.

Answer: The value of r is 0.65 and it is positive. In the table of values, 0.65 falls under strong relationship. Therefore, the decision is strong positive relationship. Also, it is a direct correlation.

Step 5. Compute the t-value using the t-test formula. Note that the critical t-value at the 0.05 level of significance (two-tailed, df = n − 2 = 8) is 2.306. Tip: List all the values needed in the formula (so you don't get confused).

Solution:
n = 10        t = ?
r = 0.65

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
$$t = 0.65\sqrt{\frac{10 - 2}{1 - (0.65)^2}}$$
$$t = 0.65\sqrt{\frac{8}{0.5775}}$$
t = 2.42 (the answer is rounded to 2 decimal places)

Step 6. Decision Rule. After you compute the t-value, compare it to the critical t-value at the 0.05 level of significance, which is 2.306. NOTE: If the t-value is greater than 2.306, the null hypothesis is rejected. If the t-value is less than -2.306, the null hypothesis is also rejected.
Answer: Since the computed t-value (2.42) is greater than the critical t-value (2.306), the null hypothesis is rejected.

Step 7. Interpret the data from the decision rule.

Answer: There is a strong positive relationship between the students’ performance test in Mathematics and Science. The better the student’s performance in Mathematics, the better the student’s performance in Science.

Step 8. Since the problem has a significant relationship between the two variables, determine the equation of the regression line and predict y’ using the formulas of regression. Tip: List all the values needed in the formula (so you don't get confused).

Solution:
Determining the equation of the regression line
y’ = a + bx
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2}$$
$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}$$

n = 10        Σy² = 595
Σx = 77       Σxy = 601
Σy = 75       a = ?
Σx² = 633     b = ?

$$a = \frac{75(633) - 77(601)}{10(633) - (77)^2}$$
$$a = \frac{1198}{401}$$
a = 2.99 (rounded to 2 decimal places)

$$b = \frac{10(601) - 77(75)}{10(633) - (77)^2}$$
$$b = \frac{235}{401}$$
b = 0.59 (rounded to 2 decimal places)

Therefore, the equation is y’ = 2.99 + 0.59x

Predicting y’ when x = 6
y’ = a + bx
y’ = 2.99 + 0.59x
y’ = 2.99 + 0.59(6)
y’ = 6.53
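The whole worked example can be re-checked with a short script. This is a sketch, not part of the original solution: the variable names are illustrative, and r, a and b are rounded to 2 decimal places before they are reused, which is an assumption made to mirror the hand computation (carrying full precision would shift the last digit of t and y’ slightly).

```python
import math

# Data from the worked example: Mathematics = x, Science = y
x = [9, 8, 10, 5, 7, 4, 9, 9, 10, 6]
y = [8, 10, 10, 7, 6, 6, 7, 9, 8, 4]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Pearson r from the column totals (Step 3)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = round(r, 2)

# t-test for the significance of r (Step 5)
t = round(r * math.sqrt((n - 2) / (1 - r ** 2)), 2)

# Regression line y' = a + bx (Step 8)
denominator = n * sum_x2 - sum_x ** 2
a = round((sum_y * sum_x2 - sum_x * sum_xy) / denominator, 2)
b = round((n * sum_xy - sum_x * sum_y) / denominator, 2)

def predict(x_value):
    """Predicted y' for a given x, using the rounded a and b."""
    return round(a + b * x_value, 2)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(r, t, a, b, predict(6))
```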


ANSWER ME IF YOU CAN!!!

Test A

1. The practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
a. Data Management
b. Frequency Distribution
c. Statistics
d. Correlation Analysis

2. It is an administrative process that includes acquiring, validating, storing, protecting, and processing required data to ensure the accessibility, reliability, and timeliness of the data for its users.
a. Data Management
b. Frequency Distribution
c. Statistics
d. Correlation Analysis

3. A systematic presentation of data in rows and columns.
a. Tabular form
b. Graphical form
c. Tabular presentation
d. Textual form

4. What is an example of ratio data?
a. Colors (green – 5, blue – 7, white – 10)
b. 2 tsp of sugar : 1 cup of flour
c. College of MT (1st year – 69, 2nd year – 75, 3rd year – 80, 4th year – 50)
d. Size (small – 10, medium – 7, large – 5)

5. It is a condensed version of an array. It categorizes the numerical data into classes or intervals.
a. Frequency Distribution
b. Frequency Distribution Table
c. Statistics
d. Correlation Analysis

6. Also known as the arithmetic average, it is found by adding the values of the data and dividing by the total number of values.
a. Mean
b. Median
c. Mode
d. No answer

7. The value that occurs most often in the data set.
a. Mean
b. Median
c. Mode
d. No answer

8. The midpoint of the data array.
a. Mean
b. Median
c. Mode
d. No answer

9. Mean, median and mode
a. Measure of Variation
b. Measures of Position
c. Correlation Analysis
d. Measures of Central Tendency
10. Which is not included in the group?
a. Interval
b. Class Mark
c. Frequency
d. All are included in the group.

Test B

Make a frequency distribution table of the scores of 25 students in their Statistics examination.

30, 15, 17, 29, 25, 25, 14, 11, 10, 19, 20, 21, 21, 17, 18, 10, 22, 26, 24, 28, 28, 30, 23, 24, 12

a. Arrange the data from lowest to highest
b. Find the range
c. Find the number of classes
d. Find the width of each interval
e. Complete the table:
i. Class interval
ii. Frequency
iii. Class mark
iv. Boundary
v. Relative frequency (2 decimal places)
vi. <cf
vii. >cf

Test C

Find the variance and standard deviation for the scores in a Chemistry activity. The scores were 10, 5, 11, 15, 13, 7, 9, 14, 15, 6

Test D

A student scored 100 on a calculus test that has a mean of 85 and a standard deviation of 10; she scored 85 on a history test with a mean of 80 and a standard deviation of 10. Compare her relative positions on the two tests.

Test E

A teacher gives a 20-point test to 15 students. The scores are shown below.

20, 16, 11, 19, 15, 13, 17, 18, 6, 8, 10, 14, 9, 12, 7

a. Arrange the scores in order from lowest to highest value
b. Find the 45th percentile
c. Find the 3rd quartile
d. Find the 1st decile
e. Find the percentile rank of 14

Test F

Given the following data for 8 cellular phones, determine if there is a significant relationship between phone usage in minutes and battery life percentage. If there is a significant relationship, a) determine the equation of the regression line and b) predict y’ when x = 100
Cellular phones   Phone Usage (x)   Battery Percentage (y)   x²    y²    xy
1                 60                90
2                 60                85
3                 90                77
4                 140               79
5                 180               75
6                 145               65
7                 85                85
8                 120               70
                  Σx                Σy                       Σx²   Σy²   Σxy

1. Null hypothesis
2. Alternative hypothesis
3. r and interpretation of the Pearson r
4. t test value; critical t value at the 0.05 level of significance (df = n − 2 = 6) = 2.447
5. decision rule
6. interpretation

No answer key!

Created by:
Agudo, Rene Jayson
Asuncion, John Vincent Miguel
Balamban, Lea Arabela
Enciso, Sophia Therese


Answer Key!!!

Test A
1. c
2. a
3. a
4. b
5. b
6. a
7. c
8. b
9. d
10. d

Test B
30, 15, 17, 29, 25, 25, 14, 11, 10, 19, 20, 21, 21, 17, 18, 10, 22, 26, 24, 28, 28, 30, 23, 24, 12

a. 10, 10, 11, 12, 14, 15, 17, 17, 18, 19, 20, 21, 21, 22, 23, 24, 24, 25, 25, 26, 28, 28, 29, 30, 30
b. R = 20
c. K = 6
d. C = 4
e.
Class Interval   f    Class Mark   Class Boundary   Relative Frequency (%)   <cf   >cf
10-13            4    11.5         9.5 – 13.5       16                       4     25
14-17            4    15.5         13.5 – 17.5      16                       8     21
18-21            5    19.5         17.5 – 21.5      20                       13    17
22-25            6    23.5         21.5 – 25.5      24                       19    12
26-29            4    27.5         25.5 – 29.5      16                       23    6
30-33            2    31.5         29.5 – 33.5      8                        25    2
Total            25                                 100

Test C
Arrange: 5, 6, 7, 9, 10, 11, 13, 14, 15, 15

Mean (µ): 10.5

x     x - µ    (x - µ)²
5     -5.5     30.25
6     -4.5     20.25
7     -3.5     12.25
9     -1.5     2.25
10    -0.5     0.25
11    0.5      0.25
13    2.5      6.25
14    3.5      12.25
15    4.5      20.25
15    4.5      20.25
Σ(x - µ)² = 124.5

σ² = 12.45
σ = 3.53
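The Test C computation can be checked with a few lines of Python. This is a sketch that assumes the population formulas (dividing by N), since the notes use the symbols µ and σ; the variable names are illustrative.

```python
import math

scores = [10, 5, 11, 15, 13, 7, 9, 14, 15, 6]

n = len(scores)
mean = sum(scores) / n                                   # µ
squared_deviations = [(s - mean) ** 2 for s in scores]   # (x - µ)² for each score
sum_of_squares = sum(squared_deviations)                 # Σ(x - µ)²
variance = sum_of_squares / n                            # σ², population form
std_dev = math.sqrt(variance)                            # σ

print(mean, sum_of_squares, round(variance, 2), round(std_dev, 2))
```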
Test D
Z score in Calculus: 1.5
Z score in History: 0.5

Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class.

Test E
a. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
b. C = 6.75, round it up to a whole number, 7. Therefore, 12 corresponds to the 45th percentile
c. C = 11.25, round it up to a whole number, 12. Therefore, 17 corresponds to the 3rd quartile
d. C = 1.5, round it up to a whole number, 2. Therefore, 7 corresponds to the 1st decile
e. P = 56.67, round it to a whole number, 57th percentile. Thus, a student whose score was 14 did better than 57 percent of the class

Test F
Cellular phones   Phone Usage (x)   Battery Percentage (y)   x²            y²           xy
1                 60                90                       3600          8100         5400
2                 60                85                       3600          7225         5100
3                 90                77                       8100          5929         6930
4                 140               79                       19600         6241         11060
5                 180               75                       32400         5625         13500
6                 145               65                       21025         4225         9425
7                 85                85                       7225          7225         7225
8                 120               70                       14400         4900         8400
                  Σx = 880          Σy = 626                 Σx² = 109950  Σy² = 49470  Σxy = 67040

1. Ho = There is no significant relationship between phone usage in minutes and battery life percentage.
2. Ha = There is a significant relationship between phone usage in minutes and battery life percentage.
3. r = -0.72
Interpretation of the Pearson r: strong relationship, negative, indirect
4. t test value: -2.54; critical t value at the 0.05 level of significance (df = n − 2 = 6) = ±2.447
5. Since the computed t value (-2.54) is less than the negative critical t value (-2.447), the null hypothesis is rejected.
6. There is a strong negative relationship between phone usage in minutes and battery life percentage. As the phone usage in minutes increases, the battery life percentage of the phone decreases.
7. Equation of the regression line:
a = 93.47
b = -0.14
y’ = 93.47 – 0.14(x)
8. Predict y’ if x = 100, y’ = 79.47
