
Week 1

Ways to organize data:


4.1 Quantitative methods emphasize objective measurements and the statistical, mathematical, or numerical
analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data
using computational techniques.

Data can be described as quantitative if it can be measured or identified on a numerical scale. Examples
include length, height, area, volume, weight, speed, age, distance, cost and so on. However, not all data using
numbers is quantitative.
•      Data management is an administrative process that includes acquiring, validating, storing, protecting, and
processing required data to ensure the accessibility, reliability, and timeliness of the data for its users.
•      Quantitative data deals with numbers and things you can measure objectively: dimensions such as height, width,
and length; temperature and humidity; prices; area and volume.
•      Qualitative data deals with characteristics and descriptors that can't be easily measured but can be observed
subjectively, such as smells, tastes, textures, attractiveness, and color.
•      Broadly speaking, when you measure something and give it a number value, you create quantitative data. When
you classify or judge something, you create qualitative data.
•      Continuous data can be divided and reduced to finer and finer levels, unlike discrete data, which can only take
separate, countable values. For example, you can measure the height of your kids at progressively more precise
scales (meters, centimeters, millimeters, and beyond), so height is continuous data. (See picture below)
Ways to organize data:
Textual - the data are presented in paragraph form, as in a narrative.
Tabular - the data are presented in tables that show the statistics systematically in columns and rows.
Graphs and charts - the statistical values and relationships are shown in pictorial or diagrammatic form; this is
often considered the most effective form of data presentation.

Bar graph - a graphical display using bars of different heights, drawn either vertically or horizontally.
Bar graphs are used to compare magnitudes.
Histogram - a graphical representation of the distribution of data. The rectangles in a histogram touch each other to
indicate that the original variable is continuous.
Linear graph - expresses a linear relationship found in everyday life. When sketched or plotted, the points fall on a
straight line.
Frequency polygon - a graphical device for understanding the shapes of distributions. It is especially helpful for
comparing sets of data and is also good for displaying cumulative frequency distributions.
Frequency ogive (pronounced "o-jive") - a graph of the cumulative frequency distribution, drawn as a continuous
curve. It is typically shaped like an S.
Pie chart - a circular chart divided into wedge-like sectors to illustrate proportions. The sectors of each pie always
total 100%.
Statistical map - a special type of map that shows how a quantity such as rainfall, population, or crops varies across
a geographic area.
Pictogram or pictograph - an ideogram that conveys its meaning through its resemblance to a physical object.

DATA COLLECTION AND TECHNIQUES:


Examples:
Direct or interview method - a conversation between two or more people in which the interviewer asks questions
to elicit facts or statements from the interviewee.
Indirect or questionnaire method - a set of written questions is given to the respondents to answer.
Registration method - the continuous, permanent, compulsory recording of the occurrence of vital events, e.g.
live births, deaths, marriages, etc.
Observation method - requires looking and listening.

WAYS OF ORGANIZING DATA:


Two Ways of Organizing Data
Grouped data - data that are organized into categories (classes).
Ungrouped data - data that are not organized into categories; if arranged, they are listed in either ascending or
descending order.
The table above shows an unarranged and an arranged data set.

Week 2
Measures of Central Tendency
In statistics, the mode, median, and mean are typical values used to represent a pool of numerical observations;
the range, discussed with them here, describes how spread out the observations are.
They are calculated from the pool of observations.
Mode is the most common value among the given observations. For example, a person who sells ice cream might
want to know which flavor is the most popular, or we may want to know which color is worn most by girls at an
event. By merely counting the occurrences of each observation, we can identify the mode, that is, the trending color
among the girls.
The mode, or modal score, is the score or scores that occur most often in the distribution. A distribution is classified
as unimodal, bimodal, trimodal, or multimodal.
• Unimodal is a distribution of scores with only one mode.
• Bimodal is a distribution of scores with two modes.
• Trimodal as the name implies is a distribution with three modes.
Median is the middle value, dividing the ordered data into two halves. In other words, 50% of the observations are
below the median and 50% of the observations are above it.
Mean is the average of all the values. For example, a teacher may want to know the average mark on a test in his
class.
Finding the range and midrange
The range tells how far apart the greatest and least numbers in a set are. It is the difference between the largest and
smallest numbers.
The midrange is the average of the largest and smallest numbers.
Example 1 / Exercise 1
Find the range:
a) 150, 250, 825, 400, 18, 500
b) 2.2, 1.8, 5.1, 0.3
Solution:
a) The largest value is 825. The smallest value is 18.
   Range = largest value − smallest value = 825 − 18 = 807
b) The largest value is 5.1. The smallest value is 0.3.
   Range = largest value − smallest value = 5.1 − 0.3 = 4.8
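The range and midrange definitions above can be checked with a few lines of Python. This is only a minimal sketch: the function names value_range and midrange are illustrative choices, not part of the lesson, and the two data sets are the ones from Exercise 1.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


def midrange(values):
    """Midrange = average of the largest and smallest values."""
    return (max(values) + min(values)) / 2


set_a = [150, 250, 825, 400, 18, 500]
set_b = [2.2, 1.8, 5.1, 0.3]

print(value_range(set_a))            # 807
print(round(value_range(set_b), 1))  # 4.8 (rounded to hide floating-point noise)
print(midrange(set_a))               # 421.5
```

The rounding in the second call is only there because decimal values such as 5.1 and 0.3 are not stored exactly by the computer.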
CALCULATING THE MEAN, MEDIAN, AND MODE
THE STEM AND LEAF PLOT:
Another way to present data is a STEM-AND-LEAF PLOT. By merely looking at data presented in a plot like
this, we can already tell the frequency of each value. By counting from both ends toward the middle, the midpoint
(median) is readily seen; for an even-numbered data set, it is calculated from the two middle values.
For instructions on constructing a STEM-AND-LEAF PLOT, please see the illustration below.
HOW TO CONSTRUCT THE STEM-AND-LEAF PLOT
1. Arrange the given data in either ascending or descending order.
2. Draw a horizontal line (row) for each stem, from 1 to 10 or up to any designated maximum number
(1-100 if given).
3. Draw a vertical line to separate the digits of the given data.
4. The stem is the tens digit, written to the left of the vertical line; the leaf is the ones digit, written to the right of
the vertical line.
5. Determine the midpoint by counting from both directions: from the lowest value up to the middle, and from the
highest value back down to the middle (see illustration).

12 40 15 19 40 31

23 34 37 23 33 36

25 26 29 30 21 28

32 29 20 34 26 38

 
The most frequent values are 23, 26, 29, 34, and 40, each appearing twice. This is a multimodal data set.
If no value in the data is repeated, the data set has no mode.
If two values share the highest frequency, the data set is bimodal; if three values do, it is trimodal; and if more than
three do, it is said to be multimodal.
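As a rough illustration, the stem-and-leaf construction and the mode count for the 24 values listed above can be sketched in Python. The function names stem_and_leaf and modes are assumptions made for this example; the sketch also assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the ones digit.

```python
from collections import Counter, defaultdict

# The 24 values from the data set above.
data = [12, 40, 15, 19, 40, 31,
        23, 34, 37, 23, 33, 36,
        25, 26, 29, 30, 21, 28,
        32, 29, 20, 34, 26, 38]


def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted ones digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


def modes(values):
    """Return every value tied for the highest count (empty if nothing repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))

print(modes(data))  # [23, 26, 29, 34, 40]
```

Each printed row is one stem followed by its leaves, so repeated leaves in a row show the repeated values directly.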
Exercise 2
4.2 Mean is the average of all the values.
Sunny collects data on the ages of respondents in the science class, which yields the following values.
Determine the average age of the respondents.
16        18        29        17        17        21        22        22        23        17        n = 10
Solution:
Add all the values of x and divide the total by the number of given values.
16 + 18 + … + 17 = 202
Mean = (sum of all x) / n = 202 / 10 = 20.2. This is the formula for finding the mean of ungrouped data.
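The mean calculation can be sketched in Python as follows. One assumption is made here: the ten ages are taken as they appear in the sorted list of the median example in these notes (with 29 as the largest value), since those are the values that add up to the stated total of 202.

```python
# Ages as listed in the sorted median example (assumed to be the intended data).
ages = [16, 18, 29, 17, 17, 21, 22, 22, 23, 17]


def mean(values):
    """Mean of ungrouped data: sum of the values divided by how many there are."""
    return sum(values) / len(values)


print(sum(ages))   # 202
print(len(ages))   # 10
print(mean(ages))  # 20.2
```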

 
Exercise 3 
FREQUENCY DISTRIBUTION:
Ungrouped data: To complete the frequency distribution table, one has to determine the following factors from a
given set of data
Sample Problem (see the data set) (pp.78, Math in the Modern World-Dr. Charlie I. Cari o)
Steps to follow:
Construct the frequency table using the following rules:
1. Find the number of classes k such that 2^k > n, where n is the number of observations.
2. Find the class interval: i = Range / number of classes (k)
Range (R) = Highest value − Lowest value, that is, R = HV − LV
Hence, i = (HV − LV) / k
Class Limits and Class Boundaries (see the illustration)
The numbers at the left are called the Lower Limits, and the numbers at the right are the Upper Limits.
How do we get the class boundaries?
We SUBTRACT 0.5 from the Lower Limits and ADD 0.5 to the Upper Limits: 10 − 0.5 = 9.5; 16 + 0.5 = 16.5.
(see Table)
Before we go further, let’s define the following terms for better understanding:
Range - the difference between the highest and lowest scores in a distribution.
Class limits - the smallest and largest observations in each class.
Class boundaries - the points midway between the upper limit of one class and the lower limit of the next class.
Frequency - how many times a value appears in the observed data set.
Relative frequency - the number of times values occur in a class divided by the total frequency.
Cumulative frequency - the running total of the frequencies up to and including a class; for the last class it equals
the total frequency (in this case, the number of observations).
Percentage - the relative frequency of each class multiplied by 100.
Midpoint - the value midway between the two limits of a class.
Exercise 3 / Activity 3
4.3 To find the median of a data set, we first have to arrange the data in ascending or descending order.
The median is the midpoint of the given data set.
To locate it, count the number of values in the data set.
If the data set is odd-numbered (e.g. 3, 5, 7, 8, 9), the median is the value at the midpoint; in this example, it is 7.
In our example above, we have an even-numbered data set, so we divide the number of values by two
(n = 10; 10/2 = 5) and take the 5th and 6th values as the two middle numbers, which are 18 and 21.
 16        17        17        17        18        21        22        22        23        29
Add the two middle numbers and divide by 2: (18 + 21) / 2 = 39 / 2 = 19.5, so the median is 19.5.
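The odd and even cases of the median can be sketched in one small Python function. The name median is an illustrative choice; for an even-numbered data set the function averages the two middle values, which is the usual convention.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 5, 7, 8, 9]))                            # 7
print(median([16, 17, 17, 17, 18, 21, 22, 22, 23, 29]))   # 19.5
```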
HOW TO COMPLETE THE FREQUENCY TABLE:
Steps to follow
1. Given ungrouped data, arrange the data in either ascending or descending order.
2. Determine the range by subtracting the lowest data value from the highest data value.
3. Determine the value of k that gives 2^k > n, where n is the total number of observations.
4. Determine i = R/k to find the class interval (class width).
5. Starting from the lowest data value, count i values onward to form the first class (in the example, counting 7
values from 10 gives the class 10-16).
6. Do the same to find the limits of each succeeding class.
7. SUBTRACT 0.5 from the Lower Limits and ADD 0.5 to the Upper Limits to get the class boundaries:
10 − 0.5 = 9.5; 16 + 0.5 = 16.5 (see Table).
8. Find the frequencies by counting how many values from the data set fall in each class (see Table 1b). See the
illustration in Table 2 for the completion of the frequency distribution.
Relative frequency = frequency of each class divided by the total frequency.
Percentage = relative frequency multiplied by 100.
Cumulative frequency = the first frequency added to the second, that sum added to the third, and so on.
Midpoint = the average of the class limits, that is, (lower limit + upper limit) / 2 (see Table 3).
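The steps above can be sketched in Python. Because the sample data set for this exercise is not reproduced in these notes, the sketch reuses the 24 values from the stem-and-leaf example. Two assumptions are made and should be treated as such: the class interval i = R/k is rounded up to a whole number, and the data are whole numbers (so the boundaries are the limits minus or plus 0.5). The function name frequency_table and the dictionary keys are illustrative.

```python
import math

# The 24 values from the stem-and-leaf example in these notes.
data = [12, 40, 15, 19, 40, 31, 23, 34, 37, 23, 33, 36,
        25, 26, 29, 30, 21, 28, 32, 29, 20, 34, 26, 38]


def frequency_table(values):
    """Build a frequency distribution for whole-number data."""
    n = len(values)
    k = 1
    while 2 ** k <= n:          # smallest k with 2^k > n
        k += 1
    low, high = min(values), max(values)
    # Assumption: the class interval i = R / k is rounded up to a whole number.
    width = max(1, math.ceil((high - low) / k))

    rows, cumulative, lower = [], 0, low
    while lower <= high:        # keep adding classes until the maximum is covered
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "rf": f / n,
            "percent": f / n * 100,
            "cf": cumulative,
        })
        lower = upper + 1
    return k, width, rows


k, width, rows = frequency_table(data)
print("k =", k, "i =", width)
for row in rows:
    print(row["limits"], row["boundaries"], row["midpoint"],
          row["f"], round(row["percent"], 1), row["cf"])
```

For these values the sketch gives k = 5 and i = 6, and the last cumulative frequency equals n = 24. When R/k is already a whole number, the loop may produce one class more than k so that the highest value is still covered.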
Exercise 4
1) Given the data below, complete the frequency distribution.
Follow the step-by-step solution and show your solution. You can use an extra sheet of paper if necessary.
2) From the given data below, construct the stem-and-leaf plot.

Week 3
HOW TO CALCULATE THE MEAN, MEDIAN, AND MODE OF GROUPED DATA

Week 4
Definition of Terms
Dispersion is the state of being dispersed or spread. Statistical dispersion means the extent to
which numerical data are likely to vary about an average value. In other words, dispersion helps
us understand the distribution of the data (see illustration on the right). We can see that the closer
the data are to the center, the steeper the peak of the curve; the farther they spread from the
center, the wider and flatter the curve.
Standard Deviation
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set
of values
A low standard deviation indicates that the values tend to be close to the mean of the set,
while a high standard deviation indicates that the values are spread out over a wider range.
One of the most useful measures of dispersion is the standard deviation.
It is based on deviations from the mean of the data.
Steps to follow in solving for the standard deviation:
1. Calculate the mean of the numbers.
2. Find the deviations from the mean.
3. Square each deviation.
4. Sum the squared deviations.
5. Divide the sum in step 4 by (n – 1).
6. Take the square root of the quotient in step 5.
Note: the sum of the deviations from the mean is always equal to zero.
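The six steps can be followed one by one in a short Python sketch. The function name sample_standard_deviation is an illustrative choice, and the data are the ten ages from the Week 2 median example, reused here only as a convenient data set.

```python
import math


def sample_standard_deviation(values):
    """Follow the six steps: mean, deviations, squares, sum, divide by n - 1, root."""
    n = len(values)
    mean = sum(values) / n                       # step 1
    deviations = [x - mean for x in values]      # step 2
    squared = [d ** 2 for d in deviations]       # step 3
    total = sum(squared)                         # step 4
    variance = total / (n - 1)                   # step 5
    return math.sqrt(variance)                   # step 6


ages = [16, 17, 17, 17, 18, 21, 22, 22, 23, 29]
mean_age = sum(ages) / len(ages)

print(round(sample_standard_deviation(ages), 2))       # 4.02
print(round(sum(x - mean_age for x in ages), 10))      # 0.0 (deviations cancel out)
```

The second print illustrates the note above: the deviations from the mean add up to zero, apart from tiny floating-point error.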
Coefficient of Variation
The coefficient of variation expresses the standard deviation as a percentage of the mean.
It is not strictly a measure of dispersion, as it combines central tendency and dispersion.
For any set of data, the coefficient of variation is given by:
C.V. = (s / x̄)(100), where s is the standard deviation and x̄ is the mean.
It is simply the standard deviation divided by the mean, expressed as a percentage.
It answers the question: how big is the standard deviation relative to the mean of the distribution?
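A minimal Python sketch of the coefficient of variation is shown below. It uses the standard library's statistics.stdev (the sample standard deviation, which divides by n − 1) and statistics.mean; the function name coefficient_of_variation and the choice of the ten ages from the Week 2 median example are assumptions made for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


ages = [16, 17, 17, 17, 18, 21, 22, 22, 23, 29]
print(round(coefficient_of_variation(ages), 1))  # 19.9
```

Here the standard deviation is roughly one fifth of the mean, which is what a C.V. of about 19.9% says.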

WEEK 5
Descriptive measures that locate the relative position of an observation in relation to the
other observations are called measures of relative position.
They are referred to as the percentiles, deciles, and quartiles.
The quartiles (the second of which is the median) divide the ordered array into four equal parts,
the deciles divide it into ten equal parts, and
the percentiles divide it into a hundred equal parts.
25th percentile = Q1
50th percentile = Q2 or the Median
75th percentile = Q3
COMMON MEASURES OF POSITION
Box-and-Whisker Plot - a box-and-whisker plot shows the spread and center of the data. It is a
graphical representation of the five-number summary: minimum, first quartile, median, third
quartile, and maximum.
Deciles - deciles are similar to quartiles. But where quartiles split the data into four equal parts,
deciles split the data into ten equal parts. They correspond to the 10th, 20th, 30th, 40th, 50th, 60th,
70th, 80th, 90th and 100th percentiles.
Five-Number Summary - the five-number summary is an overview of your data. The statistics
in the summary are the smallest value (minimum), the largest (maximum), the middle (median),
and the first and third quartiles.
Interquartile Range (IQR) - the interquartile range tells you where the “middle fifty” percent is in
a data set. While the range tells you where the beginning and end are in a set, the IQR shows you
where the bulk of the “middling” values lie.
Outliers - outliers are unusual values that fall outside of an expected range of values. For example,
if you’re measuring IQ values of children, your statistics would be thrown off if Einstein and
Stephen Hawking were in your class: their IQs would be outliers.
Percentiles - a percentile is a number below which a certain percentage of scores fall. For
example, the 90th percentile marks the spot where 90% of values fall below that cut-off point.
Percentiles indicate the position of a score in the population. A score at the 25th percentile is
better than 25% of the observed data; the 25th percentile is also known as Q1, or the first quartile.
The 50th percentile is also known as Q2, as it splits the data in half.
https://www.youtube.com/watch?v=mDJvDRvvDXo
3a - QUARTILES AND PERCENTILES OF GROUPED DATA
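TO FIND THE QUARTILES:

1) Compute for Q1, Q2, and Q3.

Solution:
Position of Q1 = kN/4 = 1(50)/4 = 12.5
Locate the class whose cumulative frequency first reaches 12.5. From inspection, 12.5 falls in
class 36-44, whose cumulative frequency is 17.
So cf = 17, L = 36 − 0.5 = 35.5, f = 9, and i = 9. The cumulative frequency before this class is
therefore 17 − 9 = 8.
Plugging in the values of the variables:
Q1 = L + [(kN/4 − cf before the class) / f] (i) = 35.5 + [(12.5 − 8) / 9] (9) = 40
(The same procedure follows for Q2 and Q3.)

2) From the same set of data, find P36.

Solution:
Position of P36 = kN/100 = 36(50)/100 = 18
The class containing position 18 has cf = 31, L = 44.5, and f = 14, so the cumulative frequency
before this class is 31 − 14 = 17.
P36 = 44.5 + [(18 − 17) / 14] (9) ≈ 45.143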
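The Q1 computation can be sketched in Python using the usual interpolation formula for grouped data. The full frequency table is not reproduced in these notes, so the sketch takes only the quantities quoted in the solution (N = 50, L = 35.5, cf = 17, f = 9, i = 9); the function name grouped_quantile and its parameter names are illustrative.

```python
def grouped_quantile(position, lower_boundary, cf_before, f, width):
    """Interpolate inside the class that contains the given position.

    position       : kN/4 for quartiles or kN/100 for percentiles
    lower_boundary : lower class boundary L of that class
    cf_before      : cumulative frequency of the classes before it
    f              : frequency of that class
    width          : class interval i
    """
    return lower_boundary + (position - cf_before) / f * width


N = 50
position_q1 = 1 * N / 4   # 12.5

# Class 36-44 has cumulative frequency 17 and frequency 9,
# so the cumulative frequency before the class is 17 - 9 = 8.
q1 = grouped_quantile(position_q1, lower_boundary=35.5, cf_before=17 - 9, f=9, width=9)
print(q1)  # 40.0
```

The same function can be reused for Q2, Q3, or any percentile once the class containing the position has been located in the cumulative frequency column.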
Range = X max − X min. From our example, X max = 12 and X min = 3, so Range = 12 − 3 = 9.
Quartiles - simply put, quartiles divide your data into quarters: the lowest quarter, two middle
quarters, and the highest quarter.

Standard scores (i.e. z-scores) - z-scores are a way to compare results from a test to a “normal”
population.
Now that we understand what relative positions are, let us work through some numerical
examples of each.
Given the following data:
find the position of the 55th percentile:

STEPS TO FOLLOW:
A. Before proceeding, always make sure that the data are arranged in either ascending or
descending order (in this case the data are already arranged).
B. Determine the index:
i = (P/100)(n), so for the 55th percentile i = (55/100)(16) = 8.8, which rounds up to 9. Count 9
places from the lowest value; that is the position of the 55th percentile.
C. If the result in step B is a whole number, the percentile is the average of the values at
positions i and i + 1.
What is the importance of the index?
It tells the position of the percentile.
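The index rule can be sketched in Python as follows. The 16-value data set for this example is not reproduced in these notes, so the sketch checks the index for n = 16 and then applies the rule to the seven scores from the box-and-whisker example later in these notes. The function names percentile_positions and percentile_value are illustrative, and the sketch assumes a whole-number percentile between 1 and 99.

```python
def percentile_positions(p, n):
    """1-based position(s) of the p-th percentile among n ordered values."""
    if not 1 <= p <= 99:
        raise ValueError("p must be a whole number from 1 to 99")
    whole, remainder = divmod(p * n, 100)   # index = p * n / 100
    if remainder == 0:
        return (whole, whole + 1)           # whole-number index: average two positions
    return (whole + 1,)                     # decimal index: round up


def percentile_value(values, p):
    """Value of the p-th percentile using the index rule above."""
    ordered = sorted(values)
    picked = [ordered[pos - 1] for pos in percentile_positions(p, len(ordered))]
    return sum(picked) / len(picked)


print(percentile_positions(55, 16))   # (9,)   since 8.8 rounds up to 9
print(percentile_positions(50, 16))   # (8, 9) since 8 is a whole number

scores = [98, 77, 85, 88, 82, 83, 87]
print(percentile_value(scores, 50))   # 85.0
```

Working with p * n as a whole number before dividing by 100 avoids floating-point surprises when testing whether the index is a whole number.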
Given that the value 16 is in the data set, what is its percentile rank?

Px = [(x + 0.5y)/n] (100) (use this to find the percentile rank of a given value)
x = the number of values before 16 = 8
y = the number of times 16 occurs in the data set = 2
n = the total number of values = 16
Px = [(8 + 0.5(2))/16] (100) = 56.25
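The percentile-rank formula can be sketched in Python. The first function works directly from the counts x, y, and n, here taking x = 8 and y = 2 from the text and assuming n = 16 as in the index example; the second counts x and y from a data list. Both function names are illustrative, and the seven scores come from the box-and-whisker example in these notes.

```python
def rank_from_counts(x, y, n):
    """Percentile rank from the counts: x values below, y equal, n in total."""
    return (x + 0.5 * y) / n * 100


def percentile_rank(values, target):
    """Count x and y from the data, then apply the same formula."""
    x = sum(v < target for v in values)    # number of values below the target
    y = sum(v == target for v in values)   # number of times the target occurs
    return rank_from_counts(x, y, len(values))


# Counts quoted in the text: 8 values below 16, 16 occurs twice, n assumed to be 16.
print(rank_from_counts(8, 2, 16))

scores = [98, 77, 85, 88, 82, 83, 87]
print(percentile_rank(scores, 85))   # 50.0, since 85 is the median of the seven scores
```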

4 THE FIVE-NUMBER SUMMARY 


The five-number summary is used to construct a box plot (box-and-whisker plot).
These five numbers are the minimum value, Q1, Q2, Q3, and the maximum value.
From the box plot, the interquartile range (IQR) is taken as the difference between Q3 and Q1.
Outliers are values that lie beyond the expected range of the data. Their cut-offs are determined
by multiplying the IQR by 1.5, then adding the product to Q3 (for the positive outliers) and
subtracting it from Q1 (for the negative outliers).

• HOW TO DRAW THE BOX-AND-WHISKER PLOT


For the following data: 98, 77, 85, 88, 82, 83, 87
1. Order the data: 77, 82, 83, 85, 87, 88, 98
2. Find the median: Q2 = 85
The median splits the remaining data into two sets.
The first set is 77, 82, 83. The median of this set is Q1 = 82.
The other set is 87, 88, 98. The median of this set is Q3 = 88.
IQR = Q3 − Q1 = 88 − 82 = 6
The cut-offs for outliers are found by multiplying the IQR by 1.5, then adding the product to Q3
for the positive outliers and subtracting it from Q1 for the negative outliers:
+outliers cut-off = 88 + 1.5(6) = 97
−outliers cut-off = 82 − 1.5(6) = 73
Any value above 97 or below 73 is an outlier; here, 98 is a positive outlier.
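The worked example can be sketched in Python. The sketch follows the method used above (the median is left out of both halves when the number of values is odd), and it computes the lower cut-off as Q1 minus 1.5 times the IQR and the upper cut-off as Q3 plus 1.5 times the IQR. The function names are illustrative; other textbooks and software may compute quartiles slightly differently.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Minimum, Q1, Q2, Q3, maximum using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the median itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [98, 77, 85, 88, 82, 83, 87]
minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_cutoff or v > upper_cutoff]

print(minimum, q1, q2, q3, maximum)
print(iqr, lower_cutoff, upper_cutoff)
print(outliers)
```

The five numbers printed in the first line are exactly the values needed to draw the box (Q1 to Q3, with a line at Q2) and the whiskers.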
Case 2 - when the index is a decimal, round it up to the next whole number.
Find the 25th percentile from the given data.

25th percentile index = (25/100)(26) = 6.5; round up to the next whole number = 7

The 25th percentile = 75 (the 7th number from the lowest value)

Find the 36th percentile of the following scores. (Be careful)

Exercise 2 / Activity 2
Try to define the following terms:
Percentile
Decile
Quartile
Relative position
Box-and-whisker plot
Five-number summary
Interquartile range
Outliers
Exercise 3 / Activity 3
Using the formulas we have learned for calculating the percentiles and quartiles of grouped data:
