POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
SAN PEDRO CAMPUS

FUNDAMENTALS OF DESCRIPTIVE ANALYTICS
BUMA 30063
INSTRUCTIONAL MATERIAL

PREPARED BY
Mc Joben R. Reyes, MIS, LPT

2023
CHAPTER 2
BASIC DESCRIPTIVE STATISTICS

BUMA 30063 - Fundamentals of Descriptive Analytics

BASIC DESCRIPTIVE STATISTICS

INTRODUCTION
We will focus on Summary Statistics. These are the different measures that are used to
describe any set of data. If we want to know the typical value of a certain variable, how
different the values are from one another, or how a certain data point compares to the
rest, we can use these measures.

FREQUENCY DISTRIBUTION
Frequency is simply the number of occurrences of an event. A frequency distribution is
a list, table or graph that displays the frequency of various outcomes in a sample. It
tells us how many there are of each item in the data set.

A frequency distribution can show us the raw count of each item and its percentage
of the total.
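
As a minimal sketch of this idea, the Python snippet below counts how often each value occurs and expresses each count as a percentage of the total. It reuses the ten Geometry grades from the mean example in this chapter; the names `frequency_table`, `grades`, and `table` are illustrative assumptions, not part of the course material.

```python
from collections import Counter


def frequency_table(values):
    """Return (item, count, percent) rows, sorted by item."""
    counts = Counter(values)
    total = len(values)
    return [(item, n, 100 * n / total) for item, n in sorted(counts.items())]


# Geometry grades taken from the mean example in this chapter.
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
table = frequency_table(grades)

# Print one row per distinct grade: value, count, percentage of the total.
for item, n, pct in table:
    print(f"{item:>5} {n:>5} {pct:>6.1f}%")
```

Each row pairs a value with its raw count and its share of the total, which are the two columns a basic frequency table normally carries.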

What Is a Frequency Distribution?


A frequency distribution is a representation, in either graphical or tabular format,
that displays the number of observations within a given interval. The interval size
depends on the data being analyzed and the goals of the analyst. The intervals must be
mutually exclusive and exhaustive. Frequency distributions are typically used within a
statistical context. For example, they are often used to chart how closely a set of
data follows a normal distribution.

Understanding a Frequency Distribution


As a statistical tool, a frequency distribution provides a visual representation of the
distribution of observations within a particular test. Analysts often use a frequency
distribution to visualize or illustrate the data collected in a sample. For example, the
height of children can be split into several different categories or ranges.

When the heights of 50 children are measured, some are tall and some are short, but
most are likely to fall in the middle range. The most important requirements when
setting up the intervals are that they must not overlap and must together contain all
of the possible observations.

Visual Representation of a Frequency Distribution
Both histograms and bar charts provide a visual display using columns, with the y-axis
representing the frequency count, and the x-axis representing the variable to be
measured. In the children's height example, the y-axis is the number of children,
and the x-axis is the height. The columns represent the number of children observed
with heights measured in each interval.

A histogram of many naturally occurring measurements, such as height, often
approximates a normal distribution, which means that the majority of occurrences fall
in the middle columns. Other data sets can be skewed or have more than one peak.
Frequency distributions are a key aspect of charting normal distributions, which
describe how observation probabilities are divided among standard deviations from
the mean.

What Are the Types of Frequency Distribution?


The types of frequency distribution are grouped frequency distribution, ungrouped
frequency distribution, cumulative frequency distribution, relative frequency
distribution, and relative cumulative frequency distribution.
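
The sketch below illustrates four of these types on ungrouped data: the plain frequency, the relative frequency, the cumulative frequency, and the relative cumulative frequency. It reuses the ten Geometry grades from the mean example in this chapter, and all variable names are illustrative assumptions. A grouped distribution would count values per class interval instead of per distinct value.

```python
from collections import Counter
from itertools import accumulate

# Geometry grades taken from the mean example in this chapter.
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]

counts = Counter(grades)
items = sorted(counts)                      # distinct values, ascending
freq = [counts[item] for item in items]     # ungrouped frequency
total = sum(freq)

rel_freq = [f / total for f in freq]            # relative frequency
cum_freq = list(accumulate(freq))               # cumulative frequency
rel_cum_freq = [c / total for c in cum_freq]    # relative cumulative frequency

for row in zip(items, freq, rel_freq, cum_freq, rel_cum_freq):
    print("{:>4} {:>3} {:>6.2f} {:>4} {:>6.2f}".format(*row))
```

The last cumulative frequency always equals the number of observations, and the last relative cumulative frequency always equals 1, which makes a quick check on the table.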

What Is the Importance of a Frequency Distribution?
A frequency distribution is a means of organizing a large amount of data. It takes data
from a population based on certain characteristics and organizes the data in a way that
is comprehensible to a person who wants to draw conclusions about that population.

How Can I Construct a Frequency Distribution?


To construct a frequency distribution, first list the classes (the intervals or
categories) in one column. Then count how many observations fall into each class, and
record that count, the frequency, in the second column.
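
One possible way to carry out this construction for grouped data is sketched below. The heights are hypothetical values invented purely for illustration, and the function name `grouped_frequency` and its parameters `start`, `width`, and `n_classes` are assumptions of this sketch. Each class is treated as the interval from `lower` up to but not including `upper`, so the classes do not overlap, and a value that fits no class raises an error so that the classes are forced to cover every observation.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count values in n_classes consecutive intervals [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            # The classes must be exhaustive: every value needs a class.
            raise ValueError(f"{v} falls outside every class interval")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, c)
        for i, c in enumerate(counts)
    ]


# Hypothetical heights in centimetres, invented for illustration only.
heights = [112, 118, 121, 124, 125, 127, 128, 130, 131, 133, 136, 142]
classes = grouped_frequency(heights, start=110, width=10, n_classes=4)

# First column: the class interval. Second column: its frequency.
for lower, upper, count in classes:
    print(f"{lower} to under {upper}: {count}")
```

Using half-open intervals is a design choice: a value that sits exactly on a boundary, such as 130, belongs to exactly one class.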

Understanding frequency distributions gives us a way of organizing our data
logically. Once we have done this, we will be able to apply different
summary statistics measures to our data.

MEASURES OF CENTRAL TENDENCY


Measures of Central Tendency give us the typical value of data. There are three
measures of central tendency: the Mean, the Median, and the Mode.

The Mean
The mean (also known as the arithmetic mean) is the most commonly used measure
of central position. It is used to describe a set of data where the measures cluster or
concentrate at a point. When the measures cluster around a point, a single value
can represent the typical value well.

It is the sum of the measures x divided by the number N of measures in a variable. It is
symbolized as x̄ (read as "x bar"). To find the mean of ungrouped data, use the
formula

x̄ = ∑x / N

where ∑x = the summation of x (sum of the measures)
and N = number of values of x

Example: The grades in Geometry of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78,
76. What is the average grade of the 10 students?

Solution: x̄ = ∑x / N
x̄ = (87 + 84 + 85 + 85 + 86 + 90 + 79 + 82 + 78 + 76) / 10
x̄ = 832 / 10
x̄ = 83.2

Hence, the average grade of the 10 students is 83.2.
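
The same calculation can be checked with a short Python sketch. It applies the formula x̄ = ∑x / N directly and then compares the result with `statistics.mean` from the Python standard library; the variable names are illustrative assumptions.

```python
from statistics import mean

# Geometry grades of the 10 students from the example above.
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]

# Direct use of the formula: sum of the measures divided by their number.
total = sum(grades)
manual_mean = total / len(grades)

# The standard library computes the same arithmetic mean.
library_mean = mean(grades)

print(total, manual_mean, library_mean)
```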

The Median
The median is the middle value in a set of data arranged according to size
(in either increasing or decreasing order). When the number of values is even, there
are two middle values, and the median is taken as their mean.

Example: The library logbook shows that 58, 60, 54, 35, and 97 books, respectively,
were borrowed from Monday to Friday last week. Find the median.

Solution: Arrange the data in increasing order.

35, 54, 58, 60, 97

We can see from the arranged numbers that the middle value is 58.
Since the middle value is the median, the median is 58.
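
A minimal Python sketch of the same procedure follows. The helper name `middle_value` is an assumption of this sketch. For an even number of values it averages the two middle values, which is the usual convention and also what `statistics.median` in the standard library does.

```python
from statistics import median


def middle_value(values):
    """Sort the values and return the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Books borrowed from Monday to Friday, from the example above.
books = [58, 60, 54, 35, 97]

manual_median = middle_value(books)
library_median = median(books)
print(manual_median, library_median)
```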

The Mode
The mode is the measure or value which occurs most frequently in a set of data. It is
the value with the greatest frequency.

To find the mode for a set of data:


1. select the measure that appears most often in the set;
2. if two or more measures appear the same number of times, then each of these
values is a mode; and
3. if every measure appears the same number of times, then the set of data has no
mode.
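
The three rules above can be sketched in Python as follows. The function name `modes` is an assumption of this sketch, and it returns a list so that data with several modes, or with no mode, can be represented. One design choice is worth noting: a data set in which a single value is repeated and nothing else appears is treated here as having that value as its mode.

```python
from collections import Counter


def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Rule 3: every measure appears the same number of times, so no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Rules 1 and 2: every value that reaches the highest count is a mode.
    return sorted(v for v, c in counts.items() if c == highest)


grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]   # one mode
two_modes = [1, 1, 2, 2, 3]                         # two modes
books = [58, 60, 54, 35, 97]                        # no mode

print(modes(grades), modes(two_modes), modes(books))
```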

MEASURES OF LOCATION
Sometimes, we want to know how a certain data point compares with the rest, for
example, in the case of rankings and quotas. In some situations, we could also
divide data into a certain number of equal sections to answer our questions, as with
certain problems that involve brackets, classes, and other groupings.

Measures of Location specify points in the data set below which a specified proportion
of the data lie. This allows us to find the position of a data point in relation to the
entire data set.

Some examples of these are percentiles, deciles and quartiles. Percentiles divide the
data into 100 equal parts, deciles divide the data into 10 equal parts, and quartiles
divide the data into 4 equal parts.

The median, a measure of central tendency discussed earlier, is also a special measure of
location. Recall that the median is the middle value in the data set, so it divides
the data into two equal parts.
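
As a hedged illustration, the sketch below computes the cut points for quartiles, deciles, and percentiles with `statistics.quantiles` from the Python standard library (available from Python 3.8). It reuses the ten Geometry grades from the mean example. Textbooks and software packages interpolate between data values in slightly different ways, so the exact cut points may differ from a hand calculation; the default method of `statistics.quantiles` is only one such convention.

```python
from statistics import median, quantiles

# Geometry grades taken from the mean example in this chapter.
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]

# Dividing the data into n equal parts needs n - 1 cut points.
quartiles = quantiles(grades, n=4)      # 3 cut points
deciles = quantiles(grades, n=10)       # 9 cut points
percentiles = quantiles(grades, n=100)  # 99 cut points

# The second quartile, fifth decile, and 50th percentile are all the median.
print(quartiles[1], deciles[4], percentiles[49], median(grades))
```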

MEASURES OF DISPERSION
There are two types of Measures of Dispersion. The first is absolute dispersion, which
measures the variability within a data set. The second is relative dispersion, which is
used to compare the variability of this data set with that of other data sets.
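
The text does not name a specific relative measure. One commonly used measure, shown here as an assumption rather than as the method this material prescribes, is the coefficient of variation: the standard deviation divided by the mean. The sketch compares the two data sets used in the variance examples of this chapter; the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (undefined if the mean is 0)."""
    return pstdev(values) / mean(values)


# The two data sets from the variance examples in this chapter.
set_a = [3, 5, 8, 1]
set_b = [2, 7, 3, 12, 9]

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)

# A larger value means more spread relative to the size of the mean.
print(round(cv_a, 3), round(cv_b, 3))
```

Because the result has no units, it can be used to compare data sets measured on different scales.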

Variance and Standard Deviation are measures of dispersion with reference to the
mean. The higher these values are, the farther away from the mean the data values are.
Standard deviation is the square root of the variance, resulting in a number that is
never negative and is in the same units as the mean.

Example: For the data set 3, 5, 8, 1, the mean is (3 + 5 + 8 + 1) / 4 = 4.25. Then, by
using the definition of variance, we get
[(3 - 4.25)^2 + (5 - 4.25)^2 + (8 - 4.25)^2 + (1 - 4.25)^2] / 4 = 26.75 / 4 = 6.6875,
or about 6.69. Thus, the variance is about 6.69 and the standard deviation is √6.6875,
or about 2.586.
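
This example can be checked with the Python standard library. The sketch assumes that the data set is treated as a whole population, because the text divides by the number of values N; `statistics.pvariance` and `statistics.pstdev` follow that convention, while `statistics.variance` divides by N - 1 and is meant for samples. Computed without intermediate rounding, the variance is 6.6875.

```python
from statistics import pstdev, pvariance, variance

data = [3, 5, 8, 1]

population_variance = pvariance(data)   # divides by N, as in the text
population_std_dev = pstdev(data)       # square root of the population variance
sample_variance = variance(data)        # divides by N - 1, shown for contrast

print(population_variance, round(population_std_dev, 3), round(sample_variance, 4))
```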

Let's calculate the variance of the following data set: 2, 7, 3, 12, 9.

The first step is to calculate the mean. The sum is 33 and there are 5 data points.
Therefore, the mean is 33 ÷ 5 = 6.6. Then take each value in the data set, subtract the
mean, and square the difference. For instance, for the first value:

(2 - 6.6)^2 = 21.16

The squared differences for all values are added:

21.16 + 0.16 + 12.96 + 29.16 + 5.76 = 69.20

The sum is then divided by the number of data points:

69.20 ÷ 5 = 13.84

The variance is 13.84. To get the standard deviation, calculate the square root of
the variance, which is about 3.72.
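
The steps of this worked example can be followed one by one in a short Python sketch. The variable names are illustrative assumptions, and the calculation divides by the number of data points, exactly as the text does.

```python
from math import sqrt

data = [2, 7, 3, 12, 9]

# Step 1: the mean is the sum divided by the number of data points.
mean_value = sum(data) / len(data)

# Step 2: subtract the mean from each value and square the difference.
squared_diffs = [(x - mean_value) ** 2 for x in data]

# Step 3: add the squared differences and divide by the number of data points.
variance = sum(squared_diffs) / len(data)

# Step 4: the standard deviation is the square root of the variance.
std_dev = sqrt(variance)

print([round(d, 2) for d in squared_diffs])
print(round(variance, 2), round(std_dev, 2))
```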

LEARNING ASSESSMENT
Answer the following discussion point.

When is it best to use the mean? What about the median or the mode? Name some specific
examples of situations in which one would choose a certain measure over the other
two.
