STATISTICS: AN OVERVIEW
Mark Train
“Without big data, you are blind and deaf and in the middle of a freeway.”
– Geoffrey Moore.
Learning Outcome:
Pretest
Directions: Each item below belongs to a particular kind of data. Identify the kind of data to which
each item belongs. Write N for Nominal Data, O for Ordinal Data, I for Interval Data, and R for
Ratio Data. Write your answer in the space provided before each number:
_____ 1. Gender
_____ 2. Current rank in the class
_____ 3. Color of the shirt
_____ 4. Birth Order
_____ 5. Percentage of profit over sales
_____ 6. Age bracket of teenagers
_____ 7. Grade range for “With honors”
_____ 8. Return on Investments
_____ 9. Interest Rate
_____10. World ranking for universities
Select the best answer from the options given in each of the statements below:
1. Which of the following is an example of a categorical variable?
a. Flavor of soft drink ordered by each customer at a fast-food restaurant
b. Height measured in inches for each student in a class
c. Points scored by each player in a team
2. Numerical and pictorial information about variables are called
a. Analytical statistics
b. Inferential statistics
c. Descriptive statistics
3. The entire group of interest for a statistical conclusion is called the
a. Data
b. Population
c. Sample
4. A subgroup that is representative of a population is called
a. Category
b. Data
c. Sample
5. Statistical inference is:
a. The process of making estimates and drawing conclusions based on data from a sample
b. The process of making estimates and drawing conclusions based on data from the entire
population
c. Pictorial displays that summarize data
6. Two types of statistical variables are:
a. Categorical and descriptive
b. Categorical and numerical
c. Descriptive and numerical
7. It is a mathematical science that deals with data collection, organization, analysis and
interpretation.
a. Mathematics
b. Infographics
c. Statistics
8. _____________ is a set of raw numbers and/or words that are collected through
observations and/or just descriptions of things.
a. Information
b. Data
c. Collections
9. An estimate of the characteristics of a population is called:
a. Data
b. Sample
c. Population
10. Statistic is to sample as __________ is to population.
a. Representation
b. Parameter
c. Mentimeter
Learning Content
Data is a set of raw numbers and/or words that are collected through observations and/or
just descriptions of things. In a more technical sense, data is a set of values of qualitative or
quantitative variables about one or more persons or objects, while a datum is a single value of a
single variable. Data, basically, are unprocessed information.
A data set is a collection of elements, each of which is characterized by one or more variables.
Types of Data
Basically, there are four kinds of data:
o Nominal
o Ordinal
o Interval
o Ratio
[Figure: Types of Data, divided into Categorical and Numeric]
Nominal data is a kind of data that is taken from nominal variables such as gender, degree
completed, skin color, etc. Take gender, for example. When gender is used as a variable, the data
gathered may be male, female, lesbian, gay, bisexual, transgender, queer, questioning, intersex,
and others, which are pure descriptions of gender. No numbers or order are assigned to them.
Another example is degree completed, where the data gathered could be Bachelor of Secondary
Education, Bachelor in Elementary Education, or Bachelor of Science in Engineering (Electrical,
Electronics and Communications, Civil, Mechanical, etc.). These are simply the names of the
degree programs completed by the respondents.
Nominal data can be analyzed using the grouping method. The observations can be grouped
into categories, and the frequency or percentage of each category can then be calculated.
The data can also be presented visually, for example with a pie chart.
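The grouping method can be sketched in a few lines of Python. This is a minimal illustration, assuming a small hypothetical list of shirt colors as the nominal data; the function name `frequency_table` is made up for this example.

```python
from collections import Counter

# Hypothetical nominal data: the shirt color of each of 10 respondents.
shirt_colors = ["blue", "red", "blue", "white", "blue",
                "red", "white", "blue", "black", "red"]

def frequency_table(values):
    """Group nominal values into categories and return (category, frequency, percent) rows."""
    counts = Counter(values)
    total = len(values)
    # most_common() lists the categories from the highest frequency to the lowest.
    return [(category, freq, round(100 * freq / total, 2))
            for category, freq in counts.most_common()]

for category, freq, pct in frequency_table(shirt_colors):
    print(f"{category:<6}{freq:>3}{pct:>8.2f}%")
```

Only counting is used here, which fits nominal data: the categories have no numeric value or order, so the frequency and percentage of each category are the natural summaries, and the percentages are what a pie chart would display.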
Although nominal data cannot be treated using mathematical operators, they can still be
analyzed using more advanced statistical methods. For example, one way to analyze the data is
through hypothesis testing.
For nominal data, hypothesis testing can be carried out using nonparametric tests such
as the chi-square test. The chi-square test aims to determine whether there is a significant
difference between the expected frequency and the observed frequency of the given values.
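The chi-square statistic compares each observed frequency O with its expected frequency E using the sum of (O – E)² / E. The sketch below computes it for a goodness-of-fit test. The counts are hypothetical (60 customers choosing among 3 shirt colors), and the null hypothesis is assumed to be that all colors are equally popular.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 60 customers choose among 3 shirt colors.
# Null hypothesis: the colors are equally popular, so each expected count is 60 / 3 = 20.
observed = [28, 18, 14]
expected = [20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # degrees of freedom for a goodness-of-fit test
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Here the statistic is (64 + 4 + 36) / 20 = 5.2 with 2 degrees of freedom. It would then be compared with the critical value from a chi-square table (5.991 for df = 2 at the 0.05 level). Because 5.2 is below that value, this hypothetical difference would not be judged significant.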
Ordinal Data is a kind of data that is expressed in order or rank. For example, ordinal data is
collected when a respondent rates his or her financial happiness on a scale of 1 to 10. In ordinal
data, there is no standard scale on which the differences between scores are measured.
Considering the example highlighted above, let us assume that 50 people earning between $1,000
and $10,000 monthly were asked to rate their level of financial happiness. An undergraduate
earning $2,000 monthly may rate 8/10, while a father of 3 earning $5,000 may rate 3/10. This shows
that the ratings are influenced by personal factors rather than by a set rule.
Another example is ranking in competitions. Ordinal data are built upon nominal scales by
assigning numbers to objects to reflect a rank or ordering on an attribute (formplusblog, 25th
June, 2020). An example is the level of customer satisfaction with a restaurant's service, which
may range from very satisfied and satisfied to dissatisfied. These levels of satisfaction may be
assigned numbers that denote order: 3 is assigned to customers who are “very satisfied”, 2 to
“satisfied” customers, and 1 to “dissatisfied” customers.
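The coding of satisfaction levels can be sketched as a simple mapping, using a hypothetical set of seven responses. The codes only show order, so a median is a common summary for ordinal data. A mean would assume equal distances between the levels, which ordinal data does not guarantee.

```python
import statistics

# Ordinal codes for the satisfaction levels described above (higher = more satisfied).
SATISFACTION_CODES = {"dissatisfied": 1, "satisfied": 2, "very satisfied": 3}
CODE_LABELS = {code: label for label, code in SATISFACTION_CODES.items()}

# Hypothetical responses from seven customers.
responses = ["satisfied", "very satisfied", "dissatisfied", "satisfied",
             "very satisfied", "satisfied", "dissatisfied"]

codes = [SATISFACTION_CODES[r] for r in responses]

# median_low always returns one of the observed codes, so it maps back to a label.
median_code = statistics.median_low(codes)
label = CODE_LABELS[median_code]
print(codes, median_code, label)
```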
Interval data are data measured on a scale whose points are placed at equal distances from one
another, for example, temperature in degrees Celsius. Unlike ratio data, interval data has no true
zero point. Interval data is one of the two types of numerical data and can be regarded as an
extension of ordinal data.
Class Size. The term refers to the difference between the upper and lower class boundaries of a
class, whether the class limits are overlapping or non-overlapping. For example, consider the
class size of the overlapping interval 10 – 20:
Lower Limit = 10
Upper Limit = 20
Class Size = 20 – 10 = 10
Take the following sample data on weights of people on a diet plan. 52, 75, 92, 101, 83,
68, 133, 78, 104, 61, 39, 46, 135, 87, 131, 99, 104, 86, 67, 116, 89, 57, 87, 98, 131, 116, 135,
93.
1. With a class interval of 14, determine how many classes you will get.
2. Present the weights in a frequency distribution table.
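Arranged in ascending order, the weights are:
39, 46, 52, 57, 61, 67, 68, 75, 78, 83, 86, 87, 87, 89, 92, 93, 98, 99, 101, 104, 104, 116, 116,
131, 131, 133, 135, 135
Range = 135 – 39 = 96
Class Interval = 14
Number of classes = 96 ÷ 14 ≈ 6.86, rounded up to 7
Class No.   Class       Frequency   Percent
1           39 – 53     3           10.71
2           54 – 67     3           10.71
3           68 – 81     3           10.71
4           82 – 95     7           25.00
5           96 – 110    5           17.86
6           111 – 124   2           7.14
7           125 – 139   5           17.86
Total                   28          100.00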
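The steps of this worked example (range, number of classes, then counting the weights in each class) can be sketched in Python. This is a minimal illustration. It reuses the class limits from this module's worked example and assumes the number of classes is found by dividing the range by the class interval and rounding up.

```python
import math

weights = [52, 75, 92, 101, 83, 68, 133, 78, 104, 61, 39, 46, 135, 87,
           131, 99, 104, 86, 67, 116, 89, 57, 87, 98, 131, 116, 135, 93]

class_interval = 14
data_range = max(weights) - min(weights)              # 135 - 39 = 96
num_classes = math.ceil(data_range / class_interval)  # 96 / 14 is about 6.86, rounded up
print(data_range, round(data_range / class_interval, 2), num_classes)

# Class limits from the worked example in this module (inclusive lower and upper limits).
classes = [(39, 53), (54, 67), (68, 81), (82, 95), (96, 110), (111, 124), (125, 139)]

def frequency_distribution(data, classes):
    """Count the values in each inclusive (lower, upper) class and their percent of the total."""
    total = len(data)
    rows = []
    for lower, upper in classes:
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq, round(100 * freq / total, 2)))
    return rows

for lower, upper, freq, pct in frequency_distribution(weights, classes):
    print(f"{lower:>3} - {upper:<3}  {freq:>2}  {pct:6.2f}")
```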
Range. The term is used to describe the difference between the lowest and the highest
values. Take the following data set as an example: (4, 6, 9, 3, 7). The lowest value is 3 and the
highest value is 9 so the range is computed as: 9 – 3 = 6. So simple!
Real Limit. The term refers to the boundaries that separate each interval. The real limit
separating two adjacent scores is located exactly halfway between the scores. Each score has
two real limits: one at the top of its interval, called the upper real limit, and one at the bottom
of its interval, called the lower real limit.
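Real limits can be sketched as simple arithmetic. This assumes scores are recorded to the nearest whole unit (unit = 1), so each score extends half a unit on either side. The function names are hypothetical.

```python
def real_limits(score, unit=1):
    """Return (lower real limit, upper real limit) for a score recorded to the nearest `unit`."""
    half = unit / 2
    return score - half, score + half

def boundary(upper_score, next_lower_score):
    """Real limit separating two adjacent scores: exactly halfway between them."""
    return (upper_score + next_lower_score) / 2

print(real_limits(7))    # (6.5, 7.5)
print(boundary(53, 54))  # 53.5
```

For example, the adjacent classes 39 – 53 and 54 – 67 from the weight data are separated by the real limit 53.5.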
Cumulative Frequency. This is the sum of a class's frequency and the frequencies of all the
classes before it in a frequency distribution. In other words, it is the running total of the
frequencies. For example:
Class       Frequency   Cumulative Frequency
39 – 53     3           3
54 – 67     3           6
68 – 81     3           9
82 – 95     7           16
96 – 110    5           21
111 – 124   2           23
125 – 139   5           28
Total       28
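The running total can be sketched with Python's `itertools.accumulate`, using the class frequencies of the weight data.

```python
from itertools import accumulate

class_labels = ["39-53", "54-67", "68-81", "82-95", "96-110", "111-124", "125-139"]
frequencies = [3, 3, 3, 7, 5, 2, 5]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(class_labels, frequencies, cumulative):
    print(f"{label:<8}{freq:>3}{cum:>5}")
print("Total", sum(frequencies))
```

The last cumulative frequency always equals the total number of observations (28 here), which is a quick check that no class was missed.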
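Cumulative Percent. This is the sum of a class's percentage and the percentages of all the classes
before it in a percentage distribution. It is the running total of the percentages. To avoid
accumulating rounding errors, each cumulative percent below is computed as the cumulative
frequency divided by the total, times 100. For example:
Class       Frequency   Percent   Cumulative Percent
39 – 53     3           10.71     10.71
54 – 67     3           10.71     21.43
68 – 81     3           10.71     32.14
82 – 95     7           25.00     57.14
96 – 110    5           17.86     75.00
111 – 124   2           7.14      82.14
125 – 139   5           17.86     100.00
Total       28          100.00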
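A sketch of the cumulative percent calculation for the weight data follows. Each cumulative percent is computed from the cumulative frequency rather than by adding up already-rounded percentages, so rounding errors do not pile up and the last class lands exactly on 100.00.

```python
from itertools import accumulate

class_labels = ["39-53", "54-67", "68-81", "82-95", "96-110", "111-124", "125-139"]
frequencies = [3, 3, 3, 7, 5, 2, 5]
total = sum(frequencies)  # 28

# Percent of the total that falls in each class.
percents = [round(100 * f / total, 2) for f in frequencies]

# Cumulative percent = cumulative frequency / total * 100.
cumulative_percents = [round(100 * c / total, 2) for c in accumulate(frequencies)]

for label, pct, cum_pct in zip(class_labels, percents, cumulative_percents):
    print(f"{label:<8}{pct:>7.2f}{cum_pct:>8.2f}")
```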
Ratio data is the second of the two types of numerical data. It is an extension of interval data
and is also the highest of the levels of measurement. The only difference between ratio data and
interval data is that ratio data has a true zero point. For example, temperature measured in
Kelvin is a ratio variable: 0 K (absolute zero) is a true zero, which is what makes Kelvin a ratio
scale. Also, unlike with interval data, multiplication and division can be meaningfully performed
on the values of ratio data.
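The contrast between Celsius (interval) and Kelvin (ratio) can be sketched numerically. The two temperatures are arbitrary illustrative values. Differences agree on both scales, but only the Kelvin ratio is meaningful.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins; 0 K is absolute zero, a true zero point."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on both scales: a 10-degree gap is a 10-kelvin gap.
print(warm_c - cool_c, round(warm_k - cool_k, 2))

# Ratios are only meaningful on the ratio (Kelvin) scale.
print(warm_c / cool_c)            # 2.0, but 20 °C is not "twice" 10 °C
print(round(warm_k / cool_k, 3))  # the meaningful ratio of the two temperatures
```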
Parameter. In statistics, a parameter is a value that tells something about a population and is
the counterpart of a statistic, which tells you something about a sample, a small part of the
population. A parameter is fixed, because everyone (or everything) in the population was surveyed to find it.
For example, you might be interested in the average age of everyone in your class. Maybe you
asked everyone and found the average age was 25. That’s a parameter, because you asked
everyone in the class. Now let’s say you wanted to know the average age of everyone in your
grade or year. If you use that information from your class to take a guess at the average age, then
that information becomes a statistic. That’s because you can’t be sure your guess is correct
(although it will probably be close!).
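The difference between a parameter and a statistic can be sketched with Python's `statistics` and `random` modules. The ages below are hypothetical and are treated as the whole population. The fixed random seed is only there so the sketch is reproducible.

```python
import random
import statistics

# Hypothetical ages of every student in a class (the whole population of interest).
class_ages = [19, 21, 22, 25, 24, 30, 28, 26, 23, 32]

# Parameter: computed from every member of the population, so it is fixed.
population_mean = statistics.mean(class_ages)

# Statistic: computed from a random sample, so it varies from sample to sample.
rng = random.Random(42)
sample = rng.sample(class_ages, k=4)
sample_mean = statistics.mean(sample)

print(population_mean, sample, sample_mean)
```

Changing the seed or the sample size changes the sample mean, but the population mean stays at 25. This mirrors the point above that a statistic is a guess about a parameter.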
Discrete Variables. These are variables whose values can be counted in a finite amount of time. For example,
you can count the change in your pocket. You can count the money in your bank account. You
could also count the amount of money in everyone’s bank accounts. It might take you a long time
to count that last item, but the point is—it’s still countable.
Continuous Variables. They are variables that would (literally) take forever to count. In
fact, you would get to “forever” and never finish counting them. For example, take age. You can’t
count “age”. Why not? Because it would literally take forever. For example, you could be: 25
years, 10 months, 2 days, 5 hours, 4 seconds, 4 milliseconds, 8 nanoseconds, 99 picoseconds…and
so on.
Time-series data are data that are collected over several time periods.
Census. A census is a survey conducted on the full set of observation objects belonging to a given
population or universe. In other words, a census is the complete enumeration of a population or
group at a point in time with respect to well-defined characteristics, for example, population,
production, or traffic on particular roads.
Now that you have read the content of the unit, perform the learning activities. If you have
problems with your internet connection, feel free to contact me at the mobile phone number
included in this learning package.
Learning Activity
Let’s have fun:
Evaluation
Convert the following ordinal data into interval and ratio data:
                  Responses
Item No.    5     4     3     2     1
1           16    29    32    6     17
2           18    27    30    8     17
3           14    31    30    20    5
4           16    20    30    17    17
5           10    25    35    15    15
6           15    25    27    27    6
7           12    32    20    18    18
8           15    27    23    28    7
9           8     35    32    20    5
10          10    13    23    26    28
Basis for Description:
5 Always
4 Often
3 Sometimes
2 Rarely
1 Never