
MODULE 6: DESCRIPTIVE STATISTICS

LESSON 1
Introduction to Statistics
Definition of terms:

Statistics is a branch of mathematics that deals with the collection, organization or presentation, analysis, and
interpretation of data. Its fundamental purpose is to describe and draw inferences about the numerical
properties of a population.
Descriptive Statistics is a statistical procedure concerned with describing the characteristics and
properties of a group of persons, places or things; it is based on easily verifiable facts. It organizes the
presentation, description, and interpretation of data gathered. It includes the study of relationships among
variables.

Historical/Biblical Note
The origin of descriptive statistics can be traced to data collection methods used in censuses taken by the
Babylonians and Egyptians between 4500 and 3000 BC.
In the Roman Empire, between 27 BC and 17 AD, surveys were conducted on the births and deaths of its
citizens, the number of livestock and the crops harvested yearly.
Luke 2:1-4
In those days a decree went out from Caesar Augustus that all the world should be registered. 2 This was
the first registration when Quirinius was governor of Syria. 3 And all went to be registered, each to his own
town. 4 And Joseph also went up from Galilee, from the town of Nazareth, to Judea, to the city of David,
which is called Bethlehem, because he was of the house and lineage of David…

If you have gathered data from a survey and have organized them in a systematic, easy-to-read manner,
then you have succeeded in applying the basic principles of descriptive statistics.

Among the measurements falling under descriptive statistics are the measures of central tendency,
measures of variability, skewness, kurtosis, minimum, maximum, summation and other items which help in
describing a data set.

Descriptive statistics answers questions such as:
1. How many students are interested in taking online classes?
2. Which months have the highest and the lowest number of COVID-19 positive cases?
3. What are the most likable Netflix series according to students?
4. Who performed better in the entrance examination?
5. What proportion of the ULS college students likes online class?
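Several of the descriptive measures named in this lesson (summation, minimum, maximum, central tendency, variability) can be computed directly from a small data set. The sketch below uses Python's standard `statistics` module; the five scores are hypothetical values invented only for illustration and do not come from this module.

```python
import statistics

# Hypothetical quiz scores, invented for illustration only
scores = [12, 15, 15, 18, 20]

summary = {
    "summation": sum(scores),
    "minimum": min(scores),
    "maximum": max(scores),
    "mean": statistics.mean(scores),      # measure of central tendency
    "median": statistics.median(scores),  # measure of central tendency
    "mode": statistics.mode(scores),      # measure of central tendency
    "range": max(scores) - min(scores),   # measure of variability
    "sample standard deviation": statistics.stdev(scores),  # measure of variability
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each value describes only the five scores at hand, which is what distinguishes a descriptive result from an inference about a larger population.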
Inferential Statistics is a statistical procedure used to draw inferences for the population on the
basis of the information obtained from the sample. With inferential statistics, you are going to try to arrive
at conclusions extending beyond the data alone. You may use it to judge whether an observed difference
between groups or data sets is dependable or merely happened by chance. It is a
matter of deciding between reality and coincidence.
Inferential statistics can answer questions such as:
1. Is there a significant difference in the academic performance of students enrolled in online
and in modular classes?
2. Is there a significant difference between the proportions of students who are interested in taking
statistics online and those who are not?

Population – refers to a large collection of objects, places or things.


Parameter – is any numerical value which describes a population.
Example: There are 7,592 students enrolled in a certain Marian Institution.
N = 7,592    Parameter (N)

Sample – is a small portion or part of a population; a representative of the population in a research study.
Statistic – is any numerical value which describes a sample.
Example: Out of the 7,592 students enrolled in a Marian Institution, 3,568 are female.
n = 3,568 Statistic (n)

Data – are facts, or a set of information gathered or under study.

Qualitative Data – are variables that can be placed into distinct categories according to characteristics or
attributes.
Quantitative Data – are numerical and can be ordered or ranked.

Discrete Data – assume exact values and can be obtained through counting. e.g. number of students
Continuous Data – assume infinite values within an interval and are obtained through measurement.
e.g. temperature

Diagram: Data divides into Quantitative and Qualitative; Quantitative divides into Discrete and Continuous.
Constant – is a characteristic or property of a population or sample which makes the members similar to
each other.
Variable – is a characteristic or property of a population or sample which makes the members different
from each other.
Dependent Variable – a variable that is affected by another variable.
Independent Variable – a variable which affects the dependent variable.

Note this:
Researchers are not interested in constants, since constants do not make the subjects of research different
from one another. They are specifically interested in variables.

Scales of Measurement
1. Nominal level of measurement classifies data into mutually exclusive categories in which no order
or ranking can be imposed on the data. Nominal numbers are just labels. e.g. SSS number
2. Ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. e.g. size of t-shirt.
3. Interval level of measurement ranks data, and precise differences between units of measure do exist;
however, there is no meaningful zero. e.g. temperature in degrees Celsius.
4. Ratio level of measurement possesses all the characteristics of interval measurement, and there exists
a true zero. In addition, true ratios exist when the same variable is measured on two different members
of the population. e.g. height

Sampling Techniques
In doing research, if the population is too large to study in full, a properly computed sample size is
acceptable. One way of determining the sample size is to use the Raosoft online sample size calculator:
http://www.raosoft.com/samplesize.html

Note that “e” is called the margin of error. It is a value which quantifies possible sampling errors.
Commonly used margins of error are 0.01 (1%), 0.05 (5%) and 0.10 (10%). Sampling error means that
the results in the sample differ from those of the target population because of the “luck of the draw”.
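For readers who want to see how a sample size can be computed by hand, here is a minimal sketch using the standard sample size formula with a finite population correction. This lesson does not state the calculator's confidence level or internals, so the 95% confidence level (z = 1.96), the function name, and the parameter names are assumptions made for illustration.

```python
import math

def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Sample size for a finite population.

    Assumptions (not stated in the lesson): z = 1.96 corresponds to a
    95% confidence level; proportion = 0.5 is the 50% response setting.
    """
    x = z ** 2 * proportion * (1 - proportion)
    n = population * x / ((population - 1) * margin_of_error ** 2 + x)
    return math.ceil(n)  # round up so the sample is never too small

print(sample_size(5000))  # prints 357
```

With N = 5,000, a 5% margin of error and a 50% response setting, the sketch gives 357, which agrees with the sample size used in the stratified sampling example of this lesson.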

Since you already know what to use to compute the appropriate sample size, the next step is to select the
samples from the population. This is referred to as sampling.

Note this:
Sampling is the process of selecting samples from a given population. There are two types of sampling
techniques:
(1) Probability Sampling: samples are chosen in such a way that each member of the population has an
equal chance of being selected in the samples.
(2) Non-Probability Sampling: each member of the population does not have a known chance of being
included in the sample. Hence, personal judgment plays an important role in the selection.

We will only consider and discuss the probability sampling techniques. These are:

Simple random sampling. This is a procedure where a sample is selected in such a way that every element
is as likely to be selected as any other element in the population.
Example:
Lottery: this needs a complete list of the population. You write the names or codes of each member and
place them in a container, then randomly draw the desired number of samples. This is easy if the
population is small.

Systematic random sampling. This method is a sampling procedure with a random start. Samples are
chosen using the rules set by the researchers. This involves choosing every kth member of the population,
with k = N/n, but there should be a random start.

Example: Choose a sample of size 10 from N = 500.
1. Choose a random start, say 10.
2. Determine the interval k: k = N/n = 500/10 = 50, so every 50th member will be chosen, starting from member 10.
3. So the respondents will be member number 10, 60, 110, 160, 210, 260, 310, 360, 410, 460.
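The three steps above can be sketched in a few lines of Python. The function name and its parameters are hypothetical, chosen only for this illustration; when no start is given, the sketch draws a random start between 1 and k, which is one common convention and an assumption here.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the member numbers picked by systematic random sampling."""
    k = population_size // sample_size  # sampling interval k = N / n
    if start is None:
        # Assumption: the random start is drawn from 1..k
        start = random.randint(1, k)
    # Take the start, then every kth member after it
    return [start + i * k for i in range(sample_size)]

# Same numbers as the example: N = 500, n = 10, random start 10
print(systematic_sample(500, 10, start=10))
# prints [10, 60, 110, 160, 210, 260, 310, 360, 410, 460]
```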
Stratified random sampling. This is used when the population can be naturally classified
into groups or strata.
Example: A survey to find out whether families living in a certain municipality are in favor of charter
change will be conducted. To ensure that all income groups are represented, respondents will
be divided into high-income (Class A), middle-income (Class B) and low-income (Class C) groups.
Below is the distribution of income groups.
Strata     Number of Families
Class A    1,000
Class B    2,500
Class C    1,500
N          5,000
1. Using Raosoft Calculator to find the sample size (n), use 5% margin of error with 50% response
rate, n = 357

2. Using proportional allocation, how many families from each group should be taken as samples?

Strata     Number of Families   Percent                   Number of Samples (n)
Class A    1,000                1000/5000 = 0.2 = 20%     (0.2)(357) = 71.4 ≈ 71
Class B    2,500                2500/5000 = 0.5 = 50%     (0.5)(357) = 178.5 ≈ 179
Class C    1,500                1500/5000 = 0.3 = 30%     (0.3)(357) = 107.1 ≈ 107
N          5,000                                          n = 357

So, 71 families should be taken as respondents from Class A, 179 from Class B and 107 from
Class C, for a total of 357.
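The proportional allocation above can be checked with a short sketch. The function name is hypothetical. Note that Python's built-in `round` rounds exact halves to the nearest even number, so 178.5 would become 178; the sketch therefore rounds halves up explicitly to match the table.

```python
import math

def proportional_allocation(strata_sizes, n):
    """Split a total sample of size n across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    allocation = {}
    for name, size in strata_sizes.items():
        share = size * n / total  # e.g. 1000 * 357 / 5000 = 71.4
        allocation[name] = math.floor(share + 0.5)  # round half up
    return allocation

families = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
samples = proportional_allocation(families, 357)
print(samples)                # prints {'Class A': 71, 'Class B': 179, 'Class C': 107}
print(sum(samples.values()))  # prints 357
```

For other data the rounded counts may add up to one more or one less than n, in which case the researcher adjusts one stratum by hand.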
Collecting and Organizing Data in a Table
The study of statistics begins with the collection of data or measurements. Data collected should be
organized systematically for easier and faster interpretation. They may be presented in any of the following
forms:
The textual form can be used if the data to be presented are few.
The tabular and graphical forms are used when more detailed information about the data is to be
presented.
A table is used when you want to present data in a systematic and organized manner so that reading
and interpretation will be simpler and easier.
When a table is used, you must consider the following parts:
1. Table number
2. Table title
3. Column header
4. Row classifier
5. Body of the table
6. Source note

Table 3
Distribution of Students in Hogwarts School According to Year Level
Year Level    Number of Students
Freshman      350
Sophomore     300
Junior        250
Senior        200
Total         1,100
Source: Hogwarts Registrar
Example 1:
Table 1
Mahusay National High School Enrolment, SY 2005-2006
Year Level    Male    Female
First         216     267
Second        197     216
Third         187     227
Fourth        176     215
Total         776     925
You will observe that the table above shows clearly the enrolment data in Mahusay National High School
for the school year 2005-2006.
Another type of tabular presentation is the frequency table also known as a frequency distribution. It
is an arrangement of the data that shows the frequency of occurrence of different values of the variables.

A frequency table is constructed by listing the measurements from highest to lowest, then making
tally marks to record how often each number occurs. After tallying, count the marks and record them in
the proper column.

Example 2: The scores of 45 students on a 20-point Science quiz are as follows:

17 20 15 18 19 16 11 10 15 16
12 12 13 14 11 10 14 13 12 11
13 15 14 10 15 16 17 17 18 20
20 18 19 19 18 17 16 15 12 12
13 14 15 19 20

Prepare a frequency table for the set of data.

Solution: To prepare a frequency table for the given set of scores, the scores are listed from highest
to lowest, tally marks are made and counted. The counted tally marks will then be
recorded under the column frequency. Notice that every 5th tally crosses the first four
tallies. This is done to make counting of marks easier especially if the number of cases is
rather big.
Score Tallies Frequency
20 //// 4
19 //// 4
18 //// 4
17 //// 4
16 //// 4
15 ////// 6
14 //// 4
13 //// 4
12 ///// 5
11 /// 3
10 /// 3
Total 45
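The same frequency table can be produced by counting with a program instead of tally marks. This is a minimal sketch using `collections.Counter` from the Python standard library on the 45 scores of Example 2.

```python
from collections import Counter

# The 45 quiz scores from Example 2
scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]

frequency = Counter(scores)  # maps each score to how often it occurs

print("Score  Frequency")
for score in sorted(frequency, reverse=True):  # highest to lowest
    print(f"{score:>5}  {frequency[score]:>9}")
print(f"Total  {sum(frequency.values()):>9}")
```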

Frequency Distribution Tables

If the number of measures under consideration is large, the presentation of data is further
simplified by grouping the measures into class intervals. The resulting table is called a frequency distribution.
A frequency distribution is a distribution of the total number of measures or frequencies over
arbitrarily defined categories or classes. The number of measures falling under a class is called class
frequency.

Example 1.
The frequency distribution below shows the scores obtained by 300 students in an English
test of 50 items.

Score    Number of Students
45-49    15
40-44    32
35-39    42
30-34    108
25-29    67
20-24    21
15-19    10
10-14    5
Total    300

In the example above, the symbol 45-49 and the other symbols which follow up to 10-14 are called
class intervals. The end numbers are called class limits. For instance in the class interval 45-49, 45 is
called the lower limit while 49 is called the upper limit.

Each class interval also has a lower boundary and an upper boundary. For the class interval 45-49,
the lower boundary is 44.5 while the upper boundary is 49.5. Hence, for the class interval 45-49,
44.5 – 49.5 are called the class boundaries.

The size of the class interval, also called the class size, is the difference between the upper boundary
and the lower boundary. Hence, the class size in the given example is 49.5 − 44.5 = 5.

A class interval also has a midpoint or class mark. It is obtained by taking half the sum of the lower and
upper class limits. For instance, the midpoint of the class interval 45-49 is (45 + 49)/2 = 47.
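The class boundaries, class size and midpoint described above can be computed from the two class limits. In this sketch the function name and the `unit` parameter are hypothetical; `unit=1` assumes whole-number scores, so each boundary lies half a unit beyond its limit.

```python
def class_properties(lower_limit, upper_limit, unit=1):
    """Boundaries, class size and midpoint of one class interval.

    Assumption: data are recorded to the nearest `unit` (1 for whole scores).
    """
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "class size": upper_boundary - lower_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
    }

properties = class_properties(45, 49)
print(properties)
# prints {'boundaries': (44.5, 49.5), 'class size': 5.0, 'midpoint': 47.0}
```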

Range (R) is the difference between the highest score (H) and the lowest score (L) in the given data set: R = H − L.
The following are the suggested steps on how to make class intervals:
1. Determine the desired number of classes (n), that is, the number of rows.
2. Solve for the class width (i): i = Range / n
3. Start the lowest class interval with the lowest score in the given data set, then add i to get the
lower limit of each succeeding class. Continue until the highest value in the distribution is reached.
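The suggested steps can be sketched as a small function. Two things here are assumptions and not part of the lesson: the class width is rounded up to a whole number so that whole-number scores fit the classes, and the lowest and highest scores (10 and 49) are hypothetical values chosen only to show the procedure.

```python
import math

def class_intervals(lowest, highest, num_classes):
    """Build (lower limit, upper limit) pairs for whole-number scores."""
    data_range = highest - lowest                  # R = H - L
    width = math.ceil(data_range / num_classes)    # i = Range / n, rounded up (assumption)
    intervals = []
    lower = lowest                                 # start at the lowest score
    while lower <= highest:                        # continue until the highest score is covered
        intervals.append((lower, lower + width - 1))
        lower += width                             # next lower limit = previous + i
    return intervals

# Hypothetical extremes: lowest score 10, highest score 49, 8 classes
intervals = class_intervals(10, 49, 8)
print(intervals)
# prints [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
```

When the range divides evenly by the number of classes, this procedure can yield one class more than requested, since the highest score must still fall inside a class.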

Task 1.
A. Read the given statement and classify whether the statement is descriptive or inferential.
1. Americans shelled out $60 billion for 196 million barrels of cola in 1998, generating $29
billion in retail profits
2. Coca-Cola accounted for one-third of the market in 1990.
3. The company is projected to claim half of the cola market by 2013.
B. Below is the number of students taking up statistics from each department. A seminar will be
conducted outside the campus.
4. At 3% margin of error with 50% response rate, how many participants should be taken as
samples? (Use RAOSOFT)
5. Complete the table
Course Number of students Percent n
COA 120
CBE 89
CAS 57
COED 34
Total 300
6. Give a simple interpretation of the results
