Types of Data
Data are needed whenever we undertake a study or research. They are used to address
problems or to provide a basis on which decisions are made. The next step after the problem
has been defined in the study is data collection. There are two types of data, namely primary
and secondary data. Primary data are data collected directly by the researcher. These are
first-hand or original sources. They can be collected through the following: 1) by direct
observation or measurement; 2) by interview, using sets of questions called questionnaires or
rating scales as guides for collecting objective and measurable data; 3) by mailing recording
or reporting forms via ordinary and special mail, courier services, e-mail, and fax to reach
distant data providers; 4) by experimentation to find out the cause and effect of a certain
phenomenon; and 5) by registration, such as the registry of births, deaths, and marriages. The
last of these is governed by certain laws. On the other hand, secondary data are information
taken from published or unpublished materials previously gathered by other researchers or
agencies, such as books, newspapers, magazines, journals, and published and unpublished theses
and dissertations. Primary data collection can be more effective and informative if given the
necessary preparation and planning.
Sampling Techniques
One of the most important parts of research work that needs preparation and planning is
choosing an appropriate sampling method. Any sampling procedure that produces inferences
that consistently overestimate or underestimate a population characteristic is biased.
There are two types of sampling techniques: probability sampling and non-probability
sampling. The difference between them is that in probability sampling, every unit has a known
chance of being selected, and that chance can be quantified, which supports the
representativeness of the sample. This is not true for non-probability sampling: not every
item in the population has a known chance of being selected.
Probability sampling involves the selection of a sample from a population based on the
principle of randomization or chance. Probability sampling is more complex, more
time-consuming, and usually more costly than non-probability sampling.
2. Stratified Random Sampling involves splitting subjects into mutually exclusive groups
and then using simple random sampling to choose members from each group.
3. Systematic Sampling means that you choose every “nth” participant from a complete
list. For example, you could choose every 10th person listed.
4. Cluster Random Sampling is a way to randomly select participants from a list that is
too large for simple random sampling. For example, if you wanted to choose 1000
participants from the entire population of the U.S., it is likely impossible to get a complete
list of everyone. Instead, the researcher randomly selects areas (e.g., cities or counties)
and then randomly selects participants from within those boundaries.
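The probability sampling methods described above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 ID numbers, the two strata, the ten "areas", and all function names are hypothetical and do not come from any particular study.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from each mutually exclusive group."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, k, rng):
    """Pick a random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly choose clusters, then sample units inside each chosen one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(42)            # fixed seed so the sketch is repeatable
people = list(range(1, 101))       # hypothetical population: IDs 1..100
strata = {"first_half": people[:50], "second_half": people[50:]}
clusters = {f"area_{i}": people[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(people, 10, rng)
strat = stratified_sample(strata, 5, rng)
syst = systematic_sample(people, 10, rng)      # every 10th person
clus = cluster_sample(clusters, 3, 4, rng)

print(srs, strat, syst, clus, sep="\n")
```

In each function the selection is left to the random number generator, which is what makes the chance of selection quantifiable.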
1. Convenience Sampling: as the name suggests, this involves collecting a sample from
somewhere convenient to you: the mall, your local school, your church. Sometimes
called accidental sampling, opportunity sampling or grab sampling.
4. Expert Sampling: in this method, the researcher draws the sample from a list of experts
in the field.
6. Modal Instance Sampling: The most “typical” members are chosen from a set.
7. Quota Sampling: where the groups (e.g., men and women) in the sample are
proportional to the groups in the population.
8. Snowball Sampling: where research participants recruit other members for the study.
This method is particularly useful when participants might be hard to find. For example, a
tracer study that locates the graduates of a certain university across the different
years covered.
Presentation of data also needs planning and preparation. If data are properly and
interestingly presented, the benefits will go not only to the readers or users but, more so, to
the statisticians who will make the analysis and interpretation of the data gathered.
The mere gathering of information or data is not a small task. A greater task is to
make the data comprehensible and meaningful. The data gathered are summarized and
presented in different forms, namely: 1. textual form, 2. tabular form, and 3. graphical form.
In the textual form, the data are incorporated in the text of the report. In the tabular
form, the data are presented in rows and columns. When large sets of data are presented, the
graphical form is used to make the information easy to digest. This comes in graphs and diagrams.
Raw data are data collected in an investigation and they are not organized
systematically. Raw data that are presented in the form of a frequency distribution are called
grouped data.
There are two methods of organizing raw data: setting up an array and setting up a
stem-and-leaf diagram. An ordering of the observations from smallest to largest, or vice versa,
is an array. It has the advantage that the low and high values can be readily perceived. The
stem-and-leaf display gives a good overall impression of the data.
For example, a nationwide travel agency offers special rates for package tours during
summer. To economize on advertising, brochures will be sent only to certain age groups. The
agency retrieves its previous customers from its files and groups them according to age. Only
those age groups with the fewest people are sent brochures. The following are the ages of its
previous customers:
59 50 52 38 80 62 77 66
60 61 59 62 51 36 54 18
71 56 44 52 26 63 58 66
41 34 61 50 60 53 62 62
53 43 63 71 65 79 45 66
1 | 8
2 | 6
3 | 4 6 8
4 | 1 3 4 5
5 | 0 0 1 2 2 3 3 4 6 8 9 9
6 | 0 0 1 1 2 2 2 2 3 3 5 6 6 6
7 | 1 1 7 9
8 | 0
In setting up the data as a stem-and-leaf diagram, each number (raw datum) is broken into
its tens and units digits, and the units digits that share the same tens digit are tallied
together in one row. In the first row, we think of 18 as 1 | 8. Each row represents a stem
position, and each digit to the right of the vertical line is a leaf. Thus, in the first row,
1 | 8, 1 is the stem and 8 is the leaf.
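Both ways of organizing the ages can be sketched in Python. This is a minimal sketch, assuming the forty ages are typed in exactly as listed above; the function name stem_and_leaf is only illustrative.

```python
from collections import defaultdict

# The forty customer ages, in the order they are listed
ages = [
    59, 50, 52, 38, 80, 62, 77, 66,
    60, 61, 59, 62, 51, 36, 54, 18,
    71, 56, 44, 52, 26, 63, 58, 66,
    41, 34, 61, 50, 60, 53, 62, 62,
    53, 43, 63, 71, 65, 79, 45, 66,
]

# The array: observations ordered from smallest to largest
array = sorted(ages)
print("lowest:", array[0], "highest:", array[-1])

def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted units digits (leaves)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)

diagram = stem_and_leaf(ages)
for stem in sorted(diagram):
    leaves = " ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")
```

Because the values are sorted before they are split, the leaves in each row come out already in ascending order.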
Frequency Distribution
Another way of presenting raw data is the frequency table. When the data are arranged
in tabular form by frequency, the table is called a frequency table. The arrangement itself
is called the frequency distribution.
It is difficult to draw generalizations by scanning a mass of numerical data unless they are
organized into a frequency distribution. The construction of a frequency distribution
consists essentially of three steps:
1. Deciding on a set of groupings called classes;
2. Sorting or tallying the data into classes;
3. Counting the number of tallies in each class, called frequencies.
To decide on the classes, the suggested class interval is computed with the formula:
\[
\text{Suggested Class Interval } (c.i.) = \frac{\text{Highest Observed Value} - \text{Lowest Observed Value}}{1 + 3.322 \log N}
\]
where N is the number of observations and log is the common (base-10) logarithm.
\[
c.i. = \frac{80 - 18}{1 + 3.322 \log 40} = \frac{62}{1 + 3.322(1.60206)} = \frac{62}{6.322} \approx 9.8 \text{ or } 10
\]
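The same arithmetic can be checked with a short Python sketch. The denominator 1 + 3.322 log N is commonly known as Sturges' rule for the number of classes; the function name below is only illustrative.

```python
import math

def suggested_class_interval(highest, lowest, n):
    """Range divided by the suggested number of classes, 1 + 3.322 * log10(n)."""
    number_of_classes = 1 + 3.322 * math.log10(n)
    return (highest - lowest) / number_of_classes

ci = suggested_class_interval(highest=80, lowest=18, n=40)
print(round(ci, 1))  # 9.8, rounded up to a convenient class interval of 10
```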
Since we already have the class interval, we can now construct the classes of our
frequency distribution. Each class has two limits: a lower stated class limit and an upper
stated class limit. A common practice is to let the lower limit of the first class be a
number at or below the lowest observation that is divisible by the class interval, and to
make all the classes equal in length, or class size.
Class      Tally               Frequency   Relative Frequency   Percentage
10 – 19    I                        1            0.025              2.5%
20 – 29    I                        1            0.025              2.5%
30 – 39    III                      3            0.075              7.5%
40 – 49    IIII                     4            0.100             10.0%
50 – 59    IIIII IIIII II          12            0.300             30.0%
60 – 69    IIIII IIIII IIII        14            0.350             35.0%
70 – 79    IIII                     4            0.100             10.0%
80 – 89    I                        1            0.025              2.5%
Sum                                40            1.000            100.0%
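The three construction steps (choosing classes, tallying, counting) can be sketched in Python. This is a minimal sketch, assuming the forty ages listed earlier, a first lower limit of 10, and a class size of 10; the function name frequency_table is only illustrative.

```python
ages = [
    59, 50, 52, 38, 80, 62, 77, 66,
    60, 61, 59, 62, 51, 36, 54, 18,
    71, 56, 44, 52, 26, 63, 58, 66,
    41, 34, 61, 50, 60, 53, 62, 62,
    53, 43, 63, 71, 65, 79, 45, 66,
]

def frequency_table(values, first_lower, width):
    """Return (lower, upper, frequency, relative frequency) for each class."""
    rows = []
    lower = first_lower
    while lower <= max(values):
        upper = lower + width - 1                      # stated upper limit
        count = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, count, count / len(values)))
        lower += width
    return rows

table = frequency_table(ages, first_lower=10, width=10)
for lower, upper, count, relative in table:
    print(f"{lower} - {upper}   {count:>2}   {relative:.3f}   {relative:.1%}")
print("Sum", sum(row[2] for row in table))
```

The relative frequency is the class frequency divided by the total number of observations, and the percentage is the same value multiplied by 100.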
A point that represents the halfway point between successive classes is called a class
boundary. It is obtained by adding the upper limit of one class and the lower limit of the next
class, and dividing by 2. A class mark is the midpoint of a class. To obtain it, add the lower
limit and the upper limit of the same class and then divide by 2. For the less-than cumulative
frequency (<cf), start at the lowest class: the first entry is the frequency of that class, and
each succeeding entry adds the frequency of the next class, up to the highest class.
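These three quantities can be sketched in Python as follows. This is a minimal sketch, assuming the forty ages listed earlier and the classes 10 – 19 through 80 – 89; the variable names are only illustrative.

```python
from itertools import accumulate

ages = [
    59, 50, 52, 38, 80, 62, 77, 66,
    60, 61, 59, 62, 51, 36, 54, 18,
    71, 56, 44, 52, 26, 63, 58, 66,
    41, 34, 61, 50, 60, 53, 62, 62,
    53, 43, 63, 71, 65, 79, 45, 66,
]

# Stated class limits: (10, 19), (20, 29), ..., (80, 89)
classes = [(lower, lower + 9) for lower in range(10, 90, 10)]

# Class mark: midpoint of the lower and upper limits of the same class
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Class boundary: halfway between the upper limit of one class
# and the lower limit of the next class
boundaries = [(classes[i][1] + classes[i + 1][0]) / 2
              for i in range(len(classes) - 1)]

# Less-than cumulative frequency: running total from the lowest class up
frequencies = [sum(lower <= a <= upper for a in ages) for lower, upper in classes]
less_than_cf = list(accumulate(frequencies))

print(class_marks)
print(boundaries)
print(less_than_cf)
```

The last entry of the less-than cumulative frequency always equals the total number of observations, which is a quick check on the tally.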