Unit 2 Stat PDF

Unit 2.
DATA COLLECTION AND PRESENTATION
At the end of this Unit, the students should be able to:

• Identify the different types of data;
• Determine the various techniques of selecting a sample;
• Summarize and present data in different forms;
• Solve for the class marks, class limits and class boundaries; and
• Construct a frequency distribution.
Lesson 1. Data Collection
Types of Data
Data are needed whenever we undertake studies or research. They are used to address
problems or to provide a basis on which decisions are made. Once the problem of a study has
been defined, the next step is data collection. There are two types of data, namely primary and
secondary data. Primary data are data collected directly by the researcher. These come from
first-hand or original sources. They can be collected through the following: 1) direct
observation or measurement; 2) interview, using sets of questions called questionnaires or
rating scales as guides for collecting objective and measurable data; 3) mailing of recording or
reporting forms via ordinary and special mail, courier services, e-mail and fax to reach distant
data providers; 4) experimentation to find out the cause and effect of a certain phenomenon;
and 5) registration, such as the registry of births, deaths and marriages. The last of these is
governed by certain laws. Secondary data, on the other hand, are information taken from
published or unpublished materials previously gathered by other researchers or agencies, such
as books, newspapers, magazines, journals, and published and unpublished theses and
dissertations. Primary data collection can be more effective and informative if given the
necessary preparation and planning.
Sampling Techniques
One of the most important parts of research work that needs preparation and planning is
choosing an appropriate sampling method. Any sampling procedure that produces inferences
that consistently overestimate or consistently underestimate a characteristic of the
population is said to be biased.
There are two types of sampling techniques: probability sampling and non-probability
sampling. The difference between them is that in probability sampling, every unit has a chance
of being selected, and that chance can be quantified, which supports the representativeness of
the sample. This is not true for non-probability sampling, where not every item in the
population has a known chance of being selected.
Probability sampling involves the selection of a sample from a population based on the
principle of randomization or chance. Probability sampling is more complex, more
time-consuming and usually more costly than non-probability sampling.
1. Simple Random Sampling (sometimes called fishbowl sampling) is a completely random
method of selecting subjects. This can include assigning numbers to all subjects and then
using a random number generator to choose random numbers. Classic ball and urn
experiments are another example of this process (assuming the balls are sufficiently
mixed). The members whose numbers are chosen are included in the sample.
2. Stratified Random Sampling involves splitting subjects into mutually exclusive groups
and then using simple random sampling to choose members from each group.
3. Systematic Sampling means that you choose every “nth” participant from a complete
list. For example, you could choose every 10th person listed.
4. Cluster Random Sampling is a way to randomly select participants from a population that
is too large for simple random sampling. For example, if you wanted to choose 1000
participants from the entire population of the U.S., it is likely impossible to get a complete
list of everyone. Instead, the researcher randomly selects areas (e.g. cities or counties)
and then randomly selects participants from within those boundaries.
5. Multi-Stage Random Sampling uses a combination of techniques.
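The first four probability sampling techniques above can be sketched in a few lines of Python. This is only a minimal illustration using the standard `random` module. The function names, the numbered population of 100 subjects, and the strata and clusters are hypothetical and are not part of the lesson.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, k, seed=None):
    """Take every k-th member of the list after a random starting position."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start among the first k positions
    return population[start::k]


def stratified_sample(strata, n_per_stratum, seed=None):
    """Apply simple random sampling separately inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick whole clusters first, then sample members within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


if __name__ == "__main__":
    # Hypothetical population: subjects numbered 1 to 100.
    population = list(range(1, 101))
    print(simple_random_sample(population, 10, seed=1))
    print(systematic_sample(population, 10, seed=1))

    # Hypothetical strata and clusters built from the same numbered subjects.
    strata = {"group A": population[:50], "group B": population[50:]}
    print(stratified_sample(strata, 5, seed=1))

    clusters = {f"area {i}": population[i * 20:(i + 1) * 20] for i in range(5)}
    print(cluster_sample(clusters, 2, 5, seed=1))
```

Passing a `seed` only makes the illustration repeatable. In actual sampling the seed would normally be left unset.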
Non-Probability Sampling
1. Convenience Sampling: as the name suggests, this involves collecting a sample from
somewhere convenient to you: the mall, your local school, your church. It is sometimes
called accidental sampling, opportunity sampling or grab sampling.
2. Haphazard Sampling: where a researcher chooses items haphazardly, trying to
simulate randomness. However, the result may not be random at all and is often tainted
by selection bias.
3. Purposive Sampling: where the researcher chooses a sample based on their
knowledge about the population and the study itself. The study participants are chosen
based on the study’s purpose. There are several types of purposive sampling.
4. Expert Sampling: in this method, the researcher draws the sample from a list of experts
in the field.
5. Heterogeneity Sampling / Diversity Sampling: a type of sampling where you
deliberately choose members so that all views are represented. However, those views
may or may not be represented proportionally.
6. Modal Instance Sampling: the most “typical” members are chosen from a set.
7. Quota Sampling: where the groups (e.g. men and women) in the sample are
proportional to the groups in the population.
8. Snowball Sampling: where research participants recruit other members for the study.
This method is particularly useful when participants might be hard to find. An example is
a tracer study that tries to locate the graduates of a certain university across the
different years covered.
Lesson 2. Data Presentation
Presentation of data also needs planning and preparation. If data are properly and
interestingly presented, the benefits will go not only to the readers or users but more so to the
statisticians who will make the analysis and interpretation of the data gathered.
The mere gathering of information or data is not a small task. A greater task is to
make the data comprehensible and meaningful. The data gathered are summarized and
presented in different forms, namely: 1. textual form, 2. tabular form, and 3. graphical form.
In the textual form, the data are incorporated in the text of the report. In the tabular
form, the data are presented in rows and columns. When large sets of data are presented, the
graphical form is used to make the information “easy to digest”. This comes in graphs and
diagrams.
Stem and Leaf Diagram
Raw data are data collected in an investigation that have not yet been organized
systematically. Raw data that are presented in the form of a frequency distribution are called
grouped data.
There are two methods of organizing raw data: setting up an array and setting up a
stem-and-leaf diagram. An array is an ordering of the observations from smallest to largest or
vice versa. It has the advantage that the low and high values can be readily perceived. The
stem-and-leaf display gives a good overall impression of the data.
For example, a nationwide travel agency offers special rates for package tours during
summer. To economize on advertising, brochures will be sent only to certain age groups. The
agency takes its previous customers from its files and groups them according to age. Only
those age groups with the fewest people are sent brochures. The following are the ages of its
previous customers:
59 50 52 38 80 62 77 66
60 61 59 62 51 36 54 18
71 56 44 52 26 63 58 66
41 34 61 50 60 53 62 62
53 43 63 71 65 79 45 66
I. Setting up an array from highest to lowest.

80 79 77 71 71 66 66 66
65 63 63 62 62 62 62 61
61 60 60 59 59 58 56 54
53 53 52 52 51 50 50 45
44 43 41 38 36 34 26 18
II. Setting up an array from lowest to highest.

18 26 34 36 38 41 43 44
45 50 50 51 52 52 53 53
54 56 58 59 59 60 60 61
61 62 62 62 62 63 63 65
66 66 66 71 71 77 79 80
III. Setting up a stem-and-leaf diagram.
1 | 8
2 | 6
3 | 4 6 8
4 | 1 3 4 5
5 | 0 0 1 2 2 3 3 4 6 8 9 9
6 | 0 0 1 1 2 2 2 2 3 3 5 6 6 6
7 | 1 1 7 9
8 | 0
In setting up the data as a stem-and-leaf diagram, each number (raw datum) is broken into
its tens digit and its units digit, and the units digits of all values that share the same tens
digit are listed together. In the first row, we think of 18 as 1 | 8. Each row represents a stem
position and each digit to the right of the vertical line is a leaf. Thus, in the first row 1 | 8,
1 is the stem and 8 is the leaf.
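The two ways of organizing the raw data can be reproduced with a short Python sketch. It assumes two-digit values, as in the list of 40 ages above, so that the tens digit serves as the stem and the units digit as the leaf. The function names are hypothetical.

```python
from collections import defaultdict

# The 40 customer ages, row by row as listed in the raw data.
ages = [
    59, 50, 52, 38, 80, 62, 77, 66,
    60, 61, 59, 62, 51, 36, 54, 18,
    71, 56, 44, 52, 26, 63, 58, 66,
    41, 34, 61, 50, 60, 53, 62, 62,
    53, 43, 63, 71, 65, 79, 45, 66,
]


def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted list of units digits (leaves)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)


def format_stem_and_leaf(rows):
    """Render one 'stem | leaves' line per stem, lowest stem first."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in rows[stem])}"
        for stem in sorted(rows)
    )


# Arrays: the same observations ordered in each direction.
descending = sorted(ages, reverse=True)
ascending = sorted(ages)

rows = stem_and_leaf(ages)
print(format_stem_and_leaf(rows))
```

Running the sketch on the raw list is a quick way to check a hand-made array or diagram, since every leaf must correspond to exactly one observation.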
Frequency Distribution
Another way of presenting raw data is the frequency table. When the data are arranged
in tabular form according to their frequencies, the table is called a frequency table. The
arrangement itself is called the frequency distribution.
It is difficult to draw generalizations by merely scanning a mass of numerical data. Once
the data are organized into a frequency distribution, generalizations can be drawn readily. The
construction of a frequency distribution consists essentially of three steps:
1. Deciding on a set of groupings called classes;
2. Sorting or tallying the data into the classes;
3. Counting the number of tallies in each class, called the frequencies.
Rules in the Construction of Frequency Distribution

1. We seldom use fewer than 5 or more than 15 classes. Note that it is impractical to
group a thousand measurements into 4 classes or to group 10 observations into 8 classes.
2. Whenever possible, we make the classes cover equal ranges of values and make the ranges
multiples of numbers that are easy to work with. Open classes, such as “less than” or
“more than” classes, should be avoided.
3. Make sure that each item goes into one class only.
4. In presenting the frequency table, the tally is usually omitted.
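Suggested formula for the class interval by Freund and Simon:

$$\text{Suggested Class Interval } (c.i.) = \frac{\text{Highest Observed Value} - \text{Lowest Observed Value}}{\text{Number of Classes}}$$

When the number of classes has not yet been decided, it is estimated as 1 + 3.322 log N, and
the formula for the class interval becomes:

$$\text{Suggested Class Interval } (c.i.) = \frac{\text{Highest Observed Value} - \text{Lowest Observed Value}}{1 + 3.322 \log N}$$

where N is the number of observations and log is the common (base-10) logarithm.
Applying this to the example data above:

$$c.i. = \frac{80 - 18}{1 + 3.322 \log 40} = \frac{62}{1 + 3.322(1.60206)} = \frac{62}{6.322} \approx 9.8 \text{ or } 10$$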
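The same computation can be checked with a few lines of Python. This is a minimal sketch of the formula above, and the function name is hypothetical.

```python
import math


def suggested_class_interval(highest, lowest, n_observations):
    """Range divided by the estimated number of classes, 1 + 3.322 log10(N)."""
    n_classes = 1 + 3.322 * math.log10(n_observations)
    return (highest - lowest) / n_classes


ci = suggested_class_interval(80, 18, 40)
print(round(ci, 1))  # 9.8, which is rounded up to a convenient interval of 10
```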
Since we already have the class interval, we can now construct the classes of our
frequency distribution. Each class has two limits: a lower stated class limit and an upper
stated class limit. A common practice is to let the lower limit of the first class be a number
at or below the lowest observation that is divisible by the class interval, and to make all the
classes of equal length (class size). Here the lowest observation is 18 and the class interval
is 10, so the first class starts at 10.
Class      Tally               Frequency   Relative Frequency   Percentage
10 – 19    I                   1           0.025                2.5%
20 – 29    I                   1           0.025                2.5%
30 – 39    III                 3           0.075                7.5%
40 – 49    IIII                4           0.100                10.0%
50 – 59    IIIII IIIII II      12          0.300                30.0%
60 – 69    IIIII IIIII IIII    14          0.350                35.0%
70 – 79    IIII                4           0.100                10.0%
80 – 89    I                   1           0.025                2.5%
Sum                            40          1.000                100.0%
To compute the relative frequency:

$$rf = \frac{\text{frequency of the class}}{\text{total number of observations}}$$

The percentage is the relative frequency multiplied by 100.
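The tallying and the relative-frequency computation can be sketched in Python as follows. The sketch assumes whole-number data and classes of equal width that start at 10, as in the example. The function name and its parameters are hypothetical.

```python
# The 40 customer ages, row by row as listed in the raw data.
ages = [
    59, 50, 52, 38, 80, 62, 77, 66,
    60, 61, 59, 62, 51, 36, 54, 18,
    71, 56, 44, 52, 26, 63, 58, 66,
    41, 34, 61, 50, 60, 53, 62, 62,
    53, 43, 63, 71, 65, 79, 45, 66,
]


def frequency_table(values, first_lower, width, n_classes):
    """Return (lower limit, upper limit, frequency, relative frequency) per class."""
    total = len(values)
    table = []
    for i in range(n_classes):
        lower = first_lower + i * width
        upper = lower + width - 1  # stated upper limit for whole-number data
        # Each value satisfies this condition for exactly one class.
        frequency = sum(lower <= value <= upper for value in values)
        table.append((lower, upper, frequency, frequency / total))
    return table


table = frequency_table(ages, first_lower=10, width=10, n_classes=8)
for lower, upper, frequency, rf in table:
    print(f"{lower} – {upper}  {frequency:>2}  {rf:.3f}  {rf * 100:.1f}%")
```

Because each observation is counted in exactly one class, the frequencies must add up to the number of observations and the relative frequencies must add up to 1.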
True Limits, Class Marks and Cumulative Frequency
A point that represents the halfway point between successive classes is called a class
boundary. It is obtained by adding the upper limit of one class and the lower limit of the next
class, and dividing the sum by 2. A class mark is the midpoint of a class. To obtain it, add the
lower limit and the upper limit of the same class and divide the sum by 2. For the less-than
cumulative frequency (<cf), start at the lowest class: the first entry is the frequency of that
class, and each succeeding entry adds the frequency of the next class, up to the highest class.
The greater-than cumulative frequency (>cf) is accumulated in the same way in the opposite
direction, starting from the highest class.
Classes    f    Stated Lower   Stated Upper   Lower      Upper      Class   <cf   >cf
                Limit          Limit          Boundary   Boundary   Mark
10 – 19    1    10             19             9.5        19.5       14.5    1     40
20 – 29    1    20             29             19.5       29.5       24.5    2     39
30 – 39    3    30             39             29.5       39.5       34.5    5     38
40 – 49    4    40             49             39.5       49.5       44.5    9     35
50 – 59    12   50             59             49.5       59.5       54.5    21    31
60 – 69    14   60             69             59.5       69.5       64.5    35    19
70 – 79    4    70             79             69.5       79.5       74.5    39    5
80 – 89    1    80             89             79.5       89.5       84.5    40    1
N          40
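The class boundaries, class marks and cumulative frequencies can be derived from the stated limits and the raw ages with a short Python sketch. It assumes at least two classes and the same gap between every pair of successive classes (here 1, e.g. between 19 and 20). The function name and the dictionary keys are hypothetical.

```python
from itertools import accumulate

# The 40 customer ages, row by row as listed in the raw data.
ages = [
    59, 50, 52, 38, 80, 62, 77, 66,
    60, 61, 59, 62, 51, 36, 54, 18,
    71, 56, 44, 52, 26, 63, 58, 66,
    41, 34, 61, 50, 60, 53, 62, 62,
    53, 43, 63, 71, 65, 79, 45, 66,
]

# Stated class limits: 10 – 19, 20 – 29, ..., 80 – 89.
limits = [(lower, lower + 9) for lower in range(10, 90, 10)]
freqs = [sum(lo <= age <= hi for age in ages) for lo, hi in limits]


def class_summary(limits, freqs):
    """Add boundaries, class mark, <cf and >cf to each class."""
    # Half the gap between one class's upper limit and the next lower limit.
    half_gap = (limits[1][0] - limits[0][1]) / 2
    less_cf = list(accumulate(freqs))  # running total from the lowest class
    greater_cf = list(accumulate(reversed(freqs)))[::-1]  # from the highest
    return [
        {
            "lower_boundary": lower - half_gap,
            "upper_boundary": upper + half_gap,
            "class_mark": (lower + upper) / 2,
            "f": f,
            "<cf": lcf,
            ">cf": gcf,
        }
        for (lower, upper), f, lcf, gcf in zip(limits, freqs, less_cf, greater_cf)
    ]


rows = class_summary(limits, freqs)
for (lower, upper), row in zip(limits, rows):
    print(lower, upper, row)
```

A useful check is that the last <cf entry and the first >cf entry both equal N, the total number of observations.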
