

Data are a set of facts, and provide a partial picture of reality. Whether data are being
collected with a certain purpose or collected data are being utilized, questions regarding what
information the data are conveying, how the data can be used, and what must be done to
include more useful information must constantly be kept in mind.

Since most data are available to researchers in a raw format, they must be summarized,
organized, and analyzed to usefully derive information from them. Furthermore, each data set
needs to be presented in a certain way depending on what it is used for. Planning how the
data will be presented is essential to processing raw data appropriately.

First, a question for which an answer is desired must be clearly defined. The more detailed
the question is, the more detailed and clearer the results are. A broad question results in vague
answers and results that are hard to interpret. In other words, a well-defined question is
crucial for the data to be well-understood later. Once a detailed question is ready, the raw
data must be prepared before processing. These days, data are often summarized, organized,
and analyzed with statistical packages or graphics software. Data must be prepared in such a
way that they are properly recognized by the program being used. The present study does not
discuss this data preparation process, which involves creating a data frame, creating/changing
rows and columns, changing the levels of factors, coding categorical variables, creating dummy
variables, transforming variables and data, treating missing values and outliers, and
removing noise.

We describe the roles and appropriate use of text, tables, and graphs (graphs, plots, or charts),
all of which are commonly used in reports, articles, posters, and presentations. Furthermore,
we discuss the issues that must be addressed when presenting various kinds of information,
and effective methods of presenting data, which are the end products of research, and of
emphasizing specific information.


Data collection is the process of gathering and measuring information on variables of interest, in an
established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes. The data collection component of research is common to all
fields of study, including the physical and social sciences, humanities, and business. While methods vary
by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal
for all data collection is to capture quality evidence that then translates to rich data analysis and
allows the building of a convincing and credible answer to questions that have been posed.

Regardless of the field of study or preference for defining data (quantitative or qualitative), accurate data
collection is essential to maintaining the integrity of research. Both the selection of appropriate
data collection instruments (existing, modified, or newly developed) and clearly delineated
instructions for their correct use reduce the likelihood of errors occurring.

Data collection is one of the most important stages in conducting research. You can have the best
research design in the world, but if you cannot collect the required data, you will not be able to
complete your project. Data collection is a very demanding job that needs thorough planning, hard
work, patience, perseverance, and more to be completed successfully. Data collection
starts with determining what kind of data is required, followed by the selection of a sample from a
certain population. After that, you need to use a certain instrument to collect the data from the
selected sample.


In primary data collection, you collect the data yourself using qualitative and quantitative methods.
The key point here is that the data you collect is unique to you and your research and, until you
publish, no one else has access to it. There are many methods of collecting primary data.

The main methods include:
• Questionnaires
• Interviews
• Focus Group Interviews
• Observation
• Survey
• Case-studies
• Diaries
• Activity Sampling Technique
• Memo Motion Study
• Process Analysis
• Link Analysis
• Time and Motion Study
• Experimental Method
• Statistical Method, etc.


Secondary data is data that has already been collected from primary sources and can be used in
the current research study. Collecting secondary data often takes considerably less time than
collecting primary data, where you would have to gather all the information from scratch. It is thus
possible to gather more data this way.
Secondary data can be obtained from two different research strands:
• Quantitative: census, housing, social security, and electoral statistics, among others.
• Qualitative: semi-structured and structured interviews, focus group transcripts, field
observation records, and other personal, research-related documents.
Secondary data is often readily available. With the expansion of electronic media and the internet,
secondary data has become much easier to obtain.
Published Printed Sources: There are many kinds of published printed sources. Their credibility
depends on many factors, for example, the writer, the publishing company, and the date of
publication. Newer sources are preferred and older sources should be avoided, as new technology
and research bring new facts to light.
Books: Books are available today on almost any topic that you want to research. Books are useful
even before you have selected a topic. After a topic is selected, books provide insight into how
much work has already been done on the same topic, which helps you prepare your literature
review. Books are a secondary source, but the most authentic one among secondary sources.
Journals/periodicals: Journals and periodicals are becoming more important as far as data
is concerned. The reason is, first, that journals provide up-to-date information which at times
books cannot and, secondly, that journals can give information on the very specific topic you are
researching rather than on more general topics.
Magazines/Newspapers: Magazines are also effective but not very reliable. Newspapers, on the
other hand, are more reliable, and in some cases the information can only be obtained from
newspapers, as in the case of some political studies.
Chapter - 9 Methods of Data Collection Page 274
Basic Guidelines for Research SMS Kabir
Published Electronic Sources: As the internet becomes more advanced, faster, and accessible to the
masses, much information that is not available in printed form has become available on the
internet. In the past, the credibility of the internet was questionable, but today it is much less so,
because journals and books were seldom published online in the past, whereas today almost every
journal and book is available online. Some are free, and for others you have to pay.
e-journals: e-journals are more commonly available than printed journals. The latest journals are
difficult to retrieve without a subscription, but if your university has an e-library, you can view a
journal and print it, and you can order those that are not available.
General Websites: Websites generally do not contain very reliable information, so their content
should be checked for reliability before quoting from it.
Weblogs: Weblogs are also becoming common. They are essentially diaries written by different
people. These diaries are as reliable to use as personal written diaries.
Unpublished Personal Records: Some unpublished data may also be useful in some cases.
Diaries: Diaries are personal records and are rarely available, but if you are conducting
research, they might be very useful. Anne Frank’s diary is the most famous example of this.
That diary is a first-hand record of life in hiding during the Nazi occupation.
Letters: Letters, like diaries, are also a rich source but should be checked for their reliability
before using them.
Government Records: Government records are very important for marketing, management,
humanities, and social science research.
Census data/population statistics; health records; educational institutes’ records, etc.
Public Sector Records: NGOs’ survey data; other private companies’ records.

a. Relative frequency distribution table
b. Cumulative frequency distribution table


a. Histogram and frequency polygon

A histogram is similar in appearance and construction to a bar graph, but it is
used for quantitative variables rather than qualitative variables. It is constructed by erecting
vertical bars over the real limits of each class interval,
with the height of each bar corresponding to the number of scores in the
interval. The bars of adjacent class intervals should touch, leaving no space
between the bars; this emphasizes the continuous, quantitative character of the
class intervals.
Except for these differences, histograms and bar graphs are constructed in the same
manner: (1) The class intervals are represented along the horizontal axis, and frequency is
represented along the vertical axis; (2) the zero point or origin of each axis
is located at the X and Y intercept; (3) the height of the graph is 66% to 75% of its
width; and (4) the two axes are labeled appropriately, and a figure caption is given
to help the reader interpret the graph.
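
The construction rules above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own procedure: it assumes integer scores, equal-width class intervals that start at the lowest score, and hypothetical data, and the function names `class_intervals` and `histogram_bars` are introduced here. Each bar spans the real limits of its interval (stated limits minus/plus 0.5), which is why adjacent bars touch.

```python
def class_intervals(lowest, highest, width):
    """Return (lower, upper) stated limits of equal-width intervals for integer scores."""
    intervals = []
    start = lowest
    while start <= highest:
        intervals.append((start, start + width - 1))
        start += width
    return intervals


def histogram_bars(scores, width):
    """Return one (left_edge, right_edge, height) tuple per class interval.

    The edges are the real limits (stated limits -/+ 0.5 for integer scores),
    so each bar's right edge equals the next bar's left edge: the bars touch.
    """
    bars = []
    for lower, upper in class_intervals(min(scores), max(scores), width):
        height = sum(lower <= s <= upper for s in scores)  # frequency f of the interval
        bars.append((lower - 0.5, upper + 0.5, height))
    return bars


# Hypothetical integer scores, grouped into intervals of width 3 (e.g. 30-32).
scores = [30, 31, 31, 33, 34, 34, 35, 36, 38]
for left, right, height in histogram_bars(scores, 3):
    print(f"{left:5.1f}-{right:5.1f} | {'#' * height}")
```

A plotting library would then draw each tuple as a rectangle from `left` to `right` with the given height; the text printout only stands in for that step.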

Frequency Polygon
To construct a frequency polygon from a frequency distribution, you begin as
though you were making a histogram. The horizontal axis is marked off into class
intervals, and the vertical axis is marked off into numbers representing frequencies.
However, the frequency of a class interval is not represented by a vertical bar but by
a dot placed at the proper height over the midpoint of the class interval. The
midpoint of a class interval is given by

$$\text{Midpoint} = \frac{\text{upper limit} + \text{lower limit}}{2}$$

For example, the midpoint of the class interval 30–32 is $(32 + 30)/2 = 31$. Finally,
adjacent dots are joined by straight lines. Two additional class intervals containing no scores,
one at each end of the graph, are identified, and lines are dropped to their midpoints so as to
anchor the graph to the horizontal axis. A frequency polygon for the
data in Table 2.2-3 is shown in Figure 2.5-2. Frequency polygons and histograms impart the same
information; the choice between them is largely a matter of personal
preference. The histogram is probably a little easier for the general public to interpret, but the
stepwise bars tend to obscure the shape of the distribution. The
frequency polygon is preferred when two or more sets of data are represented in the
same graph because superimposed histograms often overlap and obscure one another.
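
The placement of the polygon's points can be sketched as follows. This is a minimal illustration assuming integer scores and equal-width class intervals; `frequency_polygon` is a hypothetical helper name and the interval frequencies are made up. It returns the (midpoint, frequency) points, including one empty interval at each end so the line returns to the horizontal axis.

```python
def frequency_polygon(intervals, freqs):
    """Return (midpoint, frequency) points for a frequency polygon."""
    # Width of the equal-width intervals for integer scores, e.g. 30-32 -> 3.
    width = intervals[0][1] - intervals[0][0] + 1
    first_lower, last_upper = intervals[0][0], intervals[-1][1]
    # Add one empty interval below the first and one above the last.
    padded = ([(first_lower - width, first_lower - 1)]
              + list(intervals)
              + [(last_upper + 1, last_upper + width)])
    padded_freqs = [0] + list(freqs) + [0]
    # Each dot sits over the midpoint (upper + lower) / 2 of its interval.
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(padded, padded_freqs)]


points = frequency_polygon([(30, 32), (33, 35), (36, 38)], [3, 4, 2])
print(points)  # [(28.0, 0), (31.0, 3), (34.0, 4), (37.0, 2), (40.0, 0)]
```

Joining consecutive points with straight lines gives the polygon; note that the midpoint of 30–32 comes out as 31, matching the worked example.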

Cumulative Polygon
Section 2.2 showed that a cumulative frequency distribution could be used to show
the number, proportion, or percentage of scores that lie below the real upper limit
of each class interval. This same information can be presented graphically by a
cumulative polygon. Instead of placing dots over the midpoints of class intervals,
you place them over the real upper limits. The vertical axis can represent Cum f,
Cum Prop f, or Cum % f. A cumulative percentage frequency polygon for the data in
Table 2.2-6 is shown in Figure 2.5-3. As is usually the case in the behavioral sciences
and education, the cumulative polygon has the characteristic S shape. The S shape occurs
whenever there are more scores in the middle of the frequency distribution than at
the extremes. Graphs that are S shaped are called ogives (pronounced “oh jives”).
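
A sketch of the points for a cumulative percentage frequency polygon, assuming integer scores (so a real upper limit is the stated upper limit plus 0.5) and hypothetical frequencies; `cumulative_percent_polygon` is a name introduced here. Anchoring the curve at 0% at the real lower limit of the first interval is an added convention that the text above does not spell out.

```python
from itertools import accumulate


def cumulative_percent_polygon(intervals, freqs):
    """Return (real_upper_limit, Cum % f) points for a cumulative polygon."""
    total = sum(freqs)
    # Assumption: start the curve at 0% at the real lower limit of the first interval.
    points = [(intervals[0][0] - 0.5, 0.0)]
    # accumulate() yields the running totals (Cum f) in interval order.
    for (_, upper), cum_f in zip(intervals, accumulate(freqs)):
        points.append((upper + 0.5, 100 * cum_f / total))
    return points


for x, cum_pct in cumulative_percent_polygon([(30, 32), (33, 35), (36, 38)], [3, 4, 2]):
    print(f"{x:5.1f}  {cum_pct:6.1f}%")
```

Because the middle interval holds the most scores, the curve rises fastest there, which is the beginning of the S shape described above.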

Bar Graph
Once a frequency distribution has been made, most of the work of constructing a
bar graph has been done. The only step remaining is to represent the data in a two-dimensional
figure, as illustrated in Figure 2.4-1 for the data in Table 2.2-7. Class
intervals are represented along the horizontal axis (abscissa, or X axis), and frequencies are
represented along the vertical axis (ordinate, or Y axis). The zero point
or origin of the vertical axis is located at the X and Y intercept—the point where the
two axes cross. A vertical bar is erected over each class interval such that its height
corresponds to the number of scores in the interval. The bars can be any width, but
they should not touch. A space between the bars emphasizes the discrete, qualitative
character of the class intervals. By convention, the height of the graph should be
66% to 75% of its width. This results in a rectangular figure whose proportions,
according to the ancient Greeks, are the most aesthetically pleasing. Also, the X and Y axes of the
graph should be labeled and a figure caption provided to help the
reader interpret the graph.
The Y axis also can be used to represent proportionate frequency or percentage
frequency, depending on the questions of interest to the researcher. You saw in
Section 2.2 that these transformations are useful in determining whether a frequency
is large in a relative rather than an absolute sense and in comparing frequency distributions with
different total numbers of scores.

Pie Chart
Perhaps the most easily interpreted graph is a pie chart, which is merely a
circle divided into sectors representing the proportionate frequency or percentage frequency of the
class intervals.
A pie chart is illustrated in Figure 2.4-2 for the data in Table 2.2-7. To construct a pie
chart, think of the pie chart as a circle that has 60 minutes like the face of a clock. To
determine the size of a pie sector corresponding to one of the class intervals, convert
its Prop f or % f into minutes. This is accomplished using the following formulas:
$\text{Prop } f \times 60$ or $(\% f / 100) \times 60$
For Figure 2.4-2, the minutes corresponding to the four percentage frequencies are
as follows:
Democrat: $(42\%/100) \times 60 = 25.2$ min
Independent: $(15\%/100) \times 60 = 9.0$ min
Republican: $(38\%/100) \times 60 = 22.8$ min
Unspecified or other: $(5\%/100) \times 60 = 3.0$ min
Thus, 42% corresponds to 25.2 minutes after 12 o’clock; the next 15% corresponds
to $25.2 + 9.0 = 34.2$ minutes after 12 o’clock; the next 38% corresponds to $25.2 + 9.0 + 22.8 = 57$
minutes; and the final 5% corresponds to $25.2 + 9.0 + 22.8 + 3.0 = 60$
minutes, or 12 o’clock. By visualizing the face of a clock, you can
mark off the four pie sectors on the pie chart. The last steps in constructing the pie
chart are to label the sectors and provide an appropriate figure caption.
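
The clock-face arithmetic above can be checked with a short Python sketch. `pie_sectors` is a hypothetical helper name; the percentages are the ones from the worked example. It converts each percentage to minutes and keeps a running total, which gives the minute mark at which each sector ends.

```python
def pie_sectors(percentages):
    """Convert percentage frequencies to clock-face minutes.

    Returns (label, minutes, ends_at) tuples, where ends_at is the minute mark
    (measured clockwise from 12 o'clock) at which the sector ends.
    """
    sectors = []
    position = 0.0
    for label, pct in percentages.items():
        minutes = pct / 100 * 60  # (% f / 100) x 60
        position += minutes
        sectors.append((label, round(minutes, 1), round(position, 1)))
    return sectors


party = {"Democrat": 42, "Independent": 15, "Republican": 38, "Unspecified or other": 5}
for label, minutes, ends_at in pie_sectors(party):
    print(f"{label}: {minutes} min, ends {ends_at} min after 12 o'clock")
```

Plotting libraries usually work in degrees rather than clock minutes; the same idea applies with a factor of 3.6 per percent instead of 0.6.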

Stem-and-Leaf Display
Another useful graphic procedure is the stem-and-leaf display.6 It resembles a
histogram that has been turned on its side. A stem-and-leaf display is illustrated in
Table 2.5-1 for the data in Table 2.2-1. The first step in constructing the display is
to specify class intervals following the procedures in Section 2.2. The class intervals become the
stems of the display. A score is represented by its class interval,
the stem, and by its trailing digit, the leaf. For example, the score 30 in Table 2.2-1
falls in the class interval 30–32, and its trailing digit is 0. This score of 30 is represented in Table
2.5-1 by the leaf 0 on the stem 30–32. The appearance of the display can be improved by ordering
the leaves on a stem from the smallest to the
largest. It is customary to put the smallest class interval at the top of the display
and the largest class interval at the bottom and to place a vertical line between the
leaves and stems, as shown in Table 2.5-1. If these conventions are followed and
the display is rotated 90° counterclockwise, the display looks like a histogram in
which the vertical bars have been replaced by columns of numbers.
An important advantage of a stem-and-leaf display over a histogram is that the
stem-and-leaf display provides all of the information that is contained in a histogram
and preserves the value of the individual scores. For example, in Table 2.5-1, you know the value
of the four scores in the class interval 54–56. They are 54, 54, 55,
and 56. If desired, the stem-and-leaf display can be supplemented with a frequency
distribution, as in column 3 of Table 2.5-1. Also, two sets of data can be presented
in the same table by placing one set on the left side of the stems and the other set on
the right side, as in Table 2.5-2. This back-to-back stem-and-leaf display makes it
easy to compare the two distributions. A stem-and-leaf display can be simplified by using only the
first or leading
digit(s) of a stem (class interval). For example, the class interval 10–19 can be represented by the
stem 1, the class interval 20–29 by the stem 2, the class interval
150–159 by the stem 15, and so on. Most statistical packages use this abbreviated
representation of stems.
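
The abbreviated form just described, where the stem is the leading digit(s) and the leaf is the trailing digit, can be sketched in Python. This assumes non-negative integer scores; `stem_and_leaf` is a name introduced here and the scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Build an abbreviated stem-and-leaf display for non-negative integers."""
    leaves = defaultdict(list)
    for s in scores:
        leaves[s // 10].append(s % 10)  # stem = leading digit(s), leaf = trailing digit
    lines = []
    # Smallest stem at the top; empty stems in between are kept so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        # Leaves on each stem are ordered from smallest to largest.
        leaf_digits = "".join(str(d) for d in sorted(leaves[stem]))
        lines.append(f"{stem:>3} | {leaf_digits}")
    return lines


for line in stem_and_leaf([54, 56, 55, 54, 32, 47, 41, 30]):
    print(line)
```

As with the display in the text, every individual score can be read back from the output (for example, the stem 5 with leaves 4456 stands for 54, 54, 55, and 56).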
Bar graph and histogram

A bar graph is used to indicate and compare values in a discrete category or group,
and the frequency or other measurement parameters (e.g., the mean). Depending on the
number of categories, and the size or complexity of each category, bars may be
created vertically or horizontally. The height (or length) of a bar represents the amount
of information in a category. Bar graphs are flexible, and can be used in a grouped or
subdivided bar format in cases of two or more data sets in each category. Fig. 3 is a
representative example of a vertical bar graph, with the x-axis representing the length
of recovery room stay and drug-treated group, and the y-axis representing the visual
analog scale (VAS) score. The mean and standard deviation of the VAS scores are
expressed as whiskers on the bars (Fig. 3) [7].


Fig. 3
Multiple bar graph with whiskers. Pain scores in the recovery room. *P < 0.05 compared with
the control group. The nefopam group showed significantly lower visual analogue scale (VAS)
scores at 0, 5, 15, 30, 45, and 60 minutes in the postanesthesia care unit compared with the control
group (Adapted from Korean J Anesthesiol 2016; 69: 480-6. Fig. 2).

By comparing the endpoints of bars, one can identify the largest and the smallest
categories and understand gradual differences between categories. It is advised to
start the x- and y-axes from 0. Comparison results illustrated on x- and y-axes
that do not start from 0 can deceive readers' eyes and lead to overrepresentation of the results.
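
A small worked example shows why a non-zero axis start misleads. The sketch below uses hypothetical values and a helper name, `apparent_ratio`, introduced here: it compares the ratio of the drawn bar heights with the true ratio of the values.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights when the value axis starts at axis_start.

    With axis_start = 0 the drawn ratio equals the true ratio a / b.
    """
    if not (a > axis_start and b > axis_start):
        raise ValueError("both values must exceed the axis start")
    return (a - axis_start) / (b - axis_start)


print(apparent_ratio(55, 50))      # 1.1 -> axis from 0: bars differ by 10%
print(apparent_ratio(55, 50, 45))  # 2.0 -> axis from 45: one bar looks twice as tall
```

A 10% difference is drawn as a 100% difference once the axis starts at 45, which is the overrepresentation the paragraph warns about.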

One form of vertical bar graph is the stacked vertical bar graph. A stacked vertical bar
graph is used to compare the totals of categories and to analyze the parts of a category.
While stacked vertical bar graphs are excellent from the aspect of visualization, they
do not have a reference line, making comparison of parts of various categories
challenging (Fig. 4) [8].
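
The comparison problem can be made concrete by computing where each segment of a stacked bar starts and ends. This is a minimal sketch; `stack_segments` is a hypothetical name, and the component names and volumes are made up for illustration (only "RMW" is borrowed from the figure caption).

```python
def stack_segments(components):
    """Return (name, bottom, top) segments for one stacked bar."""
    segments = []
    bottom = 0
    for name, value in components:
        # Each segment starts where the previous one ended; only the first
        # segment sits on the common zero baseline.
        segments.append((name, bottom, bottom + value))
        bottom += value
    return segments


# Hypothetical volumes for two bars with the same three components.
bar_a = stack_segments([("paper", 3), ("plastic", 5), ("RMW", 2)])
bar_b = stack_segments([("paper", 4), ("plastic", 5), ("RMW", 1)])
print(bar_a)  # [('paper', 0, 3), ('plastic', 3, 8), ('RMW', 8, 10)]
print(bar_b)  # [('paper', 0, 4), ('plastic', 4, 9), ('RMW', 9, 10)]
```

Both bars have the same total (10), and "plastic" is 5 in both, but it is drawn from 3 to 8 in one bar and from 4 to 9 in the other. Without a shared reference line, the reader must compare lengths that start at different heights.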

Fig. 4
Stacked bar graph. Compressed volume of each component from the three operations. We
checked the compressed volume of each component from the three operations; pelviscopy
(with radical vaginal hysterectomy), laparoscopic anterior resection of the colon, and TKRA.
TKRA: total knee replacement arthroplasty, RMW: regulated medical waste (Adapted from
Korean J Anesthesiol 2017; 70: 100-4).

Pie chart

A pie chart, which is used to represent nominal data (in other words, data classified in
different categories), visually represents a distribution of categories. It is generally the
most appropriate format for representing information grouped into a small number of
categories. It is also used for data that have no other way of being represented aside
from a table (e.g., a frequency table). Fig. 5 illustrates the distribution of regular waste
from operating rooms by weight [8]. A pie chart is also commonly used to
illustrate the number of votes each candidate won in an election.

Fig. 5

Pie chart. Total weight of each component from the three operations. RMW: regulated
medical waste (Adapted from Korean J Anesthesiol 2017; 70: 100-4).
1. Lee CW, Kim M. Effects of preanesthetic dexmedetomidine on hemodynamic
responses to endotracheal intubation in elderly patients undergoing treatment for
hypertension: a randomized, double-blinded trial. Korean J Anesthesiol. 2017;70:39–45.

2. Sohn HM, Ryu JH. Monitored anesthesia care in and outside the operating
room. Korean J Anesthesiol. 2016;69:319–326.

3. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept
and the practical use. Korean J Anesthesiol. 2016;69:8–14.

4. Kim TK. Understanding one-way ANOVA using conceptual figures. Korean J
Anesthesiol. 2017;70:22–26.

5. Jung W, Hwang M, Won YJ, Lim BG, Kong MH, Lee IO. Comparison of clinical
validation of acceleromyography and electromyography in children who were
administered rocuronium during general anesthesia: a prospective double-blinded
randomized study. Korean J Anesthesiol. 2016;69:21–26.

6. Cho SH, Ko SH, Lee MS, Koo BS, Lee JH, Kim SH, et al. Development of the
Geop-Pain questionnaire for multidisciplinary assessment of pain sensitivity. Korean J
Anesthesiol. 2016;69:492–505.

7. Choi SK, Yoon MH, Choi JI, Kim WM, Heo BH, Park KS, et al. Comparison of
effects of intraoperative nefopam and ketamine infusion on managing postoperative
pain after laparoscopic cholecystectomy administered remifentanil. Korean J
Anesthesiol. 2016;69:480–486.

8. Shinn HK, Hwang Y, Kim BG, Yang C, Na W, Song JH, et al. Segregation for
reduction of regulated medical waste in the operating room: a case report. Korean J
Anesthesiol. 2017;70:100–104.

9. Satomoto M, Adachi YU, Makita K. A low dose of droperidol decreases the
desflurane concentration needed during breast cancer surgery: a randomized double-blinded
study. Korean J Anesthesiol. 2017;70:27–32.

10. Few S. Show Me the Numbers. 2nd ed. Burlingame: Analytics Press; 2012.

11. Huff D. How to Lie with Statistics. London: Penguin Books; 1991. pp. 1–124.

12. Lee JH. Handling digital images for publication. Sci Ed. 2014;1:58–61.


What is a relative frequency distribution?

A relative frequency distribution is a type of frequency distribution. What is a “frequency
distribution”? This can be better explained by using a table and a graph. The first image here is a
frequency distribution table. A frequency distribution table shows how often something happens.
In this particular table, the counts are how many people use certain types of contraception.

A frequency distribution table.

With a relative frequency distribution, we don’t want to know the counts. We want to know
the percentages. In other words, what percentage of people used a particular form of contraception?

This relative frequency distribution table shows how people’s heights are distributed.

Note that in the right column, the frequencies (counts) have been turned into relative frequencies
(percents). Here is how you do this:
1. Count the total number of items. In this chart the total is 40.
2. Divide the count (the frequency) by the total number. For example, 1/40 = 0.025 or 3/40 = 0.075.

This information can also be turned into a frequency distribution chart. This chart shows the
relative frequency distribution table and the frequency distribution chart for the information. How
do we know it’s a frequency chart and not a relative frequency chart? Look at the vertical axis: it is labeled
“frequency” and shows the counts.


This next chart is a relative frequency histogram. You know it’s a relative frequency distribution
for two reasons:
1. It’s labeled as relative frequency (which all good charts should be!).
2. The vertical axis has percentages (as decimals: .1, .2, .3, …) instead of counts (1, 2, 3, …).
Chart showing how book sales compare to each other as percentages of a whole.

How to Make a Relative Frequency Table

Making a relative frequency table is a four-step process.
Step 1: Make a table with the category names and counts.

Step 2: Add a second column called “relative frequency”. I shortened it to rel. freq. here for space.

Step 3: Figure out your first relative frequency by dividing the count by the total. For the category
of dogs we have 16 out of 56, so 16/56 ≈ 0.29.

Step 4: Complete the rest of the table by figuring out the remaining relative frequencies.
Cats = 28/56 = 0.5
Fish = 8/56 ≈ 0.14
Other = 4/56 ≈ 0.07
Don’t forget to put the total at the bottom of the rel. freq. column: 0.29 + 0.5 + 0.14 + 0.07 = 1.
Note: The usual way to complete the table is with decimals or percents. However, it’s still
technically correct to just leave the ratios (e.g., 28/56) in the column (that way you don’t have to do
the math!).
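
The steps above can be sketched in Python using the pet counts from the example (16 dogs, 28 cats, 8 fish, 4 other). `relative_frequency_table` is a name introduced here for illustration; it divides each count by the total and rounds to two decimals.

```python
def relative_frequency_table(counts, digits=2):
    """Return {category: relative frequency}, rounded to `digits` places."""
    total = sum(counts.values())
    return {category: round(count / total, digits) for category, count in counts.items()}


pets = {"Dogs": 16, "Cats": 28, "Fish": 8, "Other": 4}
table = relative_frequency_table(pets)
for category, rel in table.items():
    print(f"{category:<6} {pets[category]:>3} {rel:.2f}")
print(f"{'Total':<6} {sum(pets.values()):>3} {sum(table.values()):.2f}")
```

Here the rounded values happen to add up to 1; with other data, rounding each entry separately can leave the total slightly off (for example 0.99 or 1.01), so it is worth checking the total row.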

Cumulative Relative Frequency

To find the cumulative relative frequency, follow the steps above to create a relative frequency
distribution table. As a final step, add up the relative frequencies in another column (the
column to the right is labeled “cum. rel. freq.”).
The first entry in the column is the same as the first entry in the rel. freq. column (.29).
Next, I added the first and second entries to get 0.29 + 0.50 = 0.79.
Next, I added the first, second and third entries to get 0.29 + 0.50 + 0.14 = 0.93.
Finally, I added the first, second, third and fourth entries to get 0.29 + 0.50 + 0.14 + 0.07 = 1.
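
The running totals can be computed with `itertools.accumulate`. In this sketch (the helper name `cumulative_relative_frequencies` is introduced here), the relative frequencies are kept as exact fractions and rounded only at the end, so the last entry is exactly 1; the counts are the pet counts from the example.

```python
from fractions import Fraction
from itertools import accumulate


def cumulative_relative_frequencies(counts, digits=2):
    """Running totals of relative frequencies, in the order the categories are listed."""
    total = sum(counts.values())
    rel = [Fraction(c, total) for c in counts.values()]  # exact relative frequencies
    return [round(float(x), digits) for x in accumulate(rel)]


pets = {"Dogs": 16, "Cats": 28, "Fish": 8, "Other": 4}
print(cumulative_relative_frequencies(pets))  # [0.29, 0.79, 0.93, 1.0]
```

Adding the already-rounded values, as in the steps above, gives the same result here, but summing unrounded values avoids rounding errors building up in longer tables.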
