
PRESENTING AND

INTERPRETING RESEARCH
DATA

Kim Charies L. Okit


Research teacher
PRESENTING AND INTERPRETING
RESEARCH DATA

Editing data
Coding data
Measures of central tendency
Ungrouped data
Grouped data
Measures of variability
Tabulating data
Graphing data
Correlation
EDITING DATA
• Editing of data is the process of
examining the collected raw data to
detect errors and omissions and to
correct them where possible. Editing
involves a careful scrutiny of the completed
questionnaires and/or schedules. It is
done to ensure that the data are
accurate, consistent with other facts
gathered, uniformly entered, complete,
and well arranged to facilitate coding
and tabulation.
FIELD EDITING
• The investigator reviews the reporting forms to complete
(translate or rewrite) what was written in abbreviated and/or
illegible form while recording the respondents’ responses.
This type of editing is necessary because individual writing
styles are often difficult for others to decipher.
• Should be done as soon as possible after the interview. The
investigator must not correct errors of omission by simply
guessing what the informant would have said if the question
had been asked.
CENTRAL EDITING
• Should take place when all forms or schedules have been
completed and returned to the office.
• It implies that all forms should get a thorough editing by a single
editor in a small study and by a team of editors in case of a large
inquiry.
• Editor(s) may correct the obvious errors such as an entry in the
wrong place, entry recorded in months when it should have
been recorded in weeks, and the like.
• In case of inappropriate or missing replies, the editor can
sometimes determine the proper answer by reviewing the other
information in the schedule. At times, the respondent can be
contacted for clarification.
THINGS TO REMEMBER IN EDITING DATA
• Editors should be familiar with instructions given to the interviewers and
coders as well as with the editing instructions supplied to them for the
purpose.
• When crossing out an original entry for any reason, editors
should draw a single line through it so that it remains
legible.
• They must make entries (if any) on the form in a distinctive color
and in a standardized form.
• They should initial all answers which they change or supply.
• Editor’s initials and the date of editing should be placed on each
completed form or schedule.
CODING DATA
• Coding refers to the process of assigning numerals or other symbols to
answers so that responses can be put into a limited number of
categories or classes.
• Classes should be appropriate to the research problem.
• Exhaustiveness: there must be a class for every data item.
• Mutual exclusivity: a specific answer can be placed in one and only
one cell in a given category set.
• Unidimensionality: every class is defined in terms of only one concept.
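The coding rules above can be illustrated with a small sketch. The codebook, the catch-all code 9, and the sample answers are hypothetical and only show how exhaustiveness and mutual exclusivity might look in practice.

```python
# Hypothetical codebook for one survey question (illustrative categories only).
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
OTHER_CODE = 9  # assumed catch-all class, so every answer has a class (exhaustiveness)


def code_response(answer: str) -> int:
    """Return exactly one code per answer (mutual exclusivity)."""
    key = answer.strip().lower()
    return CODEBOOK.get(key, OTHER_CODE)


responses = ["Agree", "neutral", " Strongly Agree ", "no opinion"]
codes = [code_response(r) for r in responses]
print(codes)  # [4, 3, 5, 9]
```

Each class here is defined by a single concept (level of agreement), which is what unidimensionality asks for.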
MEASURES OF CENTRAL TENDENCY –
UNGROUPED DATA
• Measures of central tendency tell us the point about which items tend to
cluster. Such a measure is considered the most representative figure for the
entire mass of data.
• Mean, also known as the arithmetic average, is the most common measure of
central tendency.
Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$
Population mean: $\mu = \dfrac{\sum x_i}{N}$
where $\bar{x}$ = the symbol we use for the mean (pronounced "x bar")
$\Sigma$ = symbol for summation
$x_i$ = value of the ith item X, i = 1, 2, …, n
n = total number of items in the sample, N = total number of items in the population
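As a quick check of the mean formula, here is a minimal sketch that reuses the ten values from the first mode example in these slides. The variable names are illustrative.

```python
import statistics

# Ten values borrowed from the first mode example in these slides.
scores = [13, 11, 14, 15, 12, 14, 18, 12, 14, 10]

# Mean by the formula: sum of all items divided by the number of items.
mean_by_formula = sum(scores) / len(scores)

# Same result from the standard library.
mean_by_library = statistics.mean(scores)

print(mean_by_formula)  # 13.3
print(mean_by_library)  # 13.3
```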
MEDIAN
• The value of the middle item of a series when it is arranged in ascending or
descending order of magnitude.
• A positional average.
• Used for qualitative data that can be ranked (ordinal data); it cannot be
computed for purely nominal data.
• It is not affected by the values of the extreme items.

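- If N is odd, the median is the middle item. Example: 2, 5, 8, 9, 12, 14, 17 (N = 7); median = 9 (the 4th item)

- If N is even, the median is the mean of the two middle items. Example: 6, 7, 8, 9, 10, 10, 11, 13 (N = 8); median = (9 + 10)/2 = 9.5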
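The odd and even cases can be sketched in a few lines of Python. The function name `median` is illustrative; the two lists are the examples from this slide.

```python
def median(values):
    """Middle item of the sorted series; mean of the two middle items if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_series = [2, 5, 8, 9, 12, 14, 17]        # N = 7
even_series = [6, 7, 8, 9, 10, 10, 11, 13]   # N = 8

print(median(odd_series))   # 9
print(median(even_series))  # 9.5
```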


MODE
• The most commonly or frequently occurring value in a series.
• It is the item with maximum concentration or frequency.
• Like median, mode is a positional average and is not affected by the values
of extreme items. It is useful in all situations where we want to eliminate the
effect of extreme values or outliers.
Unimodal – only 1 mode
Bimodal – 2 modes
Multimodal – more than 2 modes

• Given the values 13, 11, 14, 15, 12, 14, 18, 12, 14, 10: the mode is 14
• Given the values 32, 37, 42, 41, 56, 59, 83, 65, 96 and 71: there is no mode
• Given the values 56, 62, 47, 83, 50, 65, 75, 50, 82 and 75: bimodal, with modes 50 and 75
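The three cases above can be reproduced with a short sketch. The helper `modes` is illustrative; it assumes a non-empty list and treats a series in which every value occurs only once as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


unimodal = [13, 11, 14, 15, 12, 14, 18, 12, 14, 10]
no_mode = [32, 37, 42, 41, 56, 59, 83, 65, 96, 71]
bimodal = [56, 62, 47, 83, 50, 65, 75, 50, 82, 75]

print(modes(unimodal))  # [14]
print(modes(no_mode))   # []
print(modes(bimodal))   # [50, 75]
```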
MEASURES OF CENTRAL TENDENCY –
GROUPED DATA
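• Grouped data mean

$\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{\sum f_i x_i}{N}$

• Where $\bar{x}$ = mean
f = frequency of each class
x = class midpoint or class mark
N = $\sum f_i$ = total number of scores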
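A minimal sketch of the grouped mean follows. The frequency table is hypothetical (the slides do not give one for the mean); each class midpoint is taken as the average of the class limits.

```python
# Hypothetical frequency table: (lower limit, upper limit, frequency).
table = [
    (15, 19, 4),
    (20, 24, 6),
    (25, 29, 9),
    (30, 34, 8),
    (35, 39, 7),
    (40, 44, 6),
]

# N = sum of the frequencies.
total_f = sum(f for _, _, f in table)

# Sum of f * x, where x is the class midpoint (class mark).
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)

grouped_mean = sum_fx / total_f
print(total_f)       # 40
print(grouped_mean)  # 30.25
```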
• Grouped data Median

$\text{Median} = L + \left(\dfrac{N/2 - F}{f_m}\right) i$

Where L = exact lower limit of the median class.
The median class is the first class whose cumulative
frequency is equal to or greater than the value of
N/2
N/2 = half of the total number of scores
F = cumulative frequency below the median
class
$f_m$ = frequency of the median class
i = size of the class interval.
L = 29.5
Because N/2, i.e. 20, falls within
the cumulative frequency of the
class interval 30-34, and the exact
lower limit of that class interval is 29.5.
F = 19. The cumulative frequency
below the median class.
fm = 8. The frequency of the
median class.
i = 5. Size of the class interval.
Median = L + ((N/2 − F)/fm) × i = 29.5 + ((20 − 19)/8) × 5 = 30.125
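The grouped median can be sketched with the standard formula Median = L + ((N/2 − F)/fm) × i. The frequency table below is hypothetical: the slide's original table is not available, so the frequencies were chosen only to reproduce the values quoted above (N/2 = 20, L = 29.5, F = 19, fm = 8, i = 5). The function assumes integer class limits with a gap of 1 between classes.

```python
def grouped_median(classes):
    """classes: ascending list of (lower limit, upper limit, frequency)."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequency table has no scores")
    half = n / 2
    cum_below = 0  # F: cumulative frequency below the current class
    for lower, upper, f in classes:
        if cum_below + f >= half:      # first class reaching N/2 is the median class
            exact_lower = lower - 0.5  # L: exact lower limit
            width = upper - lower + 1  # i: size of the class interval
            return exact_lower + (half - cum_below) / f * width
        cum_below += f


# Hypothetical table consistent with the values quoted on the slide.
table = [
    (15, 19, 4),
    (20, 24, 6),
    (25, 29, 9),
    (30, 34, 8),
    (35, 39, 7),
    (40, 44, 6),
]

print(grouped_median(table))  # 30.125
```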
• Grouped data mode
MEASURES OF
VARIABILITY/DISPERSION
• Dispersion indicates the extent to which observations deviate from an
appropriate measure of central tendency.
• Range is the simplest possible measure of dispersion and is defined as the
difference between the values of the extreme items of a series.
Range = Highest value – lowest value
• Mean Deviation is the average of the absolute differences between the values
of the items and some average of the series (here, the mean).
$MD = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Where MD = mean deviation
$x_i$ = individual score
$\bar{x}$ = mean of the scores
n = total number of scores
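A short sketch of the range and the mean deviation, using the eight scores from the standard deviation example later in these slides:

```python
# Eight scores borrowed from the standard deviation example in these slides.
scores = [12, 10, 9, 8, 7, 4, 4, 2]

# Range = highest value - lowest value.
value_range = max(scores) - min(scores)

# Mean deviation = average absolute difference from the mean.
mean = sum(scores) / len(scores)
mean_deviation = sum(abs(x - mean) for x in scores) / len(scores)

print(value_range)     # 10
print(mean)            # 7.0
print(mean_deviation)  # 2.75
```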
• Variance is the average of the squared deviations of the scores from the mean.
• It measures the average degree to which each observation varies from the mean.
When the variance of a data set is small, the data points lie close to the mean,
whereas a large variance means that the observations are widely dispersed
around the arithmetic mean and from each other.
Population Variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Where μ = mean of the population
$x_i$ = scores in the population
N = size of the population or total number of scores in the population

Sample Variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Where $\bar{x}$ = mean of the sample
$x_i$ = scores in the sample
n = sample size or total number of scores in the sample
• Standard Deviation is the square root of the variance, that is, the square
root of the average of the squared deviations of the individual items from
the arithmetic mean.
• Standard deviation is a measure that quantifies the amount of
dispersion of the observations in a dataset. A low standard deviation
indicates that the scores are close to the arithmetic mean, and a
high standard deviation indicates that the scores are dispersed
over a wider range of values.

Population SD: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample SD: $s = \sqrt{s^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$X_i$      $X_i - \bar{X}$      $(X_i - \bar{X})^2$
12       12 − 7       25
10       10 − 7        9
 9        9 − 7        4
 8        8 − 7        1
 7        7 − 7        0
 4        4 − 7        9
 4        4 − 7        9
 2        2 − 7       25
$\sum x_i = 56$               $\sum (X_i - \bar{X})^2 = 82$

$\bar{X} = \dfrac{56}{8} = 7$

$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \dfrac{82}{8} = 10.25$

$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \dfrac{82}{8 - 1} \approx 11.7$

$\sigma = \sqrt{\sigma^2} = \sqrt{10.25} \approx 3.2$

$s = \sqrt{s^2} = \sqrt{11.7} \approx 3.4$
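The worked example can be checked with Python's standard `statistics` module, which provides both the population (divide by N) and sample (divide by n − 1) versions. This is only a verification sketch; the rounding to one decimal place follows the slide.

```python
import statistics

scores = [12, 10, 9, 8, 7, 4, 4, 2]

# Sum of squared deviations from the mean, computed by hand.
mean = sum(scores) / len(scores)
sum_sq = sum((x - mean) ** 2 for x in scores)

pop_var = statistics.pvariance(scores)   # divides by N
samp_var = statistics.variance(scores)   # divides by n - 1
pop_sd = statistics.pstdev(scores)
samp_sd = statistics.stdev(scores)

print(sum_sq)              # 82.0
print(pop_var)             # 10.25
print(round(samp_var, 1))  # 11.7
print(round(pop_sd, 1))    # 3.2
print(round(samp_sd, 1))   # 3.4
```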
TABULATING DATA
• Tabulation is the process of arranging a mass of data in a concise and logical order.
• Benefits of tabulation:
1. It conserves space and reduces explanatory and descriptive statements to a
minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and
omissions.
4. It provides a basis for various statistical computations.
• One-way table - supplies answers to questions about one characteristic of the data only.
• Two-way table - gives information about two interrelated characteristics of the data.
• Three-way table - gives information about three interrelated characteristics of
the data.
• Higher-order table/manifold table - supplies information about several interrelated
characteristics of the data (cross tabulation).
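A two-way table can be sketched by counting pairs of characteristics. The survey records below are hypothetical, and the layout (row totals in the rightmost column, column totals in the last row) is one common arrangement.

```python
from collections import Counter

# Hypothetical survey records: (grade level, answer to a yes/no question).
records = [
    ("Grade 11", "Yes"), ("Grade 11", "No"), ("Grade 12", "Yes"),
    ("Grade 11", "Yes"), ("Grade 12", "No"), ("Grade 12", "Yes"),
    ("Grade 12", "Yes"),
]

counts = Counter(records)  # frequency of each (row, column) pair
rows = sorted({grade for grade, _ in records})
cols = ["Yes", "No"]

row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

# Print the table with captions, stubs, and totals.
print(f"{'Grade level':<12}{'Yes':>5}{'No':>5}{'Total':>7}")
for r in rows:
    cells = "".join(f"{counts[(r, c)]:>5}" for c in cols)
    print(f"{r:<12}{cells}{row_totals[r]:>7}")
totals = "".join(f"{col_totals[c]:>5}" for c in cols)
print(f"{'Total':<12}{totals}{grand_total:>7}")
```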
PARTS OF A TABLE
PRINCIPLES OF TABULATION

• Every table should have a clear, concise and adequate title placed above the
body of the table.
• Every table should be given a distinct table number to facilitate easy reference.
• The column headings (captions) and the row headings (stubs) of the table should
be clear and brief.
• The units of measurement under each heading or sub-heading must always be
indicated.
• Footnotes should be placed directly beneath the table, along with the reference
symbols used in the table.
• Source(s) of data in the table must be indicated just below the table.
• Columns are separated by lines which make the table more readable and
attractive. Lines are always drawn at the top and bottom of the table and below
the captions.
• There should be thin lines separating data under one class and thick lines
separating data under different classes.
• The columns may be numbered to facilitate reference.
• Columns whose data are to be compared should be kept side by side.
Similarly, percentages and/or averages must also be kept close to the data.
• It is better to approximate figures before tabulation to reduce unnecessary
details in the table itself.
• Use different kinds of type, spacing and indentations to emphasize the
relative significance of certain categories.
• All column figures, decimal points and (+) or (–) signs should be properly
aligned.
• As much as possible, avoid abbreviations; ditto marks should not be
used in the table.
• Miscellaneous and exceptional items should be placed in the last row of the
table.
• Tables should be made as logical, clear, accurate and simple as possible. If
the data set is very large, it should not be crowded into a single
table, as that makes the table unwieldy and inconvenient.
• Total of rows should normally be placed in the extreme right column and
total of columns should be placed at the bottom.
• The arrangement of the categories in a table may be chronological,
geographical, alphabetical or according to magnitude to facilitate
comparison. Above all, the table must suit the needs and requirements of an
investigation.
GRAPHICAL PRESENTATION OF DATA
• Graphs are used to explore data and assess relationships between the
variables. They are also used to summarize data and to help interpret statistical
results.
• Graphs are often an excellent way to display your results. In fact, most good
science fair projects have at least one graph.
• For any type of graph:
- Generally, you should place your independent variable on the x-axis of your
graph and the dependent variable on the y-axis.
- Be sure to label the axes of your graph, and don't forget to include the units of
measurement (grams, centimeters, liters, etc.).
- If you have more than one set of data, show each series in a different color or
symbol and include a legend with clear labels.
COMMON TYPES OF GRAPH
Different types of graphs are appropriate for different experiments. These are
just a few of the possible types of graphs:
• A bar graph might be appropriate for comparing different trials or different
experimental groups. It also may be a good choice if your independent
variable is not numerical. (In Microsoft Excel, generate bar graphs by
choosing chart types "Column" or "Bar.")
[Figure: example clustered column and stacked column charts]
• A pie chart is a circular graph divided into disjoint slices, each showing the
size of one category. The whole circle represents the total, and each slice
represents a percentage of that total. Pie charts are therefore best used with
categorical data, where they help show what percentage each category
constitutes. They are also visually appealing, and the percentage value of each
slice can be read at a glance.
• A time-series plot can be used if your dependent variable is numerical and
your independent variable is time. (In Microsoft Excel, the "line graph" chart
type generates a time series. By default, Excel simply puts a count on the x-
axis. To generate a time series plot with your choice of x-axis units, make a
separate data column that contains those units next to your dependent
variable. Then choose the "XY (scatter)" chart type, with a sub-type that
draws a line.)
• A scatter plot might be the proper graph if
you're trying to show how two variables may be
related to one another. (In Microsoft Excel,
choose the "XY (scatter)" chart type, and then
choose a sub-type that does not draw a line.)

• A histogram is used for quantitative data. This is
a kind of graph that also uses bars. Ranges of
values are listed at the bottom and these are
called ‘classes.’ Taller bars represent the
classes with greater frequencies.
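The idea of classes and frequencies behind a histogram can be sketched without any plotting library. The scores, the class width of 5, and the starting limit of 10 are all hypothetical; the text bars simply stand in for the bars of a real histogram.

```python
from collections import Counter

# Hypothetical test scores (illustrative data, not from the slides).
scores = [12, 15, 21, 22, 23, 27, 28, 31, 33, 34, 34, 38]
CLASS_WIDTH = 5  # assumed size of each class interval
START = 10       # assumed lower limit of the first class


def class_of(value):
    """Return the (lower, upper) limits of the class that contains value."""
    lower = START + ((value - START) // CLASS_WIDTH) * CLASS_WIDTH
    return (lower, lower + CLASS_WIDTH - 1)


frequencies = Counter(class_of(s) for s in scores)

# One line per class; longer bars mean greater frequencies.
for (lower, upper), count in sorted(frequencies.items()):
    print(f"{lower}-{upper} | {'#' * count} ({count})")
```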
ASSIGNMENT
• Make a table and a graph (bar or line) to show the results of the experiment
conducted.
• A researcher conducted an experiment to test the effect of varying the amount of
fertilizer in a pot on the growth of wheat for 2 weeks. He recorded the results of
the experiment in his journal as follows:
10 g fertilizer: Week 1: 10 cm (trial 1), 12 cm (trial 2), 11 cm (trial 3)
                 Week 2: 19 cm (trial 1), 19 cm (trial 2), 17 cm (trial 3)
20 g fertilizer: Week 1: 12 cm (trial 1), 14 cm (trial 2), 14 cm (trial 3)
                 Week 2: 20 cm (trial 1), 21 cm (trial 2), 22 cm (trial 3)
30 g fertilizer: Week 1: 8 cm (trial 1), 7 cm (trial 2), 8 cm (trial 3)
                 Week 2: 17 cm (trial 1), 19 cm (trial 2), 16 cm (trial 3)
PEARSON PRODUCT MOMENT
CORRELATION
