
STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION

DATA COLLECTION,
ORGANIZATION AND
PRESENTATION
with software application

LEARNING MODULE
Rina A. Abner, DBA, CPA,

This learning material has adopted various resources offline and online. Some discussions were lifted verbatim. The sources
are properly recognized in the references section. This material does not intend to infringe any copyrights, and is for
educational purposes alone.
2 AE9: Statistical Analysis with Software Application

College of Business and Management


Module 2
Data Collection, Organization and Presentation

Name of Student: ____________________________ Week Number: 4 and 5


Course Code: AE9 Name of Faculty: Rina A. Abner
Course Title: Statistical Analysis with Software Application

I. OBJECTIVES

This learning material has the following objectives:


1. Differentiate primary data from secondary data.
2. Differentiate quantitative data from qualitative data.
3. Make use of online resources to collect both primary and secondary data.
4. Organize the collected data.
5. Make use of tables and graphs to present data.

II. LESSON

Data

Before we tackle the collection, organization, and presentation of data, we
first have to know the type of data that we need: primary or secondary, and
quantitative or qualitative.

Primary data are data that were gathered by the data collectors themselves
for a specific purpose. Secondary data are those that were documented or gathered
by other sources and made available to other researchers. Pulse Asia typically
conducts surveys on the opinions of people regarding current issues; that is an
example of gathering primary data. Another example is an online survey
administered by the researcher himself or herself: the data gathered from that
survey are primary data. Secondary data, on the other hand, were previously
gathered by other sources and may be used by researchers in their own studies. If
you visit the website of the Philippine Stock Exchange, you will see the uploaded
data and information of all the publicly listed companies (PLCs) in the country.
You may use such data in your study, and that is an example of secondary data.
Another example is the data about the number of farmers in a certain municipality
gathered by the Local Government Unit (LGU). If your study needs the number of
farmers and you go to the LGU to access its data, your data are secondary.
However, if you yourself gathered the data about the farmers, they would be
considered primary.

Both types of data have their advantages and disadvantages. Take a look at
the following table, which was taken from Sarstedt and Mooi (2019).

AE9: STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION | Learning Module



Table 1. Advantages and disadvantages of primary and secondary data.

Advantages

Primary Data
→ Are recent
→ Are specific for the purpose
→ Are proprietary

Secondary Data
→ Tend to be cheaper
→ Sample sizes tend to be greater
→ Tend to have more authority
→ Are usually quick to access
→ Are easier to compare to other research using the same data
→ Are sometimes more accurate (e.g., data on competitors)

Disadvantages

Primary Data
→ Are usually more expensive
→ Take longer to collect

Secondary Data
→ May be outdated
→ May not fully fit the problem
→ There may be hidden errors in the data
→ Difficult to assess the data quality
→ Usually contain only factual data
→ No control over data collection
→ May not be reported in the required form (e.g., different units of
measurement, definitions, aggregation levels of the data)

Aside from determining whether the needed data are secondary or primary, it
is also important to know the difference between quantitative data and qualitative
data. Quantitative data are typically presented as numerical values, while
qualitative data are not and can be in words, observations, stories, pictures, and
even audio (Sarstedt & Mooi, 2019). Qualitative data are deeper and richer when it
comes to insights. When interpreting qualitative data, a deeper understanding of
the phenomenon is required. However, qualitative data may be coded and turned into
quantitative data. Coding qualitative data also helps in further analysis. For
instance, suppose our data involve the perception of the government's actions in
this time of pandemic. Responses can be coded as 0 for neutral, 1 for satisfied,
and 2 for not satisfied. Another example is the profile of your respondents. Say
you collected the demographic profile of the BS Accountancy students at Partido
State University, which is composed of gender (male and female), address (Goa,
Tigaon, Lagonoy, San Jose), and religious affiliation (Catholic, Iglesia ni Cristo,
Born Again Christian, Jehovah’s Witnesses).


Look at the following table for the details:

Demographic Profile          Code

Gender
  Male                       0
  Female                     1
Address
  Goa                        1
  Tigaon                     2
  Lagonoy                    3
  San Jose                   4
Religious Affiliation
  Catholic                   1
  Iglesia ni Cristo          2
  Born Again Christian       3
  Jehovah’s Witnesses        4
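For readers who keep their raw responses as text, the coding step can be sketched in a few lines of Python. This is only an illustration, not part of the Gretl workflow: the codebook copies the table above, while the function name `encode` and the sample respondent are made up for the example.

```python
# Codebook copied from the table above: each category label maps to a numeric code.
CODEBOOK = {
    "Gender": {"Male": 0, "Female": 1},
    "Address": {"Goa": 1, "Tigaon": 2, "Lagonoy": 3, "San Jose": 4},
    "Religious Affiliation": {
        "Catholic": 1,
        "Iglesia ni Cristo": 2,
        "Born Again Christian": 3,
        "Jehovah's Witnesses": 4,
    },
}


def encode(record):
    """Replace each label in one respondent's record with its numeric code."""
    return {variable: CODEBOOK[variable][label] for variable, label in record.items()}


# Hypothetical respondent, invented for illustration only.
respondent = {
    "Gender": "Female",
    "Address": "Tigaon",
    "Religious Affiliation": "Catholic",
}
coded = encode(respondent)
print(coded)
```

A label that is not in the codebook raises a `KeyError`, which is a convenient way to catch misspelled responses before the data are imported.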

Also, look at the coded data in Appendix A. Now we import the data into Gretl
and first click the variable whose frequency distribution we want to know.
Then, click “Frequency Distribution” under “Variable”. Select “Show data
only”, untick “show plot”, and click “Ok”. The following will pop up for
Gender, Address, and Religion:


Note that if we have qualitative data and we want to import them into
statistical software, they have to be coded first. You may decide what values you
are going to use to turn them into quantitative data. In that way, the software
can read the data easily.
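Once the data are coded, counting how often each code appears is straightforward. The sketch below is a rough Python counterpart of the frequency count, using only the standard library. It uses the codes of Students 1 to 10 from Appendix A and assumes that the first code column there is Gender.

```python
from collections import Counter

# Codes of Students 1 to 10 in Appendix A (0 = Male, 1 = Female).
# Assumption: the first code column of Appendix A is Gender.
gender_codes = [0, 1, 0, 1, 1, 1, 1, 1, 1, 0]
labels = {0: "Male", 1: "Female"}

# Counter tallies how many times each code occurs.
frequency = Counter(gender_codes)
for code in sorted(frequency):
    print(f"{labels[code]:<8}{frequency[code]:>3}")
```

The same call works for the Address and Religion columns; only the list of codes and the labels change.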


On the other hand, there are qualitative data that the researcher does not
want to express in quantitative terms; instead, they will be analyzed as
qualitative data. That is another field, which this course does not cover.

Data Collection

Now that you already know the types of data that you may gather, let’s talk
about data collection. Recall that data are collected from a sample that is
representative of the whole population. There are ways and techniques for
selecting the sample, which will be covered in detail in the Week 7 Module under
the topic Sampling and Sampling Distributions. The following are some of the data
collection strategies that a researcher may use, depending on the research design:

1. Survey. When conducting a survey, questions are normally asked or the
participants/respondents are given questionnaires. Surveys may be
conducted through mail, e-mail, telephone, web or online, or in person.
For example, suppose we want to know the best practices of the CPALE
topnotchers in the Philippines. We may send questionnaires via e-mail to a
sample of them (it would be very costly to do it in person). Typically,
during surveys, the participants are asked the same questions.

2. In-depth interview. Here, questions are asked by the data collector or
researcher to gather deeper insights from the participants. It can be a
structured interview (there is a script and questions are read verbatim),
semi-structured (key questions are written, and the researcher asks some
follow-up questions depending on the answer of the respondent),
unstructured (questions are not written and the conduct is free-flowing,
but there are topics or themes), or non-directive (questions are
free-flowing and there is no guide).

3. Experiment. When conducting an experiment, data are obtained in
“controlled” settings (Heumann & Shalabh, 2016). The one conducting the
study has control over one or some of the variables included in the
experiment. Assume that a researcher wants to know whether BSA students
who do not play Mobile Legends (ML) have better academic performance than
those who do. The researcher forms two groups of students who have never
played ML: one group is asked to play ML for a few months until the
midterm examination, and the other group continues not to play the game.
Then, their midterm grades are compared to see whether there is a
significant difference. This is one example of an experiment.

4. Observational Data. Here, the researcher does not design and administer
any questionnaire or conduct an experiment. Instead, data are collected
through observation. Suppose that you want to study the practices
of a certain tribe in Partido, Camarines Sur. You may stay in their
community for a certain period and observe what they do and how they
conduct their activities and culture. You may choose to participate with
them in their activities (participant observer) or just sit under a tree and
observe and take notes of what’s happening (non-participant observer).
Observational data may also be gathered online. For example, there is a
Facebook group of Duterte Supporters, and you want to gain deeper
insights into their continued support for the President. You may join
their Facebook group and be a member for a certain span of time. You may
also participate in their online forums or discussions (or bashing,
whatever), or you may just be there and do nothing but observe.

5. Focus Group Discussion (FGD). An FGD is an interview conducted with
a group of people (normally, 6 to 10), which is led by a moderator. The
participants are allowed to interact during the discussion. It can be
structured or semi-structured, and usually takes 30 to 120 minutes,
depending on the type of participants (Sarstedt & Mooi, 2019). Normally,
FGDs are conducted face-to-face, but in other circumstances (like when
it’s very costly for everyone to meet or in pandemic times like this), it may
be done online via video conference.

6. Case Studies. Case studies collect and present “information about a
particular participant or small group” (Cuesta, 2020). They emphasize
exploration and description and do not intend to discover a “universal
truth” because they cover only a particular participant or group.

7. Secondary data collection. Here, we utilize data that were collected
by someone else. For example, in addition to what was previously
mentioned in this module, we have databases such as COMPUSTAT and
OSIRIS, where we can get various data about publicly listed companies
around the world (subscriptions to them are not free, by the way).
Collecting data from the databases of the Philippine Statistics Authority
(PSA), the Philippine Stock Exchange, and the Securities and Exchange
Commission, as well as from company websites, government reports, and
other sources on the internet, are also examples.

Data Organization

After we have collected our data, it’s now time to organize them. Commonly,
the researcher first prepares the frequency distribution (we actually did this in
the previous section when we coded the qualitative data). A frequency
distribution contains the number of observations that fall in each category of
the variables that a study covers.

For this sub-topic, let us use the data in Appendix B (data in Excel file format
will be provided), which contains the ratings of the takers in the CPALE and their
academic performance (grades) in their major accounting subjects. Suppose that we
want to know whether academic performance is a predictor of CPALE ratings.
Assume the same codes for the profile, except for “Award”, for which 0 means no
award and 1 means with Latin honor.

Aside from the frequency distribution, a percentage distribution may also be
included. It is a column showing the percentage of the observations that fall in
each category. In addition, you may also make a column for the cumulative
percentage distribution, which adds the percentage of a category/class to the
percentages of all the categories/classes that come before it.


Let’s now construct the frequency distribution, percentage distribution, and
cumulative percentage distribution for the profile of the CPALE takers. At this
point, it is assumed that you already know how to do it in Gretl. Take a look at
the following illustration:

Following are the outputs from the software:


Using the figures generated from Gretl, we may have the following frequency
distribution table:

Table 1. Frequency distribution for CPALE takers’ profile.

Variable                   Frequency (f)   Percentage (%)   Cumulative (Cum)
Gender
  Male                          27              33.75            33.75
  Female                        53              66.25           100.00
  Total                         80             100.00           100.00
Address
  Goa                           33              41.25            41.25
  Tigaon                        20              25.00            66.25
  Lagonoy                       15              18.75            85.00
  San Jose                      12              15.00           100.00
  Total                         80             100.00           100.00
Religious Affiliation
  Catholic                      31              38.75            38.75
  Iglesia ni Cristo             21              26.25            65.00
  Born Again Christian          28              35.00           100.00
  Jehovah’s Witnesses            0               0.00           100.00
  Total                         80             100.00           100.00
Award
  No honor                      52              65.00            65.00
  With Latin honor              28              35.00           100.00
  Total                         80             100.00           100.00
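The arithmetic behind the percentage and cumulative columns can be checked with a short Python sketch. It takes the Address frequencies from the table above as given; the function name `distribution_table` is made up for this illustration and is not a Gretl command.

```python
def distribution_table(counts):
    """Return rows of (category, frequency, percentage, cumulative percentage)."""
    total = sum(counts.values())
    rows = []
    running = 0  # running total of the frequencies seen so far
    for category, f in counts.items():
        running += f
        percentage = round(100 * f / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((category, f, percentage, cumulative))
    return rows


# Address frequencies taken from the table above.
address_counts = {"Goa": 33, "Tigaon": 20, "Lagonoy": 15, "San Jose": 12}
rows = distribution_table(address_counts)
for category, f, pct, cum in rows:
    print(f"{category:<10}{f:>4}{pct:>8.2f}{cum:>8.2f}")
```

The cumulative percentage is computed from the running total of frequencies rather than by adding rounded percentages, so the last row always ends at exactly 100.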

Note that gender, address, religious affiliation, and award are nominal variables,
and the frequency distribution table is the one commonly used to present them.
The same applies to ordinal and interval variables. Ratio variables, like the
CPALE ratings and grades (last two columns of our data), may also be presented as
frequencies, but the software will first group the values into intervals. Take a
look at the following:


But when it comes to ratio variables, it would also be good if we present the
summary statistics, like the following:

Table 2. Summary statistics for the CPALE takers.

Variable                               Mean    Median    SD      Min     Max
CPALE Ratings                          67.20   68.00     12.04   43.00   86.00
Grades in Major Accounting Subjects    88.94   88.00     1.97    86.00   93.00
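To see where such figures come from, here is a minimal Python sketch using only the standard library. For brevity it uses the CPALE ratings of Takers 1 to 10 from Appendix B, so its results describe that subset and will not match Table 2, which covers all 80 takers. The sample standard deviation (n - 1 in the denominator) is used here as an assumption about how the SD is defined.

```python
import statistics

# CPALE ratings of Takers 1 to 10 from Appendix B (a subset, for brevity).
cpale = [51, 84, 84, 82, 61, 60, 70, 86, 57, 84]

summary = {
    "Mean": statistics.mean(cpale),
    "Median": statistics.median(cpale),
    "SD": statistics.stdev(cpale),  # sample standard deviation (divides by n - 1)
    "Min": min(cpale),
    "Max": max(cpale),
}
for name, value in summary.items():
    print(f"{name:<7}{value:>8.2f}")
```

Replacing the list with the full CPALE column from the Excel file would give the statistics for all takers.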

Data Presentation (Graphic)

Most of the time, data are presented in tables; however, the researcher may
also present them graphically if that would communicate them better. Graphical
presentation can take the form of charts and graphs, and it is easy with the help
of statistical software. It’s also easy if you use MS Excel.

Let us prepare the pie chart first, using the same data on CPALE takers.


Step 1. In MS Excel, enter the summary like the following:

Step 2. Select the range of values and labels that you want to present as a pie
chart, then click the Insert option, then Insert Pie Chart. The following will appear:


Step 3. You may format the pie chart according to your specifications. You may
also add a chart title.

Here’s now your pie graph for the gender of the respondents.

[Pie chart: Gender (Male = 27, Female = 53)]

You may do the same for the other profile variables.
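Excel sizes each slice in proportion to its share of the total. The short Python sketch below reproduces that arithmetic for the gender counts (27 male, 53 female); it is only a worked check of the proportions, not a substitute for the Excel steps.

```python
# Gender frequencies of the 80 CPALE takers.
gender_counts = {"Male": 27, "Female": 53}
total = sum(gender_counts.values())

# Each slice's share of the pie, as a percentage and as an angle out of 360 degrees.
slices = {
    label: (100 * count / total, 360 * count / total)
    for label, count in gender_counts.items()
}
for label, (percent, degrees) in slices.items():
    print(f"{label:<7}{percent:>7.2f}%{degrees:>8.1f} degrees")
```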


Now, for the ratio variables, it would not be so practical to present them in pie
charts. Instead, we may prepare a histogram. This is “a graph showing the differences in
frequencies or percentages among the categories of an interval-ratio variable. The
categories are displayed as contiguous bars, with width proportional to the width of
the category and height proportional to the frequency or percentage of that category”
(Leon-Guerrero & Nachmias, 2017, p. 45).
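The grouping that a histogram relies on can be sketched in Python as follows. The example counts how many ratings fall in each class interval and prints a simple text histogram. It uses the CPALE ratings of Takers 1 to 10 from Appendix B; the interval width of 10 and the starting point of 50 are choices made for this illustration and are not the intervals that Gretl would pick.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which interval the value belongs to
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# CPALE ratings of Takers 1 to 10 from Appendix B (a subset, for brevity).
cpale = [51, 84, 84, 82, 61, 60, 70, 86, 57, 84]
counts = bin_counts(cpale, start=50, width=10, n_bins=4)

# One row of asterisks per interval; the row length plays the role of the bar height.
for i, count in enumerate(counts):
    lower = 50 + 10 * i
    print(f"{lower}-{lower + 9}: {'*' * count} ({count})")
```

Because the intervals are contiguous and equally wide, the counts can be compared directly, which is what the bars of a histogram show.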

This time, we graph the CPALE ratings and academic performance using Gretl.

Click “Variable”, then “Frequency Distribution”, make sure that the “show plot” is
ticked, then click “Ok”. The following will appear.

Take note that you may save the graph as an image file. Just right-click the graph
and choose “Save as PNG” (you may also choose other formats). You may also edit
the graph according to your specifications.


Aside from the histogram, there are other graphs in the Gretl software. Using
the same graph above, let’s edit it to turn it into other forms of graphs. Click
the “Menu” (indicated by three lines), then “Edit”.

If the data involve time series, you may do a line graph to show the trend.
An example is the following (a different dataset was used; the time period
involved is 2013 to 2017).

The graphical presentation in this module is limited. There are charts and
graphs other than the ones presented here. Just explore the Gretl software and
MS Excel if you are interested.

III. ACTIVITIES

1. Watch Khan Academy’s video on representing data,
https://youtu.be/0ZKtsUkrgFQ.
2. Create a Google Form and use it to gather any data that you want from the BS
Accountancy students.
3. You may now start gathering the data needed for your approved study. After
the deadline of this module, you are expected to have gathered some of your
data already.


IV. ASSESSMENT

Instruction: Answer the following questions. If you have your laptop or computer,
you may use MS Word for your answers. If you can’t do it in MS Word, you may just
write your answer on a clean sheet of paper and take photo/s of it (please make
sure that your images are clear).

In naming the file for your output, kindly put your name, subject code and module
number (e.g., ABNER, RINA_AE9_Module 2). Please turn in your output through
our Moodle.

A. Theory (50%)

1. In your own words, how do you differentiate primary data from secondary data?
Provide a concrete example to expound your answer.
2. Give concrete examples of quantitative data and of qualitative data.
3. If I want to present the socio-demographic profile of my respondents, what
graphic presentation would you likely recommend?
4. What if I want to use graphical presentation for the financial performance of the
SMEs in the Philippines, what would you most likely recommend?
5. For the following, identify the data type (primary or secondary), and the collection
method (choose from the methods discussed under “Data Collection”) that would
best fit the situations:

Data Needed                            Data Type      Data Collection

1. Net income of publicly listed
   companies
2. Perception of BSA students on
   the flexible learning arrangement
3. Number of fishermen in Partido
4. Willingness to pay of the
   residents in Hiwacloy,
   Camarines Sur to preserve their
   river
5. Environmental disclosures of the
   publicly listed corporations in
   the Philippines
6. Corporate social responsibility
   initiatives of the Cooperatives in
   the Partido
7. Way of life of the Agta
   Tabangnon Tribe in Coyaoyao,
   Tigaon
8. Marketing strategies of RTW
   shops in Naga City
9. Number of tourists that visit
   the Philippines annually
10. Cause of bankruptcy of the
    Rural Bank of Goa


B. Application (50%)

There is an Excel file attached to this module, named “Module 2 Assessment
Data”. The data are related to the micro, small, medium, and large enterprises in a
certain locality (figures are hypothetical). Import the data into Gretl, then
present them using tables (all), a pie chart (profile), and a histogram (income).

Take screenshots of your outputs in Gretl, and in the same file for your answers
in Assessment A (Questions) above, paste your screenshots. Please add at least one
selfie photo (with Gretl) while working on it.

V. REFERENCES

Cuesta, M. (2020). Research methods class. Ateneo de Naga University.
Heumann, C., & Shalabh, M. S. (2016). Introduction to statistics and data analysis.
Springer. https://doi.org/10.1007/978-3-319-46162-5
Leon-Guerrero, A., & Nachmias, C.-F. (2017). The organization and graphic
presentation of data. In Essentials of social statistics for a diverse society (pp. 21–
62).
Sarstedt, M., & Mooi, E. (2019). A concise guide to market research (3rd ed.).
https://doi.org/10.1007/978-3-662-56707-4

Appendix A
Coding Qualitative Data

Student       Code (Gender)   Code (Address)   Code (Religion)
Student 1     0               1                1
Student 2     1               3                2
Student 3     0               3                3
Student 4     1               2                3
Student 5     1               3                2
Student 6     1               2                1
Student 7     1               1                1
Student 8     1               1                1
Student 9     1               1                1
Student 10    0               1                3
Student 11    0               3                2
Student 12    1               3                1
Student 13    0               3                1
Student 14    1               2                1
Student 15    0               2                1
Student 16    0               1                1
Student 17    0               1                3
Student 18    1               1                2
Student 19    1               3                3
Student 20    1               1                3
Student 21    1               3                3
Student 22    1               2                3
Student 23    1               1                3
Student 24    1               1                3
Student 25    1               1                2
Student 26    1               1                2
Student 27    0               3                2
Student 28    0               3                2
Student 29    0               2                2
Student 30    0               2                3
Student 31    1               3                1
Student 32    0               3                1
Student 33    1               3                1
Student 34    1               2                1
Student 35    1               2                1
Student 36    0               2                1
Student 37    0               2                1
Student 38    0               2                1
Student 39    1               1                1
Student 40    0               1                1
Student 41    1               1                3
Student 42    1               1                3
Student 43    1               1                3
Student 44    0               1                1
Student 45    0               1                1
Student 46    1               3                3
Student 47    0               3                3
Student 48    0               3                3
Student 49    0               3                2
Student 50    0               3                2

Appendix B
Organization and Presentation of Data

Name Sex Address Religion Award CPALE ACADMajor


Taker 1 1 1 1 0 51 89
Taker 2 0 1 3 0 84 92
Taker 3 0 1 2 0 84 89
Taker 4 0 3 1 0 82 91
Taker 5 1 3 1 0 61 87
Taker 6 1 4 1 0 60 89
Taker 7 1 2 1 1 70 91
Taker 8 1 2 1 0 86 92
Taker 9 1 1 3 0 57 88
Taker 10 1 1 2 0 84 89
Taker 11 1 1 3 1 47 87
Taker 12 1 3 3 1 67 88
Taker 13 1 1 3 1 58 87
Taker 14 1 4 3 0 78 89
Taker 15 1 2 3 0 65 88
Taker 16 1 1 3 0 45 88
Taker 17 0 1 2 0 72 87
Taker 18 0 1 2 0 63 87
Taker 19 1 1 2 0 80 91
Taker 20 1 4 2 1 58 87
Taker 21 1 4 2 0 64 89
Taker 22 1 2 3 0 81 91
Taker 23 0 2 1 0 67 88
Taker 24 1 3 1 1 75 87
Taker 25 1 3 1 1 58 89
Taker 26 0 3 1 1 50 87
Taker 27 1 2 3 0 77 92
Taker 28 1 2 2 1 58 87
Taker 29 1 2 1 0 85 93
Taker 30 1 1 1 0 59 88
Taker 31 1 1 1 0 69 88
Taker 32 0 1 1 0 84 88
Taker 33 1 3 1 1 84 92
Taker 34 0 3 3 0 58 86
Taker 35 1 4 2 0 54 89
Taker 36 0 2 3 1 73 89
Taker 37 1 2 3 76 91
Taker 38 0 1 3 1 81 91
Taker 39 1 1 3 1 78 91
Taker 40 1 1 3 1 59 89

Taker 41 1 3 3 0 80 92
Taker 42 1 1 2 0 76 93
Taker 43 0 4 2 0 73 88
Taker 44 1 2 2 1 52 89
Taker 45 1 1 2 1 55 88
Taker 46 1 1 2 1 63 88
Taker 47 0 1 3 1 73 93
Taker 48 1 1 1 0 44 87
Taker 49 1 4 1 0 70 91
Taker 50 0 4 1 0 68 91
Taker 51 1 2 1 0 67 88
Taker 52 1 2 3 0 48 89
Taker 53 0 3 2 1 75 89
Taker 54 0 3 1 1 78 89
Taker 55 0 3 1 0 54 91
Taker 56 0 2 1 0 78 93
Taker 57 1 2 1 0 43 89
Taker 58 1 2 1 0 73 88
Taker 59 0 1 3 0 69 87
Taker 60 0 1 2 0 63 91
Taker 61 0 1 3 1 79 92
Taker 62 1 3 3 1 47 86
Taker 63 1 3 3 1 64 87
Taker 64 1 4 3 1 51 87
Taker 65 0 2 3 0 77 89
Taker 66 1 2 3 0 76 88
Taker 67 0 1 2 1 67 88
Taker 68 1 1 2 0 61 87
Taker 69 0 1 2 0 83 88
Taker 70 0 3 2 0 81 93
Taker 71 1 1 2 0 43 86
Taker 72 0 4 3 0 68 88
Taker 73 1 2 1 0 72 87
Taker 74 1 1 1 1 66 87
Taker 75 1 1 1 1 61 87
Taker 76 1 1 3 0 76 88
Taker 77 1 1 1 0 53 87
Taker 78 1 4 1 0 53 87
Taker 79 0 4 1 0 85 91
Taker 80 1 2 1 1 69 87
