
Introduction To Research Data

DATE: 23-10-2022
Learning Outcomes
At the end of this class, students will be able to:

❑ Define research data

❑ List research data formats

❑ Describe various types of research data

❑ Highlight various sources of research data

❑ Classify research data

❑ Advise on which type of data to apply in a research project

❑ Use ordinal and nominal research data

❑ Describe Likert, ordinal, nominal, interval, and ratio scales

❑ Compare discrete and continuous research data

❑ Define research variables and tabular representation

❑ Highlight the various methods for research data collection


Definitions of Research Data
❑ Research data can be described as any information that has been collected, observed,
generated or created to validate original research findings.
❑ It can also be defined as a collection of facts or information from which conclusions
may be drawn.
❑ Many people think of data-driven research as something that primarily happens in the
sciences. It is often thought of as involving a spreadsheet filled with numbers. Both of
these beliefs are incorrect.
❑ Research data are collected and used in scholarship across all academic disciplines.
While they can consist of numbers in a spreadsheet, they also take many different
formats, including videos, images, artifacts, and diaries.
❑ For example, a psychologist may collect survey data to better understand
human behavior, an artist may use data to generate images and sounds, and an
anthropologist may use audio files to document observations about different cultures.
❑ Scholarly research across all academic fields is increasingly data-driven.
Research Data Formats
❑ Data may be intangible, as in measured numerical values found in a spreadsheet, or a
physical object, as in research materials such as samples of rocks, plants, or insects.
Examples of the formats that data can take include:
❖ Documents (text, MS Word), spreadsheets
❖ Lab notebooks, field notebooks, diaries
❖ Questionnaires, transcripts, surveys
❖ Codebooks
❖ Experimental data
❖ Films, audio or video tapes/files
❖ Photographs, image files
❖ Sensor readings
❖ Test responses
❖ Artifacts, specimens, physical samples
❖ Models, algorithms, scripts
❖ Content analysis
❖ Focus group recordings, interview notes
Types of Research Data
❑ The type of research data you collect may affect the way you manage that data.

❑ For example, data that is hard or impossible to replace (e.g. the recording of an event at a specific time and place) requires extra backup procedures to reduce the risk of data loss. Or, if you will need to combine data points from different sources, you will need to follow best practices to prevent data corruption.

Research data can be generated for different purposes and through different processes.

❖ Observational data is captured in real-time, and is usually irreplaceable, for example sensor data, survey data, sample data, and neuro-images.

❖ Experimental data is captured from lab equipment. It is often reproducible, but this can be expensive. Examples of experimental data are gene sequences, chromatograms, and toroid magnetic field data.

❖ Simulation data is generated from test models where model and metadata are more important than output data. For example, climate models and economic models.

❖ Derived or compiled data has been transformed from pre-existing data points. It is reproducible if lost, but this would be expensive. Examples are data mining, compiled databases, and 3D models.

❖ Reference or canonical data is a static or organic conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.
Sources of Data
❑ Records/previous studies (secondary data)
❑ Surveys (primary data)
❖ Comprehensive (universal)
❖ Sample
❑ Experiments (primary data)
Classification of Research Data
Research data is classified as Qualitative (Categorical) or Quantitative (Numerical).

Qualitative (Categorical)
❑ Qualitative data describes qualities or characteristics.
❑ It is collected using questionnaires, interviews, or observation, and frequently appears in narrative form.
❑ For example, it could be notes taken during a focus group on the quality of the food at KFC, or responses from an open-ended questionnaire.
❑ Qualitative data may be difficult to precisely measure and analyze.
❑ The data may be in the form of descriptive words that can be examined for patterns or meaning, sometimes through the use of coding.
❑ Coding allows the researcher to categorize qualitative data to identify themes that correspond with the research questions and to perform quantitative analysis.

Quantitative (Numerical)
❑ Quantitative data are used when a researcher is trying to quantify a problem, or address the "what" or "how many" aspects of a research question.
❑ It is data that can either be counted or compared on a numeric scale.
❑ For example, it could be the number of third semester students at LUC, or the ratings on a scale of 1-4 of the quality of food served at McDonald’s or KFC.
❑ This category of data is usually gathered using instruments, such as a questionnaire which includes a ratings scale or a thermometer to collect weather data.
❑ Statistical analysis software, such as SPSS, is often used to analyze quantitative data.
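The idea of coding qualitative data so that it can be counted can be sketched in a few lines of Python. This is only a minimal illustration: the keyword codebook `CODEBOOK`, the helper names, and the three responses are all hypothetical, and real qualitative coding is normally done by trained human coders rather than by keyword matching.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it (assumed for illustration).
CODEBOOK = {
    "taste": ["tasty", "bland", "salty", "delicious"],
    "price": ["cheap", "expensive", "price"],
    "service": ["slow", "friendly", "queue"],
}

def code_response(text):
    """Return the sorted list of themes whose keywords appear in the response."""
    lowered = text.lower()
    return sorted(
        theme
        for theme, words in CODEBOOK.items()
        if any(word in lowered for word in words)
    )

def theme_counts(responses):
    """Count how many responses mention each theme (qualitative -> quantitative)."""
    counts = Counter()
    for response in responses:
        counts.update(code_response(response))
    return counts

# Made-up open-ended responses, not real data.
responses = [
    "The chicken was delicious but the queue was long.",
    "Too expensive for a bland burger.",
    "Friendly staff and a fair price.",
]
counts = theme_counts(responses)
print(dict(counts))
```

Once each response carries one or more theme codes, the counts per theme are ordinary quantitative data and can be analyzed like any other frequency table.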
Qualitative or Quantitative?
❑ Should I use qualitative or quantitative data in my research?
❑ Research topics may be approached using either quantitative or qualitative methods.
❑ Choosing one method or the other depends on what you believe would provide the
best evidence for your research objectives.
❑ Researchers sometimes choose to incorporate both in their research since these
methods provide different perspectives on the topic.
❑ For example, suppose you want to know the locations of the most popular study spaces on
the LUC Wisma campus, and why they are so popular.
❑ To identify the most popular spaces, you might count the number of students
studying in different locations at regular time intervals over a period of days or
weeks. This quantitative data would answer the question of how many people
study at different locations on campus.
❑ To understand why certain locations are more popular than others, you might use a
survey to ask students why they prefer these locations. This is qualitative data.
Classification of Research Data Cont’d
❑ Research Data
❖ Qualitative (Categorical)
✓ Ordinal: Categories, Ranks
✓ Nominal: Binary, Non-Binary
❖ Quantitative (Numerical)
✓ Discrete (Counting)
✓ Continuous (Measuring)
✓ Measurement scales: Interval Scale, Ratio Scale
Ordinal Research Data
❑ Ordinal research data is a kind of categorical data with a set order or scale.

❑ For example, ordinal data is collected when a respondent rates his/her financial happiness level on a scale of 1-10.

❑ In ordinal data, there is no standard scale on which the difference between scores is measured.

❑ That is, the scale is usually influenced by personal factors rather than by a set rule.

❑ Examples include:
❖ Agreement (strongly disagree, disagree, neutral, agree, strongly agree)
❖ Degree/severity of illness (mild, moderate, severe)
❖ Rating (excellent, good, fair, poor)
❖ Frequency (always, often, sometimes, never)
❖ Classification
✓ 1st, 2nd, 3rd, …
✓ primary, secondary, tertiary, …
✓ grades (A, B, C, D, E, F)
Ordinal Scale
❑ The ordinal scale is a statistical data type in which variables are ordered or ranked, but
the degree of difference between categories is not defined.
❑ The ordinal scale contains qualitative data.
❑ It places variables in order/rank, only permitting a value to be measured as higher or
lower on the scale.
❑ You can use an ordinal scale for research and survey purposes to understand the
higher or lower value of a data set. The scale identifies the relative magnitude of the variables.
❑ It does not explain the distance between the variables.
❑ The ordinal scale cannot answer “how much” two categories differ.
❑ Like a Likert scale, the ordinal scale can measure frequency, importance, satisfaction,
likelihood, quality, experience, etc.
❑ The measures on an ordinal scale do not have absolute values, hence the real differences
between adjacent values may not be equal.
❑ For example, on an age scale with the categories “less than 20”, “20-50”, “50-80”, and
“over 80”, the step from “less than 20” to “20-50” does not have the same meaning as the
step from “50-80” to “over 80”.
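A small Python sketch can show what an ordinal scale does and does not support. It uses the severity categories (mild, moderate, severe) as an example; the observations are made up, and the helper names `is_higher` and `ordinal_median` are hypothetical. The rank numbers are only positions in the order, so comparing them is valid, but subtracting them would not measure "how much" the categories differ.

```python
from statistics import median_low

# Ordered categories; the rank is a position in the order, not a measured quantity.
SEVERITY_ORDER = ["mild", "moderate", "severe"]
RANK = {label: position for position, label in enumerate(SEVERITY_ORDER)}

def is_higher(a, b):
    """True if category a ranks above category b on the ordinal scale."""
    return RANK[a] > RANK[b]

def ordinal_median(labels):
    """Median category: the (lower) middle label once observations are ranked."""
    ranks = [RANK[label] for label in labels]
    return SEVERITY_ORDER[median_low(ranks)]

# Made-up observations for illustration only.
observations = ["mild", "severe", "moderate", "mild", "moderate"]
print(is_higher("severe", "mild"))
print(ordinal_median(observations))
```

The median and the mode are safe summaries here; a mean of the rank numbers would silently assume that the steps between categories are equal, which an ordinal scale does not guarantee.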
Likert Scale
❑ A Likert scale is a point scale used by researchers in surveys to get people's opinions on a subject matter.
❑ It is usually a 5 or 7-point scale with options that range from one extreme to another.
❑ A 4-point Likert scale is basically a forced Likert scale.
❑ The reason it is named as such is that the user is forced to form an opinion.
❑ There is no safe 'neutral' option.
❑ It is mostly used by market researchers to get specific responses.

Take for example:

❑ How satisfied are you with our meal tonight?

1 - Very Satisfied
2 - Somewhat Satisfied
3 - Neutral
4 - Somewhat Dissatisfied
5 - Very Dissatisfied
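As a rough sketch of how responses to a 5-point question like this one might be tallied, the Python below builds a frequency table and the share of respondents who chose one of the two "satisfied" options. The response list is made up, and the `summarize` helper is a hypothetical name used only for illustration.

```python
from collections import Counter

# Codes and labels for the 5-point satisfaction question.
LIKERT_LABELS = {
    1: "Very Satisfied",
    2: "Somewhat Satisfied",
    3: "Neutral",
    4: "Somewhat Dissatisfied",
    5: "Very Dissatisfied",
}

def summarize(responses):
    """Return (frequency per label, share of respondents answering 1 or 2)."""
    counts = Counter(responses)
    frequencies = {label: counts.get(code, 0) for code, label in LIKERT_LABELS.items()}
    satisfied_share = (counts[1] + counts[2]) / len(responses)
    return frequencies, satisfied_share

# Made-up responses from eight diners, for illustration only.
responses = [1, 2, 2, 3, 5, 1, 4, 2]
frequencies, satisfied_share = summarize(responses)
print(frequencies)
print(f"{satisfied_share:.1%} satisfied")
```

Reporting frequencies and percentages avoids treating the codes 1 to 5 as if the gaps between them were equal.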
Nominal Research Data
❑ Nominal research data is a type of data that is used to label variables without
providing any quantitative value.
❑ It is the simplest form of a scale of measure.
❑ Unlike ordinal data, nominal data cannot be ordered and cannot be measured.
❑ Nominal data cannot be manipulated using available mathematical operators.
Thus, the only measure of central tendency for such data is the mode.
❑ Nominal data can be both qualitative and quantitative. However, the
quantitative labels lack a numerical value or relationship (e.g., identification
number). On the other hand, various types of qualitative data can be
represented in nominal form.
❑ They may include words, letters, symbols, names of people, phone number,
address, gender, and nationality.
❑ Nominal data is a type of categorical data without an essential order.

Examples:
Consider the two questions below:
1. How was your customer service experience? _______
2. How was your customer service experience?
✓ Good
✓ Neutral
✓ Bad
The data collected from Question 1 is nominal data, while that from Question 2 is ordinal data.

Non-binary:
❖ Type of cars (Proton Saga, Proton Wira, BMW, Toyota, Honda, Jaguar)
❖ Ethnicity (Malay, Chinese, Indian)

Binary:
❖ Smoking status (smoker, non-smoker)
❖ Disease status (Diseased, normal)
❖ Status of student (Undergraduate, Postgraduate)
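Because the mode is the only measure of central tendency that applies to nominal data, a short Python sketch of finding it may help. The car labels come from the example categories above, but the list of observations is made up and `mode_of` is a hypothetical helper name.

```python
from collections import Counter

def mode_of(labels):
    """Return the most frequent label and how many times it occurs."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count

# Made-up observations of car type; the labels have no order or numeric value.
car_types = [
    "Proton Saga", "Toyota", "Honda", "Proton Saga",
    "BMW", "Toyota", "Proton Saga",
]
print(mode_of(car_types))
```

Counting labels is meaningful for nominal data; sorting them or averaging them is not, since the labels only name categories.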
Nominal Scale
❑ A nominal scale uses “tags” or “labels” to identify items; the labels carry no rank or quantitative value.

❑ It differentiates items based on the categories they belong to.

❑ A nominal scale does not depend on numbers because it deals with non-numeric attributes.

❑ For example, in a marathon race, all the contestants are given a number.

❑ These numbers are for the purpose of identifying the contestant. The numbers don’t have any association with the result of the race or with the characteristics of the person.

❑ A nominal scale can have both qualitative and quantitative variables.

❑ For example, religious affiliation, gender, country or city of origin, marital status, etc. can be considered to be measured on a nominal scale.


Discrete Research Data
❑ Discrete data is a count that involves integers.

❑ Only a limited number of values is possible.

❑ This type of data cannot be subdivided into different parts.

❑ Discrete data includes discrete variables that are finite, numeric, countable, and
non-negative integers.

❑ In many cases, discrete data can be prefixed with “the number of”.

For example:

❖ The number of PhD students who have attended the class

❖ The number of customers who have bought different products

❖ The number of groceries people are purchasing every day

❑ This type of data is mainly used for simple statistical analysis because it is easy
to summarize and compute.

❑ In most cases, discrete data is displayed using bar graphs, stem-and-leaf plots, and pie charts.
Continuous Research Data
❑ Continuous research data is data that can take any value within a range.

❑ Height, weight, temperature and length are all examples of continuous data.

❑ Some continuous data will change over time, such as the weight of a baby in its first year or the temperature in a room throughout the day.

❑ This data is best shown on line graphs, which can show how the data changes over a given period of time, and on histograms, which show how the values are distributed (including any skew).

❑ Other continuous data, such as the heights of a group of children on one particular day, is often grouped into categories to make it easier to interpret.

❑ The values of continuous data are not always whole numbers, as they are usually collected from very precise measurements.
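Grouping continuous measurements into categories, as with the children's heights mentioned above, can be sketched as follows. The heights, the starting point of 115 cm, and the 5 cm class width are all assumptions chosen for illustration, and `group_into_classes` is a hypothetical helper name.

```python
def group_into_classes(values, lower, width):
    """Count values per class interval [start, start + width), ordered by start."""
    counts = {}
    for value in values:
        index = int((value - lower) // width)  # which interval the value falls in
        counts[index] = counts.get(index, 0) + 1
    return {
        f"{lower + i * width}-{lower + (i + 1) * width}": counts[i]
        for i in sorted(counts)
    }

# Made-up heights in centimetres, measured on one day.
heights_cm = [121.5, 118.2, 130.0, 127.4, 133.9, 125.0, 119.8]
classes = group_into_classes(heights_cm, lower=115, width=5)
print(classes)
```

The grouped counts are what a histogram would plot: each class interval becomes one bar, and a value on a boundary (such as 125.0) goes into the interval that starts at that boundary.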
Interval Scale
❑ An interval scale can be defined as a quantitative measurement scale where
values have an order, the difference between adjacent values is equal, and the
zero point is arbitrary.
❑ It can be used to measure variables that exist along a common scale in equal
intervals.
❑ Interval scales are best suited to surveys where respondents must enter values
regarding temperature, time, and dates.
❑ Interval scales can be easily integrated into multiple choice questions or rating
scale questions by asking respondents to use a numerical scale to make a rating.
For example:
❑ Net Promoter Score surveys measure the likelihood of customers recommending
a company’s products or services to others.
❑ It does so by asking them to rate their likelihood to do so on a numeric scale from
0 to 10, where 0 indicates they are not likely at all, and 10 indicates they are very
likely.
Ratio Scale
❑ Ratio scale is a type of variable measurement scale which is
quantitative in nature.
❑ It allows any researcher to compare the intervals or differences.
❑ Ratio scale is the 4th level of measurement and possesses a true zero
point (point of origin). This is the feature that distinguishes it from the interval scale.
❑ By contrast, when the temperature outside is 0 degrees Celsius, 0 does not mean
there is no temperature; it is simply a value, so Celsius temperature is interval data, not ratio data.
❑ A ratio scale is the most informative scale, as it conveys the order of
values, the exact difference between them, and their ratio.
❑ The most common examples of this scale are height, money, age,
weight, blood pressure etc.
❑ With respect to market research, the common examples that are
observed are sales, price, number of customers, market share etc.
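A short numeric check can make the role of the zero point concrete. The sketch below assumes the standard Celsius-to-Kelvin conversion (K = °C + 273.15) and uses made-up temperatures and weights. Differences are meaningful on both kinds of scale, but a ratio such as "twice as much" is only meaningful when zero really means "none".

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (arbitrary zero) to Kelvin (true zero)."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # made-up temperatures in degrees Celsius

# Differences agree on both scales: intervals are meaningful.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios disagree: 20 °C is not "twice as hot" as 10 °C.
naive_ratio = warm_c / cold_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

# Weight has a true zero, so its ratio is meaningful as it stands.
weight_ratio = 80.0 / 40.0  # made-up weights in kg

print(difference_c, round(difference_k, 2))
print(naive_ratio, round(kelvin_ratio, 3))
print(weight_ratio)
```

The Celsius ratio of 2.0 is an artefact of where the zero was placed, while the 80 kg to 40 kg ratio of 2.0 is a genuine statement that one weight is double the other.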
Discrete vs Continuous Data
❑ Both data types are important for statistical analysis. However, some major
differences need to be noted before drawing any conclusions or making
decisions.

The key differences are:

❑ Discrete data is the type of data that has clear gaps between values.
Continuous data is data that falls on a continuous sequence.

❑ Discrete data is countable while continuous is measurable.

❑ To accurately represent discrete data, the bar graph is used. Histogram or line
graphs are used to represent continuous data graphically.

❑ A graph of a discrete function shows distinct points that remain unconnected, while in a graph of a continuous function the points are connected with an unbroken line.

❑ Discrete data contains distinct or separate values. Continuous data includes any value within the preferred range.
Datasets and Data Tables
❑ A dataset is a set or collection of data.

❑ This set is normally presented in a tabular pattern.

❑ Every column describes a particular variable.

❑ Each row corresponds to a given member of the dataset, as per the given
question.

❑ Datasets describe the values of each variable, for quantities such as the height,
weight, temperature, or volume of an object, or values of random numbers.

❑ Each value in this set is known as a datum.

❑ A dataset consists of data for one or more members, with one row per member.

Data Table

❑ A dataset organized into a table, with one column for each variable and one row
for each person.
Definitions for Variables & Typical Data Table
❑ AGE: Age in years
❑ BMI: Body mass index, weight/height² in kg/m²
❑ FFNUM: The average number of times eating “fast food” in a week
❑ TEMP: High temperature for the day (°F)
❑ GENDER: 1- Female 0- Male
❑ EXERCISE LEVEL: 1- Low 2- Medium 3- High
❑ QUESTION: Compared to others, what is your satisfaction rating of the National Practitioner Data Bank?
1- Very Satisfied 2- Somewhat Satisfied 3- Neutral 4- Somewhat dissatisfied 5- Dissatisfied

| OBS | AGE | BMI | FFNUM | TEMP (°F) | GENDER | EXERCISE LEVEL | QUESTION |
|-----|-----|------|-------|-----------|--------|----------------|----------|
| 1 | 26 | 23.2 | 0 | 61.0 | 0 | 1 | 1 |
| 2 | 30 | 30.2 | 9 | 65.5 | 1 | 3 | 2 |
| 3 | 32 | 28.9 | 17 | 59.6 | 1 | 3 | 4 |
| 4 | 37 | 22.4 | 1 | 68.4 | 1 | 2 | 3 |
| 5 | 33 | 25.5 | 7 | 64.5 | 0 | 3 | 5 |
| 6 | 29 | 22.3 | 1 | 70.2 | 0 | 2 | 2 |
| 7 | 32 | 23.0 | 0 | 67.3 | 0 | 1 | 1 |
| 8 | 33 | 26.3 | 1 | 72.8 | 0 | 3 | 1 |
| 9 | 32 | 22.2 | 3 | 71.5 | 0 | 1 | 4 |
| 10 | 33 | 29.1 | 5 | 63.2 | 1 | 1 | 4 |
| 11 | 26 | 20.8 | 2 | 69.1 | 0 | 1 | 3 |
| 12 | 34 | 20.9 | 4 | 73.6 | 0 | 2 | 3 |
| 13 | 31 | 36.3 | 1 | 66.3 | 0 | 2 | 5 |
| 14 | 31 | 36.4 | 0 | 66.9 | 1 | 1 | 5 |
| 15 | 27 | 28.6 | 2 | 70.2 | 1 | 2 | 2 |
| 16 | 36 | 27.5 | 2 | 68.5 | 1 | 3 | 3 |
| 17 | 35 | 25.6 | 143 | 67.8 | 1 | 3 | 4 |
| 18 | 31 | 21.2 | 11 | 70.7 | 1 | 1 | 2 |
| 19 | 36 | 22.7 | 8 | 69.8 | 0 | 2 | 1 |
| 20 | 33 | 28.1 | 3 | 67.8 | 0 | 2 | 1 |
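To illustrate how the different variable types in this table call for different summaries, the sketch below loads the first five observations from the table and applies a mean to a ratio variable (AGE), a mode to a nominal variable (GENDER), and a median to an ordinal variable (QUESTION). The shortened column names and the `column` helper are assumptions made for the example.

```python
from statistics import mean, median, mode

# Column names shortened for use as dictionary keys (an assumption of this sketch).
COLUMNS = ["OBS", "AGE", "BMI", "FFNUM", "TEMP", "GENDER", "EXERCISE", "QUESTION"]

# First five observations, copied from the data table.
ROWS = [
    (1, 26, 23.2, 0, 61.0, 0, 1, 1),
    (2, 30, 30.2, 9, 65.5, 1, 3, 2),
    (3, 32, 28.9, 17, 59.6, 1, 3, 4),
    (4, 37, 22.4, 1, 68.4, 1, 2, 3),
    (5, 33, 25.5, 7, 64.5, 0, 3, 5),
]

# One record (row) per member, one key (column) per variable.
records = [dict(zip(COLUMNS, row)) for row in ROWS]

def column(name):
    """Return all values of one variable across the records."""
    return [record[name] for record in records]

mean_age = mean(column("AGE"))            # ratio data: the mean is meaningful
modal_gender = mode(column("GENDER"))     # nominal codes: only the mode applies
median_rating = median(column("QUESTION"))  # ordinal codes: the median respects order
total_fast_food = sum(column("FFNUM"))    # discrete counts can be added

print(mean_age, modal_gender, median_rating, total_fast_food)
```

Note that GENDER and QUESTION are stored as numbers, but the numbers are only codes for categories, so averaging them would not produce a meaningful result.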
Data Collection Methods
| Method | When to use | How to collect data |
|--------|-------------|---------------------|
| Experiment | To test a causal relationship. | Manipulate variables and measure their effects on others. |
| Survey | To understand the general characteristics or opinions of a group of people. | Distribute a list of questions to a sample online, in person or over-the-phone. |
| Interview/focus group | To gain an in-depth understanding of perceptions or opinions on a topic. | Verbally ask participants open-ended questions in individual interviews or focus group discussions. |
| Observation | To understand something in its natural setting. | Measure or survey a sample without trying to affect them. |
| Ethnography | To study the culture of a community or organization first-hand. | Join and participate in a community and record your observations and reflections. |
| Archival research | To understand current or historical events, conditions or practices. | Access manuscripts, documents or records from libraries, depositories or the internet. |
| Secondary data collection | To analyze data from populations that you can’t access first-hand. | Find existing datasets that have already been collected, from sources such as government agencies or research organizations. |
❑ Carefully consider what method you will use to gather data that helps you directly answer your research questions.
Class Attendance
Please click on the link below to submit your class attendance.

https://forms.gle/SPizKfEhKFNGrbNh6
