
Chapter 2: Data collection
Nguyen Thi Thu Van - September 19, 2022

Data terminology
An observation is a single member of a collection of items that we want to study, such as a person, firm, or region. A variable is a
characteristic of the subject or individual, such as name, age, or income. A dataset consists of all the values of the variables for the
observations we have chosen to observe. Data are usually entered into a spreadsheet or database as an n × m matrix, with one row for
each of the n observations and one column for each of the m variables. Datasets are univariate (one variable), bivariate (two
variables), or multivariate (more than two variables). The questions that can be explored and the data analytical techniques that can
be used depend on the data type and the number of variables.
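
The n × m layout described above can be illustrated with a minimal Python sketch. The dataset values and the helper name `describe` are made up for illustration; they are assumptions, not part of the original notes.

```python
# Hypothetical dataset: each dict is one observation (row),
# each key is one variable (column). Values are invented.
dataset = [
    {"name": "An", "age": 21, "income": 450},
    {"name": "Binh", "age": 34, "income": 900},
    {"name": "Chi", "age": 28, "income": 700},
]


def describe(rows):
    """Return (n, m, kind) for a non-empty list of observations."""
    if not rows or not rows[0]:
        raise ValueError("need at least one observation and one variable")
    n = len(rows)      # number of observations (rows)
    m = len(rows[0])   # number of variables (columns)
    if m == 1:
        kind = "univariate"
    elif m == 2:
        kind = "bivariate"
    else:
        kind = "multivariate"
    return n, m, kind


shape = describe(dataset)
ages_only = describe([{"age": row["age"]} for row in dataset])
print(shape, ages_only)
```

Keeping only the `age` column turns the same three observations into a univariate dataset, which changes the kinds of questions that can be asked of it.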

Data type
Categorical / Qualitative data
Categorical data have values that are described by words rather than numbers (e.g., gender, eye color, hair color). Although categorical
variables have nonnumerical values, the values of a categorical variable are sometimes represented using numbers; this is called
coding. But coding a category as a number does not make the data numerical, and the number does not typically imply a rank.

Numerical / Quantitative data
Numerical data arise from counting, measuring something, or some kind of mathematical operation. Numerical data can be broken down
into two types: discrete (i.e., variables with a countable number of values, like the number of credits or the number of passengers on a
flight) and continuous (i.e., variables that can take any value within an interval, like height, weight, time, or income).

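
As a small illustration of coding, the sketch below maps eye colors to numbers. The sample values and the code assignment are arbitrary assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical categorical observations.
eye_color = ["brown", "blue", "green", "brown", "brown"]

# Arbitrary coding scheme (an assumption): the numbers are labels only.
codes = {"brown": 1, "blue": 2, "green": 3}
labels = {code: color for color, code in codes.items()}

coded = [codes[color] for color in eye_color]

# Counting how often each code occurs is meaningful for coded categories.
counts = Counter(coded)
most_common_code = counts.most_common(1)[0][0]
most_common_color = labels[most_common_code]

# An average of the codes can be computed, but it has no interpretation:
# renumbering the categories would change it without changing the data.
meaningless_mean = sum(coded) / len(coded)

print(coded, most_common_color)
```

The frequency count survives any renumbering of the categories, while the mean of the codes does not, which is why the coded variable is still categorical.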
Time-series data is a sequence of data points collected over time intervals, allowing us to track changes over time. Time-series data can
track changes over milliseconds, days, or even years. For time-series data, we are interested in trends or patterns over time.

Cross-sectional data is collected from many units (people, companies, countries, etc.) in a single time period. For cross-sectional
data, we are interested in variation among observations or in relationships.
Note: A finite population is effectively infinite if the sample is less than 5 percent of the population (i.e., if n/N < 0.05).
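
The 5 percent rule can be written as a one-line check. This is a minimal sketch; the function name `is_effectively_infinite` and the example sizes are assumptions made for illustration.

```python
def is_effectively_infinite(n, N, threshold=0.05):
    """Return True if a sample of size n is less than 5% of a population of size N."""
    if N <= 0 or n < 0 or n > N:
        raise ValueError("require 0 <= n <= N and N > 0")
    return n / N < threshold


small_sample = is_effectively_infinite(40, 1000)   # n/N = 0.04
large_sample = is_effectively_infinite(200, 1000)  # n/N = 0.20
print(small_sample, large_sample)
```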

4 levels of measurement for data


Nominal scale/measurement
A nominal scale describes a variable with categories that do not have a natural order or ranking.
Example. If you’re measuring the academic majors for a group of college students, the categories would be arts, business, chemistry,
and so on. Each student would be classified in one category according to his or her major.

Ordinal scale/measurement
An ordinal scale is one where the order matters but not the difference between values.
Example. Often, an ordinal scale consists of a series of ranks (first, second, third, and so on), like the order of finish in a horse race.

Interval scale/measurement
An interval scale is one where there is order and the difference between two values is meaningful, but ratios are not meaningful
because the zero point of the scale doesn’t mean the absence of the quantity being measured.
Example. You know that a measurement of 80°F is higher than a measurement of 60°F by exactly 20°. But we can’t say that
60°F is twice as warm as 30°F, or that 30°F is 50 percent warmer than 20°F, because the zero point of this scale is not meaningful.

Ratio scale/measurement
A ratio scale is an interval scale where ratios are meaningful. The zero point of this scale is meaningful, meaning that it indicates the
absence of the quantity being measured.
Example. A gas tank with 10 gallons (10 more than 0) has twice as much gas as a tank with only 5 gallons (5 more than 0).
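
One way to see why ratios fail on an interval scale but hold on a ratio scale is to change units. The sketch below is an illustration added here, not part of the original notes; it uses the standard Fahrenheit-to-Celsius conversion and the US gallon-to-liter factor.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


LITERS_PER_US_GALLON = 3.785411784

# Interval scale: the ratio depends on the arbitrary zero point.
ratio_f = 60 / 30                   # 2.0 in Fahrenheit
ratio_c = f_to_c(60) / f_to_c(30)   # negative in Celsius, since 30°F is below 0°C

# Interval scale: equal differences stay equal after a unit change.
diff_high = f_to_c(80) - f_to_c(60)
diff_low = f_to_c(40) - f_to_c(20)

# Ratio scale: the ratio is the same in any unit, because zero means "none".
ratio_gallons = 10 / 5
ratio_liters = (10 * LITERS_PER_US_GALLON) / (5 * LITERS_PER_US_GALLON)

print(ratio_f, ratio_c, diff_high, diff_low, ratio_gallons, ratio_liters)
```

The "twice as warm" claim disappears as soon as the same temperatures are expressed in Celsius, while "twice as much gas" holds in gallons and liters alike.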
The Likert scale is widely used in social work research. It is usually treated as an interval scale, but strictly speaking it is an ordinal scale, for which arithmetic operations such as averaging are not strictly meaningful. The coarseness of a Likert scale refers to the number of scale points.

Sampling methods
A census is an examination of all items of the population, while a sample involves looking only at some items selected from the
population.
There are two main categories of sampling methods: random sampling (e.g., simple random sample, systematic sample, stratified sample,
cluster sample) and non-random sampling (e.g., judgement sample, convenience sample, focus group). Sampling without
replacement means that once an item has been selected to be included in the sample, it cannot be considered for the sample again.
Sampling with replacement means that an item can be selected more than once (for example, the same random number could show up
more than once). Sampling with replacement does not lead to bias in our sample results. Note that when the population is finite and the
sample size is close to the population size, we should not use sampling with replacement, because duplicates become likely. When the
sample is less than 5% of the population, the population is effectively infinite.
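
The random sampling methods named above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the population of 100 numbered items, the seed, the two strata, the five clusters, and all function names are hypothetical and chosen only for the example.

```python
import random


def simple_random_sample(population, n, rng):
    """Sampling without replacement: no item can appear twice."""
    return rng.sample(population, n)


def sample_with_replacement(population, n, rng):
    """Sampling with replacement: the same item may appear more than once."""
    return rng.choices(population, k=n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th item."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum."""
    return {name: rng.sample(items, n_per_stratum) for name, items in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep every item in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


rng = random.Random(2022)                 # fixed seed for repeatability (assumption)
population = list(range(1, 101))          # hypothetical items numbered 1..100

srs = simple_random_sample(population, 10, rng)
with_repl = sample_with_replacement(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"first_half": population[:50], "second_half": population[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"cluster_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clustered = cluster_sample(clusters, 2, rng)

print(srs, with_repl, systematic, stratified, clustered, sep="\n")
```

Here `rng.sample` never returns duplicates, while `rng.choices` can, which mirrors the distinction between sampling without and with replacement.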

Survey
Most survey research follows the same basic steps:
Step 1: State the goals of the research.
Step 2: Develop the budget (time, money, staff …).
Step 3: Create a research design (target population, frame, sample size).
Step 4: Choose a survey type and method of administration.
Step 5: Design a data collection instrument (questionnaire).
Step 6: Pretest the survey instrument and revise as needed.
Step 7: Administer the survey (follow up if needed).
Step 8: Code the data and analyze it.
Note that these steps may overlap in time.
