Data terminology
An observation is a single member of a collection of items that we want to study, such as a person, firm, or region. A variable is a
characteristic of the subject or individual, such as name, age, or income. A dataset consists of all the values of all the variables for all
the observations we have chosen to observe. Data are usually entered into a spreadsheet or database as an n × m matrix, with one row for
each of the n observations and one column for each of the m variables. Datasets are described by their number of variables: univariate
(one variable), bivariate (two variables), and multivariate (more than two variables). The questions that can be explored and the
analytical techniques that can be used depend on the data type and the number of variables.
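As a minimal sketch of the n × m layout described above, the following Python snippet stores a small made-up dataset as a list of rows and labels it by its number of variables. The variable names, the values, and the helper `classify_dataset` are hypothetical and serve only as an illustration.

```python
# Hypothetical dataset: each row is one observation (a person),
# each column is one variable. All values are made up.
variables = ["name", "age", "income"]
observations = [
    ["Ann", 34, 52000],
    ["Bob", 45, 61000],
    ["Cara", 29, 48000],
    ["Dev", 52, 75000],
]


def classify_dataset(num_variables):
    """Label a dataset by its number of variables."""
    if num_variables < 1:
        raise ValueError("a dataset needs at least one variable")
    if num_variables == 1:
        return "univariate"
    if num_variables == 2:
        return "bivariate"
    return "multivariate"


n = len(observations)  # number of observations (rows)
m = len(variables)     # number of variables (columns)
label = classify_dataset(m)
print(f"{n} x {m} matrix, {label}")
```

Keeping only the `age` column would give a univariate dataset, and keeping `age` and `income` would give a bivariate one.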
Data type
Categorical / Qualitative data
Categorical data have values that are described by words rather than numbers (e.g., gender, eye color, hair color). Although categorical
variables have non-numerical values, the values of a categorical variable are sometimes represented using numbers; this is called
coding. Coding a category as a number does not make the data numerical, and the number does not typically imply a rank.
Numerical / Quantitative data
Numerical data arise from counting, measuring something, or some kind of mathematical operation. Numerical data can be broken down
into two types: discrete (variables with a countable number of values, such as the number of credits or the number of passengers on a
flight) and continuous (variables that can take any value within an interval, such as height, weight, time, or income).
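The point about coding can be illustrated with a short sketch. The responses and both coding schemes below are made up; the only claim is that arithmetic on the codes depends on the arbitrary labels, while category counts do not.

```python
from collections import Counter

# Hypothetical eye-color responses and an arbitrary numeric coding.
responses = ["brown", "blue", "green", "brown", "brown", "blue"]
codes = {"brown": 1, "blue": 2, "green": 3}  # assumed coding scheme

coded = [codes[r] for r in responses]

# Frequencies are meaningful for categorical data, coded or not.
counts = Counter(responses)
mode_category = counts.most_common(1)[0][0]

# A mean of the codes can be computed, but it depends entirely on the
# arbitrary labels: a different (equally valid) coding changes it.
mean_code = sum(coded) / len(coded)
alt_codes = {"brown": 3, "blue": 2, "green": 1}
alt_mean = sum(alt_codes[r] for r in responses) / len(responses)

print(coded)
print(mode_category)
print(mean_code != alt_mean)
```

The most frequent category is the same under either coding, which is why counts and proportions are the usual summaries for categorical data.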
Time-series data are a sequence of data points collected over time intervals, allowing us to track changes over time. The intervals
can be milliseconds, days, or even years. For time-series data, we are interested in trends, or patterns over time.
Cross-sectional data are collected from many units (people, companies, countries, etc.) in a single time period. For cross-sectional
data, we are interested in variation among observations or in relationships.
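The contrast between the two kinds of data can be sketched as follows. All figures and firm names are invented: the first structure follows one unit across several periods, the second records several units in one period.

```python
from statistics import mean, pstdev

# Hypothetical time-series data: one unit observed over several periods.
yearly_sales = {2019: 100, 2020: 90, 2021: 110, 2022: 125}

# Trend view: change from each period to the next.
years = sorted(yearly_sales)
changes = [yearly_sales[b] - yearly_sales[a] for a, b in zip(years, years[1:])]

# Hypothetical cross-sectional data: many units observed in one period.
sales_2022 = {"Firm A": 125, "Firm B": 80, "Firm C": 140, "Firm D": 95}
values = list(sales_2022.values())

# Variation view: how much the units differ from one another.
spread = max(values) - min(values)
average = mean(values)
std_dev = pstdev(values)

print(changes)
print(spread, average, std_dev)
```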
Sampling methods
A census is an examination of all items of the population, while a sample involves looking only at some items selected from the population.
There are two main categories of sampling methods: random sampling (e.g., simple random sample, systematic sample, stratified sample,
cluster sample) and non-random sampling (e.g., judgement sample, convenience sample, focus group). Sampling without
replacement means that once an item has been selected to be included in the sample, it cannot be considered for the sample again.
Sampling with replacement means that the same item could be selected more than once. Sampling with replacement does not
lead to bias in our sample results. Note that when the population is finite and the sample size is close to the population size, the two
methods behave very differently: sampling with replacement will produce many duplicates, and draws made without replacement are far
from independent. When the sample is less than 5% of the population (i.e., n/N < 0.05), the population is effectively infinite.
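The difference between the two schemes, and the 5 percent rule of thumb, can be sketched with Python's standard `random` module. The population of item IDs, the sample size, the seed, and the helper `is_effectively_infinite` are assumptions made for illustration.

```python
import random


def is_effectively_infinite(n, N):
    """Apply the 5 percent rule of thumb: n/N < 0.05."""
    return n / N < 0.05


rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 1001))  # hypothetical population of N = 1000 item IDs
n = 20

# Without replacement: each item can be selected at most once.
without_repl = rng.sample(population, n)

# With replacement: the same item may be drawn more than once.
with_repl = rng.choices(population, k=n)

# A sample drawn without replacement never contains duplicates.
print(len(set(without_repl)) == n)
# A sample drawn with replacement may contain duplicates.
print(len(set(with_repl)) <= n)
# 20 / 1000 = 0.02, which is below the 0.05 threshold.
print(is_effectively_infinite(n, len(population)))
```

With n this small relative to N, duplicates under sampling with replacement are unlikely, which is the sense in which the population behaves as if it were infinite.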
Survey
Most survey research follows the same basic steps:
Step 1: State the goals of the research.
Step 2: Develop the budget (time, money, staff …).
Step 3: Create a research design (target population, frame, sample size).
Step 4: Choose a survey type and method of administration.
Step 5: Design a data collection instrument (questionnaire).
Step 6: Pretest the survey instrument and revise as needed.
Step 7: Administer the survey (follow up if needed).
Step 8: Code the data and analyze it.
Note that these steps may overlap in time.