Lecture 1: Introduction to the Use of Statistics in Empirical Research in the Social Sciences

Internationalisation of road signs

 Data/statistics are an internationalisation of information


 EU Driving licence

Existence of databases around the world

OECD
http://stats.oecd.org/

ECB

http://sdw.ecb.europa.eu/
https://www.govdata.de/web/guest/daten

https://www.ksh.hu/

Role of statistics as a tool in empirical research in the social sciences?

Misleading?
Basic notions: individuals, samples and populations; variables, types and categories; frequencies and distributions

 Statistics is about collecting, analysing, interpreting and presenting data for the purpose of understanding
phenomena in the world around us and facilitating informed decision making.
 Think of it as a harmonization tool
 Data are the facts and figures collected, summarized, analysed and interpreted. Evidently data can come
from a number of sources: internal firm data, business database services, statistical bureaus, government
agencies, self-collected.
 In a particular study, we have the data set, which contains elements (the entities on which data are collected)
and variables (characteristics of interest for the elements). The set of measurements obtained for one
particular element across the given list of variables is an observation.
 In experimental studies, the variables of interest are first identified, then one or more factors are controlled
so that data can be obtained about how the factors influence the variables. In observational studies, no
attempt is made to control or influence the variables of interest.
o Data acquisition considerations: time requirement, cost of acquisition, data errors.
 Types of data:
o Nature of variable: quantitative vs. qualitative
 Qualitative data: labels or names used to identify an attribute of each element (= categorical
data); measured on a nominal or ordinal scale; may be numeric or non-numeric
 Quantitative data: indicate how much (continuous) or how many (discrete); always numeric
o Type of data representation: numerical, non-numerical
o Scale of measurement: nominal vs. ordinal; interval vs. ratio
 Determines the amount of information contained in the data, which then shapes the choice
of data summarization and statistical analyses
 Nominal: data are labels/names used to identify an attribute of an element (this label may or
may not be numeric) (1 for women; 2 for men)
 Ordinal: properties of nominal data, and the order or rank of the values is meaningful (Distinction, Merit, Pass, Fail)
 Interval: properties of ordinal data and the interval between observations is expressed in
terms of a fixed unit of measure (always numeric); then it is meaningful to calculate sums
and differences of data values, but the scale does not have a natural zero point (GMAT
scores)
 Ratio: properties of interval data and the ratio of two values is meaningful (distance, height,
weight). The scale must contain a zero value that indicates that nothing exists for the variable at
the zero point (number of credits earned: 72 credits vs. 36 credits means twice as many credits).
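A minimal Python sketch of how the scale of measurement limits which summaries are meaningful. All data values and variable names below are invented for illustration; the GMAT scores are treated as interval data, following the notes.

```python
from statistics import mode

# Hypothetical data, one list per scale of measurement (values are made up).
gender_codes = [1, 2, 2, 1, 2]                                # nominal: 1 = women, 2 = men
grades = ["Pass", "Merit", "Fail", "Distinction", "Merit"]    # ordinal
gmat_scores = [550, 600, 700, 650]                            # interval (as in the notes)
credits_earned = [36, 72, 54]                                 # ratio

# Nominal: the codes are only labels, so counting (e.g. the mode) is meaningful,
# but an average of the codes would not be.
most_common_gender = mode(gender_codes)        # 2

# Ordinal: the categories can be ranked, so a median category is meaningful.
rank_order = ["Fail", "Pass", "Merit", "Distinction"]
ranked = sorted(grades, key=rank_order.index)
median_grade = ranked[len(ranked) // 2]        # "Merit"

# Interval: differences are meaningful, ratios are not (no natural zero).
gmat_difference = gmat_scores[2] - gmat_scores[0]   # 150

# Ratio: a true zero exists, so "twice as many" is a valid statement.
credit_ratio = credits_earned[1] / credits_earned[0]   # 2.0
```

Each step up the hierarchy (nominal, ordinal, interval, ratio) keeps the operations of the previous scale and adds one more.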
 Types of data sets:
o Cross-sectional data: observations made on a number of elements in a single period
o Time series data: observations made on a single entity over a number of periods
o Panel data: observations made on a number of elements over several periods.
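The three types of data set can be seen as different slices of the same table. The sketch below assumes a hypothetical panel stored as (element, period, value) records; the elements "A" and "B", the years, the values, and the helper names cross_section and time_series are all invented for illustration.

```python
# Hypothetical panel data: several elements observed over several periods.
# Every value here is made up for illustration only.
panel = [
    ("A", 2020, 1.0), ("A", 2021, 1.5), ("A", 2022, 2.0),
    ("B", 2020, 3.0), ("B", 2021, 2.5), ("B", 2022, 4.0),
]


def cross_section(data, period):
    """Cross-sectional data: many elements, one single period."""
    return {element: value for element, per, value in data if per == period}


def time_series(data, element):
    """Time series data: one element, many periods, in chronological order."""
    rows = sorted(data, key=lambda record: record[1])
    return [(per, value) for elem, per, value in rows if elem == element]


snapshot_2021 = cross_section(panel, 2021)   # {"A": 1.5, "B": 2.5}
series_b = time_series(panel, "B")           # [(2020, 3.0), (2021, 2.5), (2022, 4.0)]
```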
 Ideally, we would want to look at everyone in the population (the set of all elements), but this is usually
impossible. So we work with a sample (a subset of the population).
 Descriptive statistics are the tabular, graphical and numerical methods used to summarise data. Descriptive
statistics tools are used whenever the sole purpose of our analysis is to describe the observed data values.
o Tabular summary:
 Qualitative. Frequency distribution (tabular summary of data showing the
frequency/number of items in each of several non-overlapping classes)
 Qualitative. Relative frequency of a class = fraction or proportion of the total number of
data items belonging to the class; listing it for every class gives the relative frequency distribution
 Quantitative. Classes are not predefined, so we need to divide the data range into a number
of class intervals.
o Graphical summary: histograms, pie charts, stem-and-leaf displays, etc.
 Qualitative. Bar graph/bar chart: class labels on one axis vs. frequency/relative frequency/% on the
other; bars have the same width and are separated by gaps
 Qualitative. Pie chart: used to represent relative frequency distributions; each segment of the circle
represents a class. A circle has 360°, so a class takes up an angle of relative frequency × 360°
(equivalently, percent frequency/100 × 360°)
 Quantitative.
o Numerical descriptive statistics (mean, etc.)
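The tabular summaries and the pie-chart angle rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the grade and score lists are invented, and equal-width classes are just one possible way to divide a data range.

```python
from collections import Counter


def frequency_distribution(values):
    """Number of items in each non-overlapping class (qualitative data)."""
    return dict(Counter(values))


def relative_frequencies(freq):
    """Proportion of all data items that fall in each class."""
    total = sum(freq.values())
    return {cls: count / total for cls, count in freq.items()}


def pie_angles(rel_freq):
    """Angle in degrees for each class: relative frequency x 360."""
    return {cls: share * 360 for cls, share in rel_freq.items()}


def class_intervals(values, n_classes):
    """Divide the data range into equal-width classes (quantitative data).

    Assumes the values are not all identical. Returns (lower, upper, count)
    triples; the maximum value is counted in the last class.
    """
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    counts = [0] * n_classes
    for v in values:
        index = min(int((v - low) / width), n_classes - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(n_classes)]


# Invented example data.
grades = ["Pass", "Merit", "Pass", "Fail"]
freq = frequency_distribution(grades)    # {"Pass": 2, "Merit": 1, "Fail": 1}
rel = relative_frequencies(freq)         # {"Pass": 0.5, "Merit": 0.25, "Fail": 0.25}
angles = pie_angles(rel)                 # {"Pass": 180.0, "Merit": 90.0, "Fail": 90.0}

scores = [0, 1, 4, 5, 9, 10]
classes = class_intervals(scores, 2)     # [(0.0, 5.0, 3), (5.0, 10.0, 3)]
```

The relative frequencies always sum to 1, so the pie-chart angles sum to 360 degrees.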
 Statistical inference is the process of using data obtained from a sample to make estimates and test
hypotheses about the characteristics of a population. Used to generalize the information extracted from the
observed data to some larger population: one needs to resort to statistical inference whenever collecting
data on the entire population is impossible, too costly or too time-consuming.
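As a rough illustration of inference, the sketch below draws a simple random sample and uses its mean as an estimate of the population mean. The population (the integers 1 to 100), the function name estimate_mean and the fixed seed are assumptions made for this example; the standard error shown is the usual sample standard deviation divided by the square root of n, without a finite-population correction.

```python
import random
from statistics import mean, stdev


def estimate_mean(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n.

    Returns (sample mean, estimated standard error of the mean).
    The seed is fixed only so that the example is reproducible.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)      # sampling without replacement
    estimate = mean(sample)
    standard_error = stdev(sample) / n ** 0.5
    return estimate, standard_error


# Hypothetical population: normally we could not observe all of it.
population = list(range(1, 101))
true_mean = mean(population)                # 50.5

estimate, standard_error = estimate_mean(population, n=10)
```

The estimate will generally differ from the true mean of 50.5; the standard error gives a sense of how far off a sample of this size typically is. If the sample covers the whole population, the estimate equals the population mean exactly.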