Course Information
►Lecturer info:
►Email: tuan.le@isb.edu.vn
►Page: https://sites.google.com/view/anhtuanle
Course Information
► Course requirements:
Group Assignment (GA)
►Work in groups (3-5 students per group).
►The group list should be submitted in Week 2.
►GA1: Week 4
►GA2: Week 11
Exam
► Mid-term exam:
► An in-class, closed-book exam, 90 minutes
► The mid-term exam will include 50 multiple-choice questions.
► Covers Weeks 1-6
► No laptops or other electronic devices.
► A one-page formula sheet is provided.
► Final exam:
► Closed-book exam, 120 minutes.
► 40 multiple-choice questions
► Short/long questions
► No laptops or other electronic devices.
► A one-page formula sheet is provided.
Course Information
►Required reading:
►Lecture notes.
►Doane, D. P. & Seward, L. E. (2016), Applied Statistics in Business and Economics, 5th edition, McGraw-Hill.
►Recommended reading:
►Berenson, M. L., Levine, D. M. & Krehbiel, T. C. (2004), Basic Business Statistics, 9th edition, Prentice-Hall.
Course Information
►Software
►Microsoft Excel and MegaStat are used in the course.
►Others
Tentative schedule
►Session 1: Introduction and Data collection (Chap. 1, 2)
Session 1. Introduction
Introduction
►Statistics
►Level of Measurement
►Sampling Concepts
►Sampling Methods
What is Statistics?
► Statistics is the science of collecting, organizing, analyzing,
interpreting, and presenting data, and of drawing conclusions from
data.
Two Branches of Statistics
Statistics
► The branch of mathematics that transforms data into useful
information for decision makers.
Variables and Data
►Categorical (qualitative) data have values that are described by
words; the values may be coded as numbers.
►Numerical (quantitative) data come from counting, measuring, or a
mathematical operation.
Types of Dataset
► Cross-sectional data
► Time series data
► Pooled cross sections
► Panel/Longitudinal data
Cross-sectional data
obs    wage    educ   Exp   Age   female
1      20.40   20     10    34    1
2      10.29   15     8     30    0
…      …       …      …     …     …
500    40.39   30     12    45    1
…      …       …      …     …     …
Time series data
► Observations on economic variables over time: stock prices, money supply, CPI,
GDP, inflation rates, …
► Frequencies: daily, weekly, monthly, quarterly, annually
► Ordering is important here!
• The behaviour of economic agents (and the resulting indicators) evolves
gradually over time
• Lags in economic behaviour (e.g., today’s stock prices affect next month’s actions)
► Typically, observations cannot be considered independent across time, so
they require more complex econometric techniques.
Pooled cross sections
obs    Year   wage    educ   Exp   Age   female
1      2010   20.40   20     10    34    1
…      …      …       …      …     …     …
…      …      …       …      …     …     …
Panel/Longitudinal data
Firm   Year
1      2020   4.44   15   25   30   10
1      2021   4.98   14   39   31   15
1      2022   5.02   23   41   32   21
2      2020   3.43   32   20   4    22
2      2021   7.53   4    11   5    11
2      2022   8.49   10   18   6    10
…      …      …      …    …    …    …
► Typical problem: missing values; for some units and periods, no data are
available.
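As a rough sketch of how a panel is organized, the Python snippet below stores firm-year keys and checks which (firm, year) pairs are missing. Firms 1 and 2 and the years 2020-2022 follow the table; firm 3 and its missing 2021 record are hypothetical, added only to illustrate the missing-values problem.

```python
from itertools import product

# Firm-year keys. Firms 1 and 2 match the table; firm 3 is hypothetical
# and has no record for 2021 (a missing value).
observed = {
    (1, 2020), (1, 2021), (1, 2022),
    (2, 2020), (2, 2021), (2, 2022),
    (3, 2020), (3, 2022),
}

firms = sorted({firm for firm, _ in observed})  # cross-sectional units (N = 3)
years = sorted({year for _, year in observed})  # time periods (T = 3)

# A balanced panel would contain every (firm, year) pair: N * T observations.
expected = set(product(firms, years))
missing = sorted(expected - observed)

print(len(expected), len(observed))  # 9 8
print(missing)                       # [(3, 2021)]
```

For a balanced panel, the number of observations is simply N × T (e.g., 40 countries × 20 years = 800).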
Question
► What types of datasets are the following (cross-sectional,
pooled cross-sectional, time series, panel data)?
1. A performance survey of banks in Ho Chi Minh City in 2022 → Cross-sectional
2. Financial ratios of 2,500 SMEs over the 2000-2020 period → Panel
3. A happiness index of Vietnam between 2000 and 2020 → Time series
4. Income surveys of citizens in Ho Chi Minh City in 2000 and 2010 → Pooled cross-sectional
5. Economic policy uncertainty indices of 40 countries around the world over a 20-year
period → Panel (40 × 20 = 800 observations)
6. Monthly stock information (close price, high price, return) of Vinamilk in
2023 → Time series (n = 12)
7. Unemployment rates of Asian countries in 2019, the year Covid-19 emerged → Cross-sectional
8. Sales revenue of firms in the retail industry in 2020 → Cross-sectional
9. Annual income of a group of employees at a company for four consecutive
years → Panel (with 40 employees: 40 × 4 = 160 observations)
10. A customer satisfaction survey in a supermarket on a given day → Cross-sectional
Collecting the data
Scales of Measurement
► Scales of measurement include:
► Nominal
► Ordinal
► Interval
► Ratio
► The scale determines the amount of information contained in
the data.
► The scale indicates the data summarization and statistical
analyses that are most appropriate.
Nominal Scale
► Data are labels or names used to identify an attribute of an
element.
► Example:
► Students of a university are classified by school as Business,
Humanities, Education, and so on.
► Alternatively, a numeric code could be used for the school
variable (e.g. 1 denotes Business, 2 denotes Humanities, 3
denotes Education, and so on).
► No ordering.
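A minimal Python sketch of what "labels only" means in practice, assuming the 1/2/3 school codes from the example and a hypothetical set of eight coded responses: counting categories and finding the mode are meaningful, while the mean of the codes can be computed but does not describe any school.

```python
from collections import Counter

# Codes for the nominal "school" variable, as in the example.
SCHOOL = {1: "Business", 2: "Humanities", 3: "Education"}

responses = [1, 3, 1, 2, 1, 3, 2, 1]  # hypothetical coded answers from 8 students

# Meaningful: frequency counts and the mode (most common category).
counts = Counter(SCHOOL[code] for code in responses)
mode_school, mode_count = counts.most_common(1)[0]
print(mode_school, mode_count)  # Business 4

# Computable but meaningless: the mean code (1.75) is not a school.
mean_code = sum(responses) / len(responses)
```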
Ordinal Scale
► The data have the properties of nominal data, and the order or
rank of the data is meaningful.
► Example:
► Students of a university are classified by class standing as
Freshman, Sophomore, Junior, or Senior.
► Alternatively, a numeric code could be used for the class
standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore,
and so on).
► Order is meaningful, but differences have no meaning.
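The sketch below, assuming the four class standings Freshman, Sophomore, Junior and Senior and a hypothetical list of five students, shows why ordinal codes support ranking and the median but not differences: any order-preserving set of codes gives the same ranking, yet the "distance" between two categories changes with the arbitrary codes chosen.

```python
# Order-preserving codes for class standing (ordinal scale).
STANDING = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}

students = ["Junior", "Freshman", "Senior", "Sophomore", "Junior"]  # hypothetical

# Meaningful: ranking by the codes, and the median (middle) category.
ranked = sorted(students, key=STANDING.get)
median_standing = ranked[len(ranked) // 2]
print(ranked)  # ['Freshman', 'Sophomore', 'Junior', 'Junior', 'Senior']

# Any other order-preserving coding is equally valid and gives the same ranking...
ALT = {"Freshman": 10, "Sophomore": 20, "Junior": 35, "Senior": 90}
same_order = sorted(students, key=ALT.get) == ranked

# ...but differences between codes depend on the arbitrary choice of codes.
diff_codes = STANDING["Senior"] - STANDING["Junior"]  # 1
diff_alt = ALT["Senior"] - ALT["Junior"]              # 55
```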
Interval Scale
► The data have the properties of ordinal data, and the difference
between measurements is meaningful, but the measurements have no
true zero value (absence of the quantity being measured).
°C     °F
0      32.0
1      33.8
2      35.6
3      37.4
4      39.2
5      41.0
6      42.8
► Example:
► The difference between 0 °C and 20 °C is the same as the difference
between 20 °C and 40 °C, but you cannot say that 40 °C is twice as hot
as 20 °C, because the scale does not start at absolute zero (the
absence of thermal energy).
► Differences have meaning, but ratios have no meaning.
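The temperature example can be checked numerically. This minimal sketch uses the standard conversion °F = 1.8 × °C + 32: equal Celsius differences stay equal after conversion, but the 40/20 ratio does not survive the change of scale, which is why ratios are not meaningful on an interval scale.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return 1.8 * c + 32

# Reproduce the 0 °C to 6 °C conversion table.
table = [(c, round(celsius_to_fahrenheit(c), 1)) for c in range(7)]

# Equal differences stay equal: 0 -> 20 °C and 20 -> 40 °C are both 36 °F apart.
d1 = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(0)
d2 = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(20)
print(d1, d2)  # 36.0 36.0

# Ratios are not preserved: 40 / 20 = 2 in °C, but 104 / 68 is about 1.53 in °F.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
print(ratio_c, round(ratio_f, 2))  # 2.0 1.53
```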
Ratio Scale
► The data have all the properties of interval data, and the ratio
of two values is meaningful.
► The measurements have a true zero value.
► Example:
► Company A has $1 million in total assets, while Company B
has $2 million in total assets. Company B’s firm size is double
that of Company A.
► Company C has zero total assets.
► Ratios have meaning.
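By contrast with temperature, a ratio-scale variable keeps its ratios under a change of units because its zero is a true zero. A small sketch with the total-asset figures from the example, assuming the unit change is simply from dollars to thousands of dollars:

```python
# Total assets from the example, in US dollars.
assets_usd = {"A": 1_000_000, "B": 2_000_000, "C": 0}

# Company B is twice the size of Company A.
ratio_usd = assets_usd["B"] / assets_usd["A"]

# Rescaling to thousands of dollars multiplies every value by the same constant
# and leaves zero at zero, so the ratio is unchanged.
assets_thousands = {firm: value / 1_000 for firm, value in assets_usd.items()}
ratio_thousands = assets_thousands["B"] / assets_thousands["A"]
print(ratio_usd, ratio_thousands)  # 2.0 2.0
```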
Scales of Measurement
► Ratio data: both differences between measurements and ratios are meaningful.
Quiz
► Determine the level of measurement (Nominal, Ordinal,
Interval, Ratio)
1. Firms are described as small, medium, and large. → Ordinal
2. Sales revenue of firms → Ratio
3. The number of years in operation of firms → Ratio
4. Statistics software (EViews, SPSS, STATA) → Nominal
5. Product quality rated as fail, average, good, or excellent → Ordinal
6. Customer satisfaction measured on a 7-point Likert scale → Strictly ordinal, though often treated as interval
7. Industry codes → Nominal
8. Income of people in Ho Chi Minh City → Ratio
9. IQ scores → Interval
10. Weights of students in the class → Ratio
11. Answers to a YES/NO question → Nominal
12. Children in a school are evaluated and classified as non-readers (0), beginning readers (1),
grade-level readers (2), or advanced readers (3). → Ordinal
Sampling Concepts
Population vs. Sample
► Population: all 26 letters, a to z
► Sample: b, c, g, i, n, o, r, u, y (9 letters selected from the population)
Parameters vs. Statistics
Sampling Concepts
Sampling Methods
► Nonstatistical (non-random) sampling: Convenience, Judgment, Focus group
► Statistical (random) sampling: Simple random, Systematic, Stratified, Cluster
Nonstatistical Sampling
(Non-random Sampling)
► Convenience Sample
► Use a sample that happens to be available (e.g.,
ask co-worker opinions at lunch).
► Judgment Sample
► Use expert knowledge to choose “typical” items
(e.g., which employees to interview).
► Focus Groups
► In-depth dialog with a representative panel of
individuals (e.g. iPhone users).
Statistical Sampling
(Probability Sampling)
Simple Random Sampling
► Every member of the population has an equal chance of being selected
► Every possible sample of a given size has an equal chance of being
selected
► Selection may be with replacement or without replacement
► In sampling without replacement, once an element is selected from
the population to be included in the sample, it is not returned to the
population. Therefore, the same element cannot be selected more than
once in the same sample.
► In sampling with replacement, a selected element is returned to the
population, so it could potentially be selected multiple times in the
same sample.
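In Python's standard `random` module, the two variants of simple random sampling correspond to `Random.sample` (without replacement) and `Random.choices` (with replacement). The sketch below assumes a hypothetical population of the 26 letters a to z and a sample size of 9; the seed only makes the draw reproducible.

```python
import random

population = [chr(c) for c in range(ord("a"), ord("z") + 1)]  # 26 elements, a to z
rng = random.Random(2024)  # fixed seed so the draw is reproducible

n = 9
# Without replacement: each element can appear at most once in the sample.
sample_without = rng.sample(population, n)

# With replacement: every draw is from the full population, so repeats are possible.
sample_with = rng.choices(population, k=n)

print(sample_without)
print(sample_with)
```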
Systematic Random Sampling
► Decide on sample size: n
► Divide frame of N individuals into n groups of k
individuals: k = N/n
► Randomly select one individual from the first group
► Select every kth individual thereafter
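► Example: N = 372,000 and n = 3,720, so k = N/n = 100. Randomly select
one individual from the first group of 100, then select every 100th
individual thereafter.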
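The steps above can be sketched in Python as follows. The function name `systematic_sample` and the frame of individuals numbered 1 to 372,000 are illustrative assumptions; when N is not a multiple of n, this sketch rounds k down, which keeps every selected index inside the frame.

```python
import random

def systematic_sample(frame, n, rng):
    """Systematic random sample of size n from a sampling frame (a list)."""
    k = len(frame) // n          # interval k = N / n, rounded down if needed
    start = rng.randrange(k)     # random position within the first group of k
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: individuals numbered 1 to 372,000.
frame = list(range(1, 372_001))
sample = systematic_sample(frame, 3_720, random.Random(7))

print(len(sample))            # 3720
print(sample[1] - sample[0])  # 100
```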
Stratified Random Sampling
► Divide the population into subgroups (called strata) according
to a common characteristic (e.g. age, gender, occupation)
► Select a simple random sample from each subgroup
► Combine the samples from the subgroups into one sample
[Figure: population divided into 4 strata → sample]
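A minimal sketch of the three steps, assuming four hypothetical age strata in a population of 1,000 people. Taking the same 10% from every stratum (proportional allocation) is an assumption made here for illustration; the slide only requires a simple random sample from each subgroup.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample from each stratum and combine the results.

    strata: dict mapping stratum name -> list of population members.
    fraction: share of each stratum to sample (proportional allocation, assumed).
    """
    combined = []
    for members in strata.values():
        size = round(len(members) * fraction)
        combined.extend(rng.sample(members, size))  # SRS without replacement
    return combined

# Hypothetical population of 1,000 people divided into 4 age strata.
strata = {
    "18-29": [f"p{i}" for i in range(0, 300)],
    "30-44": [f"p{i}" for i in range(300, 650)],
    "45-59": [f"p{i}" for i in range(650, 900)],
    "60+": [f"p{i}" for i in range(900, 1000)],
}
sample = stratified_sample(strata, 0.10, random.Random(1))
print(len(sample))  # 100
```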
Cluster Sampling
► Divide population into several “clusters” (e.g. regions), each
representative of the population.
► Clusters are often naturally occurring groups, such as counties,
election districts, city blocks, households, or sales territories
► One-stage cluster sampling: randomly select k clusters and include
every element of the selected clusters.
► Two-stage cluster sampling: randomly select k clusters and then
choose a random sample of elements within each cluster.
► Example: the population of District 10 is divided into 14 wards
(clusters), and each ward can be further divided into streets (clusters).
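The difference between one-stage and two-stage cluster sampling can be sketched as below. The 14 wards follow the District 10 example; the 50 households per ward, the labels, and the choices k = 3 and m = 10 are hypothetical.

```python
import random

def one_stage_cluster(clusters, k, rng):
    """Randomly select k clusters and keep every element in them."""
    chosen = rng.sample(sorted(clusters), k)
    return [x for c in chosen for x in clusters[c]]

def two_stage_cluster(clusters, k, m, rng):
    """Randomly select k clusters, then a simple random sample of m elements in each."""
    chosen = rng.sample(sorted(clusters), k)
    return [x for c in chosen for x in rng.sample(clusters[c], m)]

# Hypothetical: 14 wards (clusters), each with 50 households labelled "w<ward>-h<house>".
wards = {w: [f"w{w}-h{h}" for h in range(50)] for w in range(1, 15)}
rng = random.Random(10)

one_stage = one_stage_cluster(wards, 3, rng)
two_stage = two_stage_cluster(wards, 3, 10, rng)
print(len(one_stage), len(two_stage))  # 150 30
```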
Sources of Error or Bias
Exercise
next session.