BUSINESS STATISTICS

Lecture 1

Statistics: the science of collecting, organizing, analyzing, interpreting,
and presenting data.
Statistic: a single measure, reported as a number, used to summarize a sample
data set.
There are two primary kinds of statistics:
• Descriptive statistics refers to the collection, organization, presentation,
and summary of data (either using charts and graphs or using a numerical
summary).
• Inferential statistics refers to generalizing from a sample to a population,
estimating unknown population parameters, drawing conclusions, and making
decisions.

Some of the ways statistics is used in business:
• Auditing
• Marketing
• Health Care
• Quality Improvement
• Purchasing
• Medicine
• Operations Management
• Product Warranty

Chapter 2: Data collection

An observation is a single member of a collection of items that we want to study,
such as a person, firm, or region. Ex: an employee or an invoice mailed last
month.
A variable is a characteristic of the subject or individual, such as an employee’s
income or an invoice amount.
The data set consists of all the values of all of the variables for all of the
observations we have chosen to observe.

A data set may contain a mixture of data types. Two broad categories are
categorical data and numerical data.
TYPES OF DATA SET
● Categorical Data (also called qualitative data) have values that are
described by words rather than numbers
Ex: type, size, classification,...
↬ True/False check: categorical data are not limited in statistical use
+ Coding: a categorical variable can be represented using numbers
Ex: 1 = cash, 2 = check, 3 = credit/debit card, 4 = gift card
+ Binary variables: categorical variables that have only two values (coded as 1 or 0)
Ex: employment status (employed or unemployed), marital status (currently married
or not currently married)

● Numerical Data (also called quantitative data) arise from counting,
measuring something, or some kind of mathematical operation
Ex: number of sales, claims,...
+ Discrete: a variable with a countable number of distinct values, typically
integers (“number of ...”)
+ Continuous: a numerical variable that can have any value within an interval,
including decimal values
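
The coding of categorical and binary variables described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the observations in `payments` and `employment` are made up, and the names `PAYMENT_CODES`, `coded`, and `employed_flag` are hypothetical.

```python
# Coding scheme from the payment example (1 = cash, 2 = check, ...)
PAYMENT_CODES = {"cash": 1, "check": 2, "credit/debit card": 3, "gift card": 4}

# Made-up observations, for illustration only
payments = ["cash", "gift card", "check", "cash", "credit/debit card"]
coded = [PAYMENT_CODES[p] for p in payments]

# Binary variable: 1 if employed, 0 otherwise
employment = ["employed", "unemployed", "employed"]
employed_flag = [1 if status == "employed" else 0 for status in employment]

# The mean of a 0/1 variable is the proportion of 1s. The mean of the
# payment codes would be meaningless because the codes are only labels.
share_employed = sum(employed_flag) / len(employed_flag)

print(coded)
print(employed_flag, share_employed)
```

The codes make the data easier to store and tabulate, but they do not turn a categorical variable into a numerical one.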
1. Cross-sectional Data
+ Each observation is a different individual unit (e.g., a person, firm, geographic area)
+ Data taken at a given point in time
+ Units: individuals, households, firms, cities, states,...
2. Time-series Data
+ Observations on economic variables over time: stock prices, money supply,
CPI, GDP, inflation rates,…
+ Frequencies: daily, weekly, monthly, quarterly, annually
+ Ordering is important here!
+ The behaviour of economic subjects (and the resulting indicators) evolves
gradually over time
+ Lags in economic behaviour (stock prices today affect next month’s actions)
3. Panel/Longitudinal Data
+ A collection of cross-sectional data for at least two different
points/periods of time.
+ Unlike with pooled cross sections, the same units are measured over time
+ Use of a double index: it, where i = 1,...,n and t = 1,...,T.
+ Advantages: panel data have several advantages over (pooled) cross sections
(for problems where panel data make sense)
+ Disadvantages:
- Missing values: for some units and periods there are no data, so the panel
may be balanced or unbalanced.
- More difficult/costly to obtain the data
+ Example: firm performance of 500 Vietnamese listed firms from 2000 to 2019,
where the firms were chosen for the sample in 2000 and kept fixed for
all subsequent years (T = 20, n = 500, N = 10,000)
4. Pooled cross sections
+ Both cross-sectional and time-series features
+ Data collected at multiple (typically, two) points in time
+ Ordering is not crucial; the year is recorded as an additional variable
+ Often used to evaluate the effect of a policy change: collect data before and
after the policy change and see how the relationship between the variables
changes
► Note: in the second time period, the cross-sectional units need be neither
distinct from nor identical to those in the first period.
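
The double index used for panel data, and the difference between a balanced and an unbalanced panel, can be sketched as follows. This is a minimal illustration that reuses the sizes from the panel example (n = 500 firms, T = 20 years). The firm identifiers 1,...,500 and the helper `is_balanced` are hypothetical.

```python
from itertools import product

n_firms, n_years = 500, 20
firm_ids = range(1, n_firms + 1)       # i = 1,...,n
years = range(2000, 2000 + n_years)    # t runs over 2000,...,2019

# One (i, t) key per firm-year observation in a complete panel
balanced_index = set(product(firm_ids, years))

def is_balanced(index, firms, periods):
    """A panel is balanced when every unit is observed in every period."""
    return all((i, t) in index for i in firms for t in periods)

# Dropping a single firm-year (a missing value) makes the panel unbalanced
unbalanced_index = balanced_index - {(1, 2005)}

print(len(balanced_index))                              # N = n * T
print(is_balanced(balanced_index, firm_ids, years))
print(is_balanced(unbalanced_index, firm_ids, years))
```

In a pooled cross section the same check would not apply, because the units in each period need not be the same.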
SCALES OF MEASUREMENT

● Nominal: the same as “qualitative,” “categorical,” or “classification” data
+ nominal data are usually coded numerically → the codes have no numerical meaning
● Ordinal: ranking (nominal data + order of data)
+ can be treated as nominal → but not vice versa
+ no clear meaning to the distance between 1 and 2, or between 2 and 3, or
between 3 and 4
● Interval: not only a rank but also has meaningful intervals between
scale points.
+ No true zero value, so ratios have no meaning (one cannot say that 4 degrees is
twice as hot as 2 degrees)
● Ratio:
+ meaningful zero → ratios of data values are meaningful (e.g., $20 million is
twice $10 million). Zero means the absence of the quantity (e.g., a company with
zero sales sold nothing), yet ratio data can be negative: firms can have negative
profit (i.e., a loss).
SAMPLING METHOD

Chapter 3: Describing Data Visually


Statistics offers many methods that can help organize, explore, and summarize
data in a succinct way. The methods may be visual (charts and graphs) or
numerical (statistics or tables).
Such data can be discussed in terms of three characteristics: center, variability,
and shape.

● Stem-and-Leaf Plot

+ simple way to visualize small data sets

● Dot Plots

+ the dots are piled up vertically
+ shows variability by displaying the range of the data
+ shows the center by revealing where the data values tend to cluster and where
the midpoint lies.
+ reveals some things about the shape of the distribution if the sample is large
enough.
● Frequency Distribution: a table formed by
classifying n data values into k classes called bins
- Arrange the data into groups (or intervals) of values that share the same
characteristic and count the number of observations that fall in each group

+ Usually, all the bin widths are the same and their limits cannot overlap
+ Frequencies can also be expressed as relative frequencies or percentages of
the total number of observations.
Ex:
Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38,
41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute interval width: 10 (46/5 = 9.2, then round up)
Determine interval boundaries: 10 but less than 20, 20 but less than 30,..., 50 but
less than 60
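
The steps of this example can be sketched in code. This is a minimal illustration, assuming five bins of width 10 starting at 10, with each bin including its lower limit and excluding its upper limit. The names `lower`, `width`, `k`, and `table` are hypothetical.

```python
from collections import Counter

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

lower, width, k = 10, 10, 5   # first lower limit, bin width, number of bins

# Bin j covers the interval [lower + j*width, lower + (j+1)*width)
counts = Counter((x - lower) // width for x in data)

n = len(data)
table = []
for j in range(k):
    lo = lower + j * width
    freq = counts.get(j, 0)
    table.append((lo, lo + width, freq, freq / n))   # limits, frequency, relative frequency

for lo, hi, freq, rel in table:
    print(f"{lo} but less than {hi}: {freq:2d}  ({rel:.0%})")
```

The frequencies add up to n = 20, which is a quick check that no value fell outside the chosen bins.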
● Histogram: a graphical representation of a frequency distribution, drawn as a
bar chart
+ Y-axis: number of data values (or a percentage) within each bin of a frequency
distribution
+ X-axis ticks show the end points of each bin
+ No gaps between bars

● Ogive: line graph of the cumulative frequency
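
The points that an ogive connects can be computed from a frequency distribution. This is a minimal sketch, assuming bins of width 10 from 10 to 60 and the frequencies 3, 6, 5, 4, 2 tallied from the 20 data values of the frequency distribution example. The names are hypothetical.

```python
from itertools import accumulate

upper_limits = [20, 30, 40, 50, 60]   # upper boundary of each bin
frequencies = [3, 6, 5, 4, 2]         # bin counts (assumed, see lead-in)

n = sum(frequencies)
cumulative = list(accumulate(frequencies))            # running totals
cumulative_pct = [100 * c / n for c in cumulative]    # as percentages of n

# An ogive plots cumulative frequency against the upper limit of each bin
ogive_points = list(zip(upper_limits, cumulative_pct))
print(ogive_points)
```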


● Scatter plot

+ used to investigate the relationship between two variables
+ shows only the direction of the relationship
⇨ DOES NOT ESTABLISH CAUSE AND EFFECT (a scatter plot cannot show causation, e.g.,
that one variable increases because the other increases)

Chapter 4: Descriptive Statistics


Descriptive measures derived from a sample (n items) are statistics, while for a
population (N items or infinite) they are parameters. For a sample of numerical
data, we are interested in three key characteristics: center, variability, and
shape.

● Measures of Center
+ Mean (the average value)
● Measures of Variability
● Standardized Data
+ main use is to gauge the position of items within a data array
+ Chebyshev’s Theorem: applies to any population
+ The Empirical Rule: applies to a normal distribution

● Z-score
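
A minimal sketch of standardizing data, assuming the usual textbook definitions rather than formulas stated in these notes: the z-score of a sample value is z = (x - mean) / s, Chebyshev’s theorem says at least 1 - 1/k² of any data set lies within k standard deviations of the mean (for k > 1), and the Empirical Rule says roughly 68%, 95%, and 99.7% of a normal distribution lies within 1, 2, and 3 standard deviations. The function names and the sample are illustrative.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return (x - sample mean) / sample standard deviation for each x."""
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation (n - 1 in the denominator)
    return [(x - x_bar) / s for x in values]

def chebyshev_bound(k):
    """Minimum share of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k**2

sample = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
          32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
z = z_scores(sample)

# Observed share of values within 2 standard deviations of the mean
within_2 = sum(abs(v) <= 2 for v in z) / len(z)
print(f"within 2 sd: {within_2:.2f}, Chebyshev minimum: {chebyshev_bound(2):.2f}")
```

Chebyshev’s bound is a guaranteed minimum, so the observed share can be well above it.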

Note:
● CORRELATION AND COVARIANCE

The formula for the sample correlation coefficient is:

The formula for the sample covariance is:
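
A minimal sketch, assuming the standard textbook definitions: the sample covariance is s_xy = Σ(x_i - x̄)(y_i - ȳ) / (n - 1), and the sample correlation coefficient is r = s_xy / (s_x s_y), where s_x and s_y are the sample standard deviations. The data (`ad_spend`, `sales`) are hypothetical.

```python
from math import sqrt
from statistics import mean

def sample_covariance(x, y):
    """Sum of cross-products of deviations, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def sample_correlation(x, y):
    """Covariance rescaled by both standard deviations, so -1 <= r <= 1."""
    s_x = sqrt(sample_covariance(x, x))   # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

ad_spend = [1, 2, 3, 4, 5]   # hypothetical data
sales = [2, 4, 5, 4, 5]

print(sample_covariance(ad_spend, sales))
print(round(sample_correlation(ad_spend, sales), 4))
```

The covariance depends on the units of x and y, while the correlation coefficient is unit-free, which is why r is easier to compare across data sets.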

● Grouped Mean and Standard Deviation


Each interval j has a midpoint m_j and a frequency f_j. We calculate the
estimated mean by multiplying the midpoint of each class by its class frequency,
taking the sum over all k classes, and dividing by sample size n.
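
The grouped mean described above can be sketched as follows. The bins (width 10, from 10 to 60) and their frequencies are assumed from the 20-value frequency distribution example. The grouped standard deviation uses the common textbook form sqrt(Σ f_j (m_j - mean)² / (n - 1)), which is an assumption here because the notes do not state it.

```python
from math import sqrt

midpoints = [15, 25, 35, 45, 55]   # m_j: midpoint of each bin
frequencies = [3, 6, 5, 4, 2]      # f_j: number of values in each bin

def grouped_mean(m, f):
    """Estimated mean: sum of f_j * m_j over all classes, divided by n."""
    n = sum(f)
    return sum(fj * mj for mj, fj in zip(m, f)) / n

def grouped_std(m, f):
    """Estimated sample standard deviation from midpoints and frequencies."""
    n = sum(f)
    x_bar = grouped_mean(m, f)
    return sqrt(sum(fj * (mj - x_bar) ** 2 for mj, fj in zip(m, f)) / (n - 1))

print(grouped_mean(midpoints, frequencies))
print(round(grouped_std(midpoints, frequencies), 3))
```

For these bins the estimate is 33.0, against a mean of 32.4 computed from the raw values. Grouping loses information, so the two generally differ.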
● SKEWNESS AND KURTOSIS
Chapter 5: Probability

+ P(A) = 0: the event cannot occur (e.g., a naturalized citizen becoming president
of the United States)
+ P(A) = 1: the event is certain to occur (e.g., rain occurring in Hilo, Hawaii,
sometime this year)
+ the probabilities of all simple events must sum to 1

RULES OF PROBABILITY

● Complement of an Event

● Union of Two Events
+ A ∪ B or “A or B”

● Intersection of Two Events
+ A ∩ B or “A and B”
+ The probability of A ∩ B is called the joint probability and is denoted P(A ∩ B)
● Mutually Exclusive Events
+ Events A and B are mutually exclusive (or disjoint) if their intersection is the
empty set (a set that contains no elements)
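
These rules can be checked on a small example. This is a minimal sketch, assuming a hypothetical experiment (one roll of a fair die, all outcomes equally likely) and the standard results P(A') = 1 - P(A) and P(A ∪ B) = P(A) + P(B) - P(A ∩ B). The events A, B, C and the helper `prob` are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # hypothetical: one roll of a fair die

def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"
C = {1, 3, 5}   # "odd"

p_complement = prob(sample_space - A)   # complement of A
p_union = prob(A | B)                   # A or B
p_joint = prob(A & B)                   # A and B (joint probability)

print(p_complement == 1 - prob(A))
print(p_union == prob(A) + prob(B) - p_joint)

# A and C share no outcomes, so they are mutually exclusive and P(A or C) = P(A) + P(C)
print(A & C == set(), prob(A | C) == prob(A) + prob(C))
```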

Conditional Probability

INDEPENDENT EVENTS

CONTINGENCY TABLES
A contingency table is like a frequency distribution for a single variable, except
it has two variables (rows and columns).
● Marginal Probabilities
● Joint Probabilities
● Conditional Probabilities
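
The three kinds of probability read from a contingency table can be sketched as follows. The counts are hypothetical, and the sketch assumes the standard definition of conditional probability, P(A | B) = P(A ∩ B) / P(B). All function names are illustrative.

```python
from fractions import Fraction

# Hypothetical counts: rows = payment method, columns = customer type
table = {
    ("cash", "new"): 20, ("cash", "returning"): 30,
    ("card", "new"): 10, ("card", "returning"): 40,
}
total = sum(table.values())

def joint(row, col):
    """P(row and col): one cell divided by the grand total."""
    return Fraction(table[(row, col)], total)

def marginal_row(row):
    """P(row): the row total divided by the grand total."""
    return sum(joint(r, c) for (r, c) in table if r == row)

def marginal_col(col):
    """P(col): the column total divided by the grand total."""
    return sum(joint(r, c) for (r, c) in table if c == col)

def conditional(row, col):
    """P(row | col): the joint probability divided by the column marginal."""
    return joint(row, col) / marginal_col(col)

print(joint("cash", "new"), marginal_row("cash"), marginal_col("new"))
print(conditional("cash", "new"))

# Independence would require P(cash and new) = P(cash) * P(new)
print(joint("cash", "new") == marginal_row("cash") * marginal_col("new"))
```

In this made-up table the last check fails, so payment method and customer type are not independent.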
TREE DIAGRAMS
+ A tree diagram displays events and their probabilities to help visualize all
possible outcomes.
BAYES’ THEOREM
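
A minimal sketch, assuming the usual two-event statement of the theorem: P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B') P(B')]. The scenario and all numbers are hypothetical: 2% of invoices contain an error (B), and an audit check flags (A) 90% of erroneous invoices and 5% of correct ones.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, likelihood_complement):
    """Return P(B | A) from P(B), P(A | B), and P(A | B')."""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)   # P(A)
    if evidence == 0:
        raise ValueError("P(A) is zero, so P(B | A) is undefined")
    return likelihood * prior / evidence

p_error = Fraction(2, 100)                # P(B): invoice contains an error
p_flag_given_error = Fraction(90, 100)    # P(A | B)
p_flag_given_ok = Fraction(5, 100)        # P(A | B')

posterior = bayes_posterior(p_error, p_flag_given_error, p_flag_given_ok)
print(posterior, float(posterior))
```

Even with a fairly accurate check, most flagged invoices are correct here because errors are rare to begin with.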

COUNTING RULES

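● Factorials: the number of ways to arrange all n items in order

● Permutations: the number of ordered arrangements of r items chosen from n

● Combinations: the number of unordered selections of r items chosen from n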
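
The three counting rules can be evaluated with Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later). This is a minimal sketch, assuming the standard formulas n!, nPr = n! / (n - r)!, and nCr = n! / (r! (n - r)!). The values n = 5 and r = 2 are arbitrary.

```python
from math import comb, factorial, perm

n, r = 5, 2   # arbitrary illustration: choose 2 items out of 5

orderings = factorial(n)   # ways to order all n items
arrangements = perm(n, r)  # ordered selections of r items from n
selections = comb(n, r)    # unordered selections of r items from n

print(orderings, arrangements, selections)

# Each unordered selection of r items can be ordered in r! ways
print(arrangements == selections * factorial(r))
```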
