
BI-BS-BM: Together, we use the concepts of statistics, methods of managing supply chains, and information technologies to collect and
analyze data to help make better business decisions.

PPAR: 1. Plan to apply a tool to the business question. 2. Perform statistical analysis. 3. Analyze and evaluate results. 4. Reflect on the results to gain insights.
5. Then go back to the first step to plan the next step. Each time we repeat this process, we gain a little more insight, and we repeat it until we
have sufficient results to tell a story.

We take random samples so that every member of the population has an equal chance of being selected, which lets us better estimate the corresponding population parameters.

The central limit theorem states that the distribution of sample means approaches a normal distribution as the sample size gets larger.
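
A small simulation can make the central limit theorem concrete. The sketch below is only an illustration under assumptions that are not in these notes: the population is uniform on (0, 1), the sample sizes are 2 and 30, and we draw 2000 samples of each size. The function name `sample_means` is made up for this example.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable


def sample_means(n, draws=2000):
    """Return `draws` sample means, each computed from n uniform(0, 1) values."""
    return [statistics.fmean([random.random() for _ in range(n)])
            for _ in range(draws)]


means_small = sample_means(n=2)
means_large = sample_means(n=30)

# Uniform(0, 1) has standard deviation sqrt(1/12), about 0.2887.
# The spread of the sample means should shrink roughly like 0.2887 / sqrt(n).
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

# If the means for n = 30 are close to normal, about 95% of them should fall
# within 1.96 standard errors of the population mean 0.5.
se_large = math.sqrt(1 / 12) / math.sqrt(30)
share_within = sum(abs(m - 0.5) <= 1.96 * se_large
                   for m in means_large) / len(means_large)

print(round(spread_small, 3), round(spread_large, 3), round(share_within, 3))
```

The population itself is flat, not bell-shaped, yet the means of larger samples cluster tightly around 0.5 in a roughly normal pattern.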

“All models are wrong, some models are useful”: Every model is wrong because it is a simplification of reality. Sometimes a model does not have enough
detail to support a conclusion, but models are useful when they help us explain or predict the various components.

Big Data: Companies collect personal information, demographic information, and shopping habits from us through our purchases, internet use, mailbox, etc., and build
a big data warehouse. Their business analysts then analyze the data and predict what we desire, in order to market efficiently.

Napoleon map (Minard): it puts a lot of information into one single map, and the map is clear and easy to understand. The map successfully depicts the campaign with the
number of soldiers, longitude, latitude, time schedule, temperature, advance and retreat routes, etc. in a 2D graph.

Data dashboards provide at-a-glance views of the key performance indicators of an objective or business. They provide visual answers. A dashboard must be easy to
read and use color, numbers, and graphics. The data must be presented with the right tool for the dashboard to be meaningful.

Nominal: items that are differentiated by a simple naming system. They may have numbers assigned to them, but they are categorical and we can't do
arithmetic with them. Ordinal: the position of items on a scale. It is often defined by assigning numbers to items to show their relative position,
but they are categorical and we can't do arithmetic with them. Continuous: a measurement on a scale whose numbers can be compared as multiples of
one another. These values measure quantities, and we can do arithmetic with them.

Correlation is not Causation.


Box plot: median = Q2 (the 50th percentile), the middle age, shown as the middle line in the box. Mean = sum/count, the average age, shown as the long line in the diagram. Each quartile contains
about 25% of the data. IQR = Q3 − Q1, the box in the middle, which holds about 50% of the population and gives the age range of the middle 50% of people. Mode = most frequently occurring value. Range = max − min; most people
lie in this range. Box width = sample size. Outliers = very few exceptional ages. Draw conclusions.
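
To see how the box-plot numbers relate to each other, here is a minimal sketch on a hypothetical list of ages (the data are invented for illustration). It uses the "inclusive" quartile method from Python's `statistics` module; other software may compute quartiles slightly differently. The 1.5 × IQR fence for flagging outliers is a common convention and an assumption here, since these notes do not name a rule.

```python
import statistics

ages = [18, 21, 21, 25, 30, 34, 40, 47, 95]  # hypothetical ages

# Quartiles: Q2 is the median, the middle line of the box.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1                      # height of the box, middle ~50% of people

mean = statistics.fmean(ages)      # sum / count
mode = statistics.mode(ages)       # most frequently occurring age
data_range = max(ages) - min(ages)

# Assumed outlier rule: anything beyond 1.5 * IQR from the box.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < low_fence or a > high_fence]

print(q1, q2, q3, iqr, round(mean, 2), mode, data_range, outliers)
```

In this example the single very old person pulls the mean above the median, which is the kind of conclusion a box plot lets us draw at a glance.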
Histogram: frequency distribution of xxx. No age under 0. Standard deviation (error): whether the data are close to the average or spread out.
Population SD: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$   Sample SD: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
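
The only difference between the two standard deviation formulas is the divisor (N for a population, n − 1 for a sample). A minimal sketch on hypothetical data, checked against Python's built-in `statistics.pstdev` and `statistics.stdev`:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in data)

pop_sd = math.sqrt(sum_sq_dev / n)           # treat data as the whole population
sample_sd = math.sqrt(sum_sq_dev / (n - 1))  # treat data as a sample

print(pop_sd, round(sample_sd, 4))
print(statistics.pstdev(data), round(statistics.stdev(data), 4))
```

The sample SD comes out slightly larger because dividing by n − 1 compensates for estimating the mean from the same data.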

Mosaic plot: % in columns and rows. 0 is not between the upper and lower limits of the 95% range.

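Regression line: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the Intercept value under Estimate and $b_1$ is the x value under Estimate; $b_1 = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2}$; $b_0 = \bar{y} - b_1\bar{x}$; Standard Error = RMSE = $s_e = \sqrt{\frac{\sum (y-\hat{y})^2}{n-2}} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}}$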
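
The slope, intercept, and RMSE formulas can be worked through by hand on a tiny data set. The sketch below uses invented bill and tip values purely for illustration; the variable names are also just for this example.

```python
import math

bills = [1, 2, 3, 4, 5]  # hypothetical x values
tips = [2, 4, 5, 4, 5]   # hypothetical y values
n = len(bills)

x_bar = sum(bills) / n
y_bar = sum(tips) / n

# Slope: sum of cross-deviations over sum of squared x-deviations
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))
sxx = sum((x - x_bar) ** 2 for x in bills)
b1 = sxy / sxx

# Intercept: the fitted line passes through (x_bar, y_bar)
b0 = y_bar - b1 * x_bar

# Residual error of the fitted line
fitted = [b0 + b1 * x for x in bills]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(tips, fitted))
mse = sse / (n - 2)
rmse = math.sqrt(mse)

print(round(b0, 4), round(b1, 4), round(sse, 4), round(rmse, 4))
```

For these numbers the fitted line is roughly ŷ = 2.2 + 0.6x, with SSE about 2.4 and RMSE about 0.894.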

$R^2 = \frac{SSR}{SST} = \frac{\sum(\hat{y}-\bar{y})^2}{\sum(y-\bar{y})^2}$; % of variation in y explained by x. How well the fitted line predicts tips from bills.

$R^2_{adj} = 1 - \left(\frac{(1-R^2)(n-1)}{n-k-1}\right)$; same as above, but more about how useful the variables are to the model.
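
A short sketch of how R² and adjusted R² behave, using hypothetical sums of squares (SSR = 3.6, SST = 6.0, n = 5). The function names are made up for this example. The second call assumes a second predictor was added without improving R², to show the penalty for an unhelpful variable.

```python
def r_squared(ssr, sst):
    """Share of the variation in y explained by the model."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """R^2 penalized for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(ssr=3.6, sst=6.0)                   # hypothetical sums of squares
r2_adj_one = adjusted_r_squared(r2, n=5, k=1)      # one predictor
r2_adj_two = adjusted_r_squared(r2, n=5, k=2)      # same R^2, one more predictor

print(round(r2, 4), round(r2_adj_one, 4), round(r2_adj_two, 4))
```

With the same R², the adjusted value drops as k grows, which is why it says more about whether each variable earns its place in the model.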

Least squares = minimizing the sum of squared errors (SSE) to find the fitted line. Smaller RMSE = smaller error.

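t-test: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$; $b_1$ = the number beside x (under Estimate); $t = \frac{b_1 - 0}{s_{b_1}}$ = the t Ratio value, where $s_{b_1}$ = the Std Error value for x. If the t-ratio is big and the p-value is small, reject $H_0$, which claims no relationship between x and y.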
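
The t-ratio is just the estimate divided by its standard error. The sketch below uses hypothetical summary numbers (b1 = 0.6, RMSE = √0.8, Σ(x − x̄)² = 10) and the standard simple-regression formula s_b1 = RMSE / √Σ(x − x̄)², which is an assumption added here because these notes read s_b1 directly from the output table. The p-value is not computed, since it needs a t distribution with n − 2 degrees of freedom, which Python's standard library does not provide.

```python
import math


def t_ratio(estimate, std_error, null_value=0.0):
    """t = (estimate - value under H0) / standard error of the estimate."""
    return (estimate - null_value) / std_error


b1 = 0.6                 # hypothetical slope estimate
rmse = math.sqrt(0.8)    # hypothetical root mean square error
sxx = 10.0               # hypothetical sum of squared x-deviations

s_b1 = rmse / math.sqrt(sxx)   # standard error of the slope
t = t_ratio(b1, s_b1)

print(round(s_b1, 4), round(t, 4))
```

A larger |t| means the slope estimate is many standard errors away from 0, which corresponds to a smaller p-value.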

F-test: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$; $F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)}$ = the F Ratio value. If F is big and the p-value is small, reject $H_0$, which claims no relationship between x and y.
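
The F ratio can be computed directly from the sums of squares. A minimal sketch with hypothetical values (SSR = 3.6, SSE = 2.4, n = 5); the function name and the default k = 1 (one predictor) are choices made for this example.

```python
def f_ratio(ssr, sse, n, k=1):
    """F = MSR / MSE with k predictors and n observations."""
    msr = ssr / k              # mean square for the model
    mse = sse / (n - k - 1)    # mean square error; n - 2 when k = 1
    return msr / mse


f = f_ratio(ssr=3.6, sse=2.4, n=5)  # hypothetical sums of squares
print(round(f, 4))
```

With a single predictor, the F ratio equals the square of the slope's t-ratio, so the two tests lead to the same decision about $H_0$.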

Confidence interval = a range of values; about 95% of such intervals capture μ, the true population mean. Why not 5%? Because we're doing a sample
simulation. $CI = \bar{x} \pm z_{0.025}\frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$; plug in the standard deviation and n; $z_{0.025} = 1.96$; 95% confidence; $\alpha = 0.05$
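
Plugging numbers into the confidence interval formula looks like this. The sketch assumes the population standard deviation σ is known and uses hypothetical inputs (x̄ = 50, σ = 10, n = 25); the function name is made up for this example. The critical value comes from `statistics.NormalDist`, which returns about 1.96 for 95% confidence.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for the mean when the population SD is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval(x_bar=50.0, sigma=10.0, n=25)
print(round(low, 2), round(high, 2))
```

Here the margin is about 1.96 × 10 / 5 = 3.92, so the interval runs from roughly 46.08 to 53.92. A larger n shrinks the margin.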

Better alternatives to p-values are the t-value, z-value, or R^2, or you can compare the p-value with alpha. Direct comparison has the advantage of being relative.
A p-value indicates how incompatible the data are with a specified model; it does not measure the probability that the hypothesis is true, and it does not measure the size of an effect or the importance of a result.

CLT: with the SD known, the distribution of the sample mean becomes normal as n gets larger. ŷ = ȳ only at the intersection point.

ANOVA table:
Source     DF         Sum of Squares   Mean Square            F Ratio
Model      k          SSR              MSR                    MSR/MSE
Error      n-(k+1)    SSE              MSE = SSE/(n-(k+1))    Prob>F
C Total    n-1        SST                                     <.0001

Parameter estimates (Estimate / Std Error = t Ratio):
Term        Estimate (slope)   Std Error   t Ratio   Prob>|t|
Intercept   b0
X1          b1
X2          b2
$SST = \sum_{i=1}^{n}(y_i-\bar{y})^2 = SSR + SSE = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$;   $\hat{y} = b_0 + b_1x_1 + b_2x_2 + b_3x_3$
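
The decomposition SST = SSR + SSE can be checked numerically. For brevity the sketch below fits a single-predictor least squares line on hypothetical data rather than the three-predictor model; the identity holds the same way for any least squares fit that includes an intercept.

```python
x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical responses
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit with one predictor
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # left in the residuals

print(round(sst, 4), round(ssr, 4), round(sse, 4))
```

For these numbers SST is about 6.0, which splits into SSR of about 3.6 and SSE of about 2.4.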
