1. Notation
2. Mean Values
• Median of a continuous variable X in a frequency distribution
\[ m_c = x_{i-1} + (x_i - x_{i-1}) \frac{\frac{n}{2} - N_{i-1}}{n_i} = x_{i-1} + (x_i - x_{i-1}) \frac{\frac{1}{2} - F_{i-1}}{f_i} \]
where
⋆ $x_{i-1}$ is the lower bound of the median class
⋆ $(x_i - x_{i-1})$ is the width of the median class
⋆ $n_i$ is the frequency of the median class
⋆ $N_{i-1}$ is the cumulative frequency of the class that comes before the median class
⋆ $f_i = n_i/n$ and $F_{i-1} = N_{i-1}/n$ are the corresponding relative frequencies
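The interpolation above can be sketched in Python. This is only an illustration: the function name `grouped_median`, its arguments, and the sample classes are assumptions, not part of the original sheet. The median class is taken to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(bounds, counts):
    """Median of a continuous variable from class bounds and class frequencies.

    bounds: class limits x_0 < x_1 < ... < x_k (length k + 1)
    counts: absolute frequencies n_1, ..., n_k (length k)
    """
    n = sum(counts)
    cumulative = 0  # N_{i-1}: cumulative frequency before the current class
    for i, n_i in enumerate(counts):
        # The median class is the first one whose cumulative frequency reaches n/2.
        if n_i > 0 and cumulative + n_i >= n / 2:
            lower, upper = bounds[i], bounds[i + 1]
            return lower + (upper - lower) * (n / 2 - cumulative) / n_i
        cumulative += n_i
    raise ValueError("empty frequency distribution")


# Hypothetical data: classes 0-10, 10-20, 20-30 with frequencies 5, 10, 5.
median = grouped_median([0, 10, 20, 30], [5, 10, 5])
print(median)  # 15.0
```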
• Arithmetic mean of a variable X in a data set.
\[ \bar{x} = \frac{\sum_{j=1}^{n} x_j}{n} \]
• Arithmetic mean of a variable X in a frequency distribution
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i n_i}{n} = \sum_{i=1}^{k} x_i f_i \]
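As a quick check that the two forms of the mean agree, the sketch below computes it once from absolute frequencies and once from relative frequencies. The function names and the sample distribution are hypothetical.

```python
def mean_from_counts(values, counts):
    """Weighted sum of the values x_i by absolute frequencies n_i, divided by n."""
    n = sum(counts)
    return sum(x * c for x, c in zip(values, counts)) / n


def mean_from_relative_freq(values, counts):
    """Weighted sum of the values x_i by relative frequencies f_i = n_i / n."""
    n = sum(counts)
    return sum(x * (c / n) for x, c in zip(values, counts))


# Hypothetical distribution: values 1, 2, 3 observed 2, 5 and 3 times.
values, counts = [1, 2, 3], [2, 5, 3]
m1 = mean_from_counts(values, counts)
m2 = mean_from_relative_freq(values, counts)
print(m1, m2)  # both are 2.1 up to floating-point rounding
```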
2 PROF. ANGELA MONTANARI
(3) Minimum sum of squared deviations of all the values of x from their arithmetic mean:
\[ \sum_{j=1}^{n} (x_j - \bar{x})^2 = \min \]
(5) Equivariance of the arithmetic mean with respect to linear transformations (translations a and
scale changes b):
\[ x^*_j = a + b x_j \;\Rightarrow\; \bar{x}^* = a + b\bar{x} \]
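Properties (3) and (5) can be checked numerically on a small data set. The sketch below is not a proof; the data and the constants a and b are hypothetical.

```python
from statistics import fmean

x = [2.0, 4.0, 9.0]  # hypothetical data
x_bar = fmean(x)


def sum_sq_dev(c):
    """Sum of squared deviations of x from an arbitrary constant c."""
    return sum((v - c) ** 2 for v in x)


# Property (3): moving c away from the mean never lowers the sum.
at_mean = sum_sq_dev(x_bar)
nearby = [sum_sq_dev(x_bar + d) for d in (-1.0, -0.5, 0.5, 1.0)]
mean_is_minimum = all(at_mean <= s for s in nearby)

# Property (5): transform each value as a + b*x and compare the two means.
a, b = 3.0, 2.0
x_star = [a + b * v for v in x]
mean_of_transformed = fmean(x_star)
transformed_mean = a + b * x_bar

print(mean_is_minimum, mean_of_transformed, transformed_mean)
```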
4. Variability Measures
Categorical variables
• Gini heterogeneity measure: $E_1 = 1 - \sum_{i=1}^{k} f_i^2$ (minimum value 0, maximum value $1 - 1/k$)
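A minimal sketch of the Gini heterogeneity measure, showing the two extreme cases quoted above. The function name and the sample counts are hypothetical.

```python
def gini_heterogeneity(counts):
    """E1 = 1 - sum of squared relative frequencies over the k categories."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


# Maximum heterogeneity: units spread evenly over k = 4 categories, E1 = 1 - 1/4.
e_max = gini_heterogeneity([5, 5, 5, 5])
# Minimum heterogeneity: all units in a single category, E1 = 0.
e_min = gini_heterogeneity([20, 0, 0, 0])
print(e_max, e_min)  # 0.75 0.0
```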
Numeric variables
• Range: $x_{\max} - x_{\min}$
• Interquartile range: $Q_3 - Q_1$
• Sum of squared deviations from the mean of variable X in a data set (total sum of squares):
\[ Dev(X) = TSS = \sum_{j=1}^{n} (x_j - \bar{x})^2 = \sum_{j=1}^{n} x_j^2 - n\bar{x}^2 \]
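The definitional form and the shortcut form of the total sum of squares give the same number, as the following sketch illustrates on hypothetical data.

```python
x = [2.0, 4.0, 9.0]  # hypothetical data
n = len(x)
x_bar = sum(x) / n

# Definition: squared deviations from the mean, summed over the units.
tss_definition = sum((v - x_bar) ** 2 for v in x)

# Shortcut: sum of squares minus n times the squared mean.
tss_shortcut = sum(v ** 2 for v in x) - n * x_bar ** 2

print(tss_definition, tss_shortcut)  # both equal 26 up to rounding
```

The shortcut form needs only one pass over the data, but it can lose precision in floating point when the mean is large relative to the spread.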
• Sum of squared deviations from the mean of a continuous variable X (coded into classes) in a frequency distribution:
\[ Dev(X) = TSS = \sum_{i=1}^{k} (\hat{x}_i - \bar{x})^2 n_i = \sum_{i=1}^{k} \hat{x}_i^2 n_i - n\bar{x}^2 \]
• Coefficient of Variation
\[ CV = \frac{s_x}{\bar{x}} \]
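A small sketch of the coefficient of variation. It assumes that the standard deviation divides by n (the descriptive, or population, form); the sheet does not state the divisor here, so swap `pstdev` for `stdev` if n − 1 is intended. The function name and data are hypothetical.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(x):
    """Standard deviation divided by the mean (assumes divisor n and a nonzero mean)."""
    return pstdev(x) / fmean(x)


# Hypothetical data with mean 5 and standard deviation 2.
cv = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(cv)  # approximately 0.4
```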
• Gini’s concentration ratio
If $x_{(1)} + x_{(2)} + \cdots + x_{(j)}$ is the amount of variable X owned by the j poorest units, $q_j = (x_{(1)} + x_{(2)} + \cdots + x_{(j)})/(n\bar{x})$ is the corresponding proportion of the total amount. Denoting by $p_j = j/n$ the cumulative relative frequency of the first j units, Gini’s concentration ratio is defined as
\[ R = \frac{\sum_{j=1}^{n-1} (p_j - q_j)}{\sum_{j=1}^{n-1} p_j} \]
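The definition can be followed step by step in code: sort the units from poorest to richest, accumulate the shares, and sum over j = 1, ..., n − 1. This is a sketch with a hypothetical function name and data; it assumes at least two units and a positive total.

```python
def gini_concentration(x):
    """Gini's concentration ratio R from individual amounts x (n >= 2, total > 0)."""
    ordered = sorted(x)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(ordered)
    total = sum(ordered)  # equals n times the mean
    running = 0.0
    numerator = 0.0
    denominator = 0.0
    for j in range(1, n):  # j = 1, ..., n - 1
        running += ordered[j - 1]
        p_j = j / n  # cumulative share of units
        q_j = running / total  # cumulative share of the amount
        numerator += p_j - q_j
        denominator += p_j
    return numerator / denominator


r_equal = gini_concentration([5, 5, 5, 5])  # everyone owns the same amount
r_max = gini_concentration([0, 0, 0, 20])  # one unit owns everything
print(r_equal, r_max)  # 0.0 1.0
```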
\[ Dev(X)_{Between} = BSS = \sum_{g=1}^{G} (\bar{x}_g - \bar{x})^2 n_g \]
where $n_g$ and $s_g^2$ are the size and the variance of the g-th group, respectively.
6. Association Measures
• Chi-Squared
\[ \chi^2 = \sum_{i=1}^{u} \sum_{h=1}^{v} \frac{(n_{ih} - n^*_{ih})^2}{n^*_{ih}} \]
• Tchuprov
\[ T = \sqrt{\frac{\chi^2}{n \sqrt{(u-1)(v-1)}}} \]
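The two measures can be computed from a u × v contingency table as sketched below. The sketch assumes that $n^*_{ih}$ denotes the frequency expected under independence, that is, row total times column total divided by n; this definition is standard but is not spelled out in this part of the sheet. Function names and tables are hypothetical, and every row and column total is assumed positive.

```python
from math import sqrt


def chi_squared(table):
    """Chi-squared statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for h, n_ih in enumerate(row):
            # Assumed definition: expected frequency under independence.
            expected = row_totals[i] * col_totals[h] / n
            chi2 += (n_ih - expected) ** 2 / expected
    return chi2


def tchuprov(table):
    """Tchuprov's T: chi-squared rescaled by n and the table dimensions."""
    u, v = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    return sqrt(chi_squared(table) / (n * sqrt((u - 1) * (v - 1))))


perfect = [[10, 0], [0, 10]]  # each row determines the column
independent = [[5, 5], [5, 5]]  # observed counts equal expected counts
print(chi_squared(perfect), tchuprov(perfect))  # 20.0 1.0
print(chi_squared(independent), tchuprov(independent))  # 0.0 0.0
```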
• Eta-Squared
\[ \eta^2 = \frac{Dev(Y)_{Between}}{Dev(Y)} = 1 - \frac{Dev(Y)_{Within}}{Dev(Y)} \]
\[ Dev(Y)_{Within} = WSS = \sum_{i=1}^{u} \sum_{h=1}^{v} (y_i - \bar{y}_h)^2 n_{ih} \]
\[ Dev(Y)_{Between} = BSS = \sum_{h=1}^{v} (\bar{y}_h - \bar{y})^2 n_{0h} \]
\[ Dev(Y) = TSS = \sum_{i=1}^{u} \sum_{h=1}^{v} (y_i - \bar{y})^2 n_{ih} \]
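The three sums of squares and eta-squared can be computed from a two-way table whose rows are the values of Y and whose columns are the groups. The sketch assumes $\bar{y}_h$ is the mean of Y within column h and $n_{0h}$ is the column total; the function name and the table are hypothetical, and every column is assumed non-empty.

```python
def eta_squared(y_values, table):
    """Return (eta2, wss, bss, tss) for table[i][h] = n_ih and Y values y_values[i]."""
    u, v = len(y_values), len(table[0])
    cells = [(i, h) for i in range(u) for h in range(v)]
    col_totals = [sum(table[i][h] for i in range(u)) for h in range(v)]  # n_0h
    n = sum(col_totals)
    y_bar = sum(y_values[i] * table[i][h] for i, h in cells) / n
    # Conditional mean of Y within each column (group) h.
    y_bar_h = [
        sum(y_values[i] * table[i][h] for i in range(u)) / col_totals[h]
        for h in range(v)
    ]
    wss = sum((y_values[i] - y_bar_h[h]) ** 2 * table[i][h] for i, h in cells)
    bss = sum((y_bar_h[h] - y_bar) ** 2 * col_totals[h] for h in range(v))
    tss = sum((y_values[i] - y_bar) ** 2 * table[i][h] for i, h in cells)
    return bss / tss, wss, bss, tss


# Hypothetical table: Y takes values 1, 2, 3; two groups of three units each.
eta2, wss, bss, tss = eta_squared([1, 2, 3], [[2, 0], [1, 1], [0, 2]])
print(eta2, wss, bss, tss)  # eta2 is about 2/3, and wss + bss equals tss
```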
\[ TSS(Y) = MSS + RSS \]
• Coefficient of determination
\[ R^2 = \frac{MSS}{TSS} = 1 - \frac{RSS}{TSS} = r^2 \]
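The identity between the coefficient of determination and the squared correlation can be verified numerically. The sketch assumes a simple least-squares line of Y on X, with MSS the sum of squares of the fitted values around the mean and RSS the sum of squared residuals; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def regression_decomposition(x, y):
    """Fit y = intercept + slope * x by least squares and decompose TSS(Y)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    tss = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * a for a in x]
    mss = sum((f - y_bar) ** 2 for f in fitted)  # explained by the line
    rss = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual
    r = sxy / sqrt(sxx * tss)  # linear correlation coefficient
    return {"TSS": tss, "MSS": mss, "RSS": rss, "R2": mss / tss, "r": r}


# Hypothetical data.
result = regression_decomposition([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])
print(result)
```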