Table for marginal rates/proportions

Treatment \ Outcome   Success   Failure   Row Total
X                         542       158         700
Y                         289        61         350
Column Total              831       219        1050

Marginal rate: Rate(Y) = 350/1050 = 33.33%; Rate(success) = 831/1050 = 79.1%
Conditional rate: Rate(success | X) = 542/700 = 77.43%, i.e. P(A | B)
Joint rate: Rate(Y and failure) = 61/1050 = 5.81% (not a conditional rate!)
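A minimal Python sketch of the three kinds of rates, using the counts from the table above. The dictionary layout and function names are illustrative choices, not part of the notes.

```python
# Counts from the table above: rows = treatment, columns = outcome.
counts = {
    "X": {"success": 542, "failure": 158},
    "Y": {"success": 289, "failure": 61},
}

def row_total(treatment):
    return sum(counts[treatment].values())

def col_total(outcome):
    return sum(row[outcome] for row in counts.values())

grand_total = sum(row_total(t) for t in counts)  # 1050

def marginal_rate_treatment(treatment):
    """Rate(treatment) = row total / grand total."""
    return row_total(treatment) / grand_total

def marginal_rate_outcome(outcome):
    """Rate(outcome) = column total / grand total."""
    return col_total(outcome) / grand_total

def conditional_rate(outcome, treatment):
    """Rate(outcome | treatment): restrict attention to that treatment's row."""
    return counts[treatment][outcome] / row_total(treatment)

def joint_rate(treatment, outcome):
    """Rate(treatment and outcome): a single cell over the grand total."""
    return counts[treatment][outcome] / grand_total

print(f"{marginal_rate_treatment('Y'):.2%}")      # 33.33%
print(f"{marginal_rate_outcome('success'):.2%}")  # 79.14%
print(f"{conditional_rate('success', 'X'):.2%}")  # 77.43%
print(f"{joint_rate('Y', 'failure'):.2%}")        # 5.81%
```

The only difference between a conditional and a joint rate is the denominator: the row total for the conditional rate, the grand total for the joint rate.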
Reading a boxplot
Shape: compare the spread from median to max with the spread from min to median. Skewed right if the lower half has less variability than the upper half, and vice versa.
Center: can read off the median value.
Spread: the IQR gives an idea of the spread of the middle 50% of the dataset.

3. Spread of distribution: range & standard deviation → measures of variability
→ Higher variability → wider spread (larger range)
→ e.g. s = 1.69: low variability; s = 4.30: high variability
4. Outliers
→ Useful in identifying strong skew
→ Identify possible data collection or data entry errors
→ Provide interesting insight into the data
→ Shouldn't be removed unnecessarily

Histogram vs Boxplot
Histogram: provides a better sense of the distribution's shape, especially when there are great differences among the frequencies of the data points.
Boxplot: more useful for comparing the distributions of different datasets, and identifies outliers clearly. However, unlike a histogram, it gives no information on how many data points there are: 4 different datasets can have the same boxplot.

Comparing P1, P2, P3
1. All are right skewed; variability: P1 > P2 > P3.
2. IQR lowest in P1, followed by P2, then P3.
3. More outliers in P1 and P2 compared to P3.

Correlation coefficient (r)
→ Measure of linear association
→ Range is between -1 and 1
→ Summarizes the direction & strength of a linear association
3. Strength: r = -0.75 (before) vs r = 0.01 (after the outlier is removed); the presence of the outlier increases the strength of correlation compared to when it is removed.
r-value is not affected by interchanging the 2 variables, adding a number to all values of a variable, or multiplying all values of a variable by a positive number (a negative multiplier flips the sign of r).
r-value only measures linear association.
ASSOCIATION ≠ CAUSATION

Linear Regression
Slope of regression line (m) & correlation coefficient (r) are related by m = r × (s_y / s_x).
If r is +ve, the gradient of the regression line is also +ve, and vice versa.
r is not necessarily equal to the gradient of the regression line.
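The correlation and regression facts above can be checked numerically. Below is a small Python sketch on a hypothetical dataset (xs, ys are made up). It computes r by hand, derives the slope from m = r × (s_y / s_x), compares that with the direct least-squares slope, and tests the invariance properties listed in the notes.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

def slope_from_r(xs, ys):
    """Slope of the regression line of y on x via m = r * s_y / s_x."""
    return pearson_r(xs, ys) * statistics.stdev(ys) / statistics.stdev(xs)

def slope_least_squares(xs, ys):
    """Same slope computed directly: sum((x-mx)(y-my)) / sum((x-mx)^2)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
m = slope_from_r(xs, ys)
print(f"r = {r:.3f}, m = {m:.3f}")  # r = 0.775, m = 0.600

# Invariance checks from the notes
print(math.isclose(pearson_r(ys, xs), r))                     # True (swap variables)
print(math.isclose(pearson_r([x + 10 for x in xs], ys), r))   # True (add a constant)
print(math.isclose(pearson_r([3 * x for x in xs], ys), r))    # True (multiply by +3)
print(math.isclose(pearson_r([-3 * x for x in xs], ys), -r))  # True (negative flips sign)
print(math.isclose(m, slope_least_squares(xs, ys)))           # True
```

Because s_x and s_y are always positive, m has the same sign as r. However, m equals r only when s_y = s_x, which is why r and the gradient usually differ (0.775 vs 0.600 here).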
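The boxplot-reading notes (five-number summary, IQR, skew from the half-spreads, outliers, range and standard deviation) can be sketched in Python as follows. The dataset is hypothetical. Two conventions are assumptions the notes do not state: quartiles use the standard library's "inclusive" method, because textbooks and software compute them slightly differently, and outliers are flagged with the common 1.5 × IQR fence rule.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a boxplot draws."""
    # Assumption: "inclusive" quartiles; other methods give slightly different Q1/Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def iqr(data):
    """Spread of the middle 50% of the data (Q3 - Q1)."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1

def flag_outliers(data):
    """Assumption: the common 1.5 * IQR fence rule for boxplot outliers."""
    _, q1, _, q3, _ = five_number_summary(data)
    fence = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - fence or x > q3 + fence]

def skew_from_boxplot(data):
    """Rule of thumb from the notes: compare max - median with median - min."""
    lo, _, med, _, hi = five_number_summary(data)
    upper, lower = hi - med, med - lo
    if upper > lower:
        return "right"
    if upper < lower:
        return "left"
    return "symmetric"

data = [2, 3, 3, 4, 4, 5, 5, 6, 7, 20]  # hypothetical dataset with one large value
print(five_number_summary(data))  # (2, 3.25, 4.5, 5.75, 20)
print(iqr(data))                  # 2.5
print(flag_outliers(data))        # [20]
print(skew_from_boxplot(data))    # right
# Range and sample standard deviation s as measures of variability
print(max(data) - min(data), round(statistics.stdev(data), 2))
```

The single value 20 stretches the upper whisker, so this data is read as right skewed. That value also inflates the range and s far more than it moves the IQR.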
Conditional probability & Independence
Conditional probabilities as rates
A and B are independent events whenever A and B are not associated with each other, i.e. P(A | B) = P(A).
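As a sketch of the independence condition, the Python below checks whether P(A | B) = P(A) on a 2×2 count table. It reuses the treatment/outcome counts from the table at the top and adds a second, hypothetical table built to be independent. Exact fractions avoid floating-point noise in the comparison.

```python
from fractions import Fraction

def rates(table, row, col):
    """Return (P(col), P(col | row)) as exact fractions from a 2x2 count table."""
    grand_total = sum(sum(r.values()) for r in table.values())
    col_total = sum(r[col] for r in table.values())
    row_total = sum(table[row].values())
    return Fraction(col_total, grand_total), Fraction(table[row][col], row_total)

def independent(table, row, col):
    """A (= col) and B (= row) are independent when P(A | B) == P(A)."""
    p_a, p_a_given_b = rates(table, row, col)
    return p_a == p_a_given_b

notes_table = {"X": {"success": 542, "failure": 158},
               "Y": {"success": 289, "failure": 61}}
print(rates(notes_table, "X", "success"))        # (Fraction(277, 350), Fraction(271, 350))
print(independent(notes_table, "X", "success"))  # False

# Hypothetical table constructed so that P(success | X) = P(success) = 4/5
indep_table = {"X": {"success": 80, "failure": 20},
               "Y": {"success": 40, "failure": 10}}
print(independent(indep_table, "X", "success"))  # True
```

Here Rate(success | X) = 271/350 differs from Rate(success) = 277/350, so success and treatment X are associated in this table. With real sample data, exact equality is rare, so this strict check is only meant to illustrate the definition.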
Random variables
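Confidence interval (CI): sample proportion = population proportion + random error, so the CI is built around the sample proportion to estimate the population proportion. Only done when we have sample data!
Hypothesis testing: H0 & H1 must be mutually exclusive.
Larger sample size → smaller random error → narrower CI.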
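To illustrate "larger sample size → narrower CI", here is a Python sketch of a confidence interval for a proportion. The notes do not give a CI formula. This code assumes the common normal-approximation (Wald) interval p̂ ± z·√(p̂(1 − p̂)/n) with z = 1.96 for roughly 95% confidence, and uses a hypothetical sample proportion of 0.5.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate CI for a population proportion: p_hat +/- z * SE.
    Assumption: normal-approximation (Wald) interval; z = 1.96 for ~95%."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error shrinks like 1/sqrt(n)
    return p_hat - z * se, p_hat + z * se

def ci_width(p_hat, n, z=1.96):
    lo, hi = proportion_ci(p_hat, n, z)
    return hi - lo

for n in (100, 400, 1600):
    lo, hi = proportion_ci(0.5, n)  # hypothetical sample proportion of 0.5
    print(f"n = {n}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

Under this formula, the width scales with 1/√n, so quadrupling the sample size halves the width of the CI.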