
1. Stem-and-Leaf Display: indicate the units for the stems and the leaves.
2. Dot Plot: useful when the data set is small or there are relatively few distinct data values (with ties).
3. Examining Distributions:
a. Mean - The mean is non-resistant: it is influenced by very large or very small data points, i.e., extreme values in the data set.
b. Median - The median is resistant (robust) to the extremes in the data set. Extremely large or small values do NOT influence the median.
c. For example, the largest value can be increased without bound (even to infinity), or the smallest decreased without bound, without changing the median.
d. Mean and median provide measures of location (center). One also needs some measures of variability to further describe the spread of the data set.
e. Quartiles - When n is odd, include the median in both halves when computing the fourths.
f. The interquartile range (IQR) is the distance between the first and third quartiles; it spans the middle 50% of the data.
g. An observation is a suspected outlier if it falls more than 1.5·IQR from the closest fourth. An outlier is extreme if it is more than 3·IQR from the nearest fourth, and it is mild otherwise (see the sketch below).
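To make the fence rule concrete, here is a minimal Python sketch (the data are made up; it follows this sheet's convention of including the median in both halves when n is odd):

```python
# Fourths (quartiles) and the 1.5*IQR / 3*IQR outlier fences.

def median(xs):
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def fourths(xs):
    xs = sorted(xs)
    n = len(xs)
    half = (n + 1) // 2          # odd n: the median belongs to both halves
    return median(xs[:half]), median(xs[n - half:])

data = [3, 5, 7, 8, 9, 12, 13, 14, 40]   # 40 looks suspicious
q1, q3 = fourths(data)
iqr = q3 - q1
for x in data:
    dist = max(q1 - x, x - q3)           # distance from the closest fourth
    if dist > 3 * iqr:
        print(x, "extreme outlier")
    elif dist > 1.5 * iqr:
        print(x, "mild outlier")         # here only 40 is flagged (extreme)
```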
4. Box plot
a. The lower (left) edge of the rectangle is at the lower fourth, and the upper (right) edge is at the upper fourth.
b. The horizontal (vertical) line segment inside the rectangle marks the location of the median.
c. The “whiskers” extend out from either end of the rectangle to the smallest and largest observations that are NOT outliers.
d. Dots represent outliers.
e. Left skewed: Mean < Median – the bulk of the data lies to the right, with a long tail to the left.
f. Right skewed: Mean > Median – the bulk of the data lies to the left, with a long tail to the right.
5. Standard deviation
a. The variance and standard deviation are measures of spread that indicate how far values in the data set are from the mean, on average.
b. The deviations (xi − x̄) display the spread of the xi about their mean x̄
c. The sum of the deviations is always 0, as some of the deviations are positive and others are negative.
d. Squaring the deviations makes them all positive. Observations far from the mean will have large positive squared deviations.
e. The variance is the ‘average’ squared deviation: s² = ∑(xi − x̄)² / (n − 1).
f. The more spread out the data, the greater the standard deviation.
g. s is always nonnegative; s = 0 only when all the observations are equal.
h. s has the same unit of measurement as the original data (see the sketch below).
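A quick numeric check of these points, as a Python sketch (the data are made up):

```python
# Deviations sum to 0; sample variance uses the n-1 divisor.
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
xbar = sum(data) / n                       # mean = 5.0
devs = [x - xbar for x in data]
print(sum(devs))                           # 0.0: deviations always cancel
s2 = sum(d ** 2 for d in devs) / (n - 1)   # sample variance = 32/7
s = math.sqrt(s2)                          # same units as the data
print(s2, s)
```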
6. Permutation: P(n, k) = n!/(n − k)!
a. How many different batting orders are possible for a baseball team consisting of 9 players? – 9!
b. A student has 10 books on her bookshelf: 4 mathematics books, 3 physics books, 2 literature books and 1 language book. She wants to arrange her books so that all the books dealing with the same subject are together on her shelf. How many different arrangements are possible? – 4!·4!·3!·2!·1! (4! orderings of the subjects, times the orderings within each subject)
c. How many different letter arrangements can be formed from the letters PEPPER? – 6!/(3!·2!·1!) = 60 (see the check below)
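These counts are easy to verify by brute force; a small Python check:

```python
# PEPPER has 6 letters with P repeated 3 times and E twice,
# so the count should be 6!/(3!*2!*1!) = 60.
from itertools import permutations
from math import factorial

print(factorial(9))                                   # batting orders: 362880
print(len(set(permutations("PEPPER"))))               # 60 distinct arrangements
print(factorial(6) // (factorial(3) * factorial(2)))  # 60, by the formula
```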
7. Combination: n!/[k!(n-k)!]
a. A committee of 3 is to be formed from a group of 20 people. How many different committees are possible? – 20!/(3!17!)
b. From a group of 5 women and 7 men, how many different committees consisting of 2 women and 3 men can be formed? What if 2 of the men are feuding and refuse to serve on the committee together? – [5C2][7C3] = 350 in all; subtracting the committees that contain both feuding men leaves [5C2][7C3] − [2C2][5C1][5C2] = 300 (see the sketch below)
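A quick Python check of both committee counts (math.comb is n-choose-k):

```python
from math import comb

print(comb(20, 3))                               # 1140 committees of 3 from 20
total = comb(5, 2) * comb(7, 3)                  # 2 women and 3 men: 350
feuding = comb(2, 2) * comb(5, 1) * comb(5, 2)   # both feuding men serve: 50
print(total - feuding)                           # 300 committees avoid the feud
```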
8. Probability
a. The Law of Large Numbers says that the long-run relative frequency of repeated independent events gets closer and closer to the true probability as the number of trials increases.
b. P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(A∩C) − P(B∩C) + P(A∩B∩C)
c. Example: suppose you have 999 regular coins and 1 two-headed coin in a bag. You randomly draw one coin, put it on the table, and see a head. What is the probability that you drew the two-headed coin?
d. Bayes’ Rule: P(A|B) = [P(B|A) P(A)] / [P(B|A) P(A) + P(B|A’) P(A’)]
e. With A = “drew the two-headed coin” and B = “a head shows”: P(A) = 1/1000, P(A’) = 999/1000, P(B|A) = 1, P(B|A’) = 0.5, so P(A|B) = (1/1000) / (1/1000 + 0.5·999/1000) = 2/1001 ≈ 0.002 (see the sketch below).
f. Conditional Probability: P(A|B) = P(A∩B)/P(B)
g. P(A∩B∩C) = P(C|A∩B) P(B|A) P(A)
h. Two events A and B are independent, if and only if P(A∩B) = P(A) • P(B).
i. The Law of Total Probability: let A1, ..., Ak be mutually exclusive and exhaustive events. Then for any other event B, P(B) = P(B|A1)P(A1) + ... + P(B|Ak)P(Ak) = ∑ P(B|Ai)P(Ai)
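A minimal sketch of the two-headed-coin calculation, combining the Law of Total Probability and Bayes’ Rule (variable names are my own):

```python
# A = drew the two-headed coin, B = the face showing is a head.
p_A = 1 / 1000
p_B_given_A = 1.0          # the two-headed coin always shows a head
p_B_given_notA = 0.5       # a regular coin shows a head half the time

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # total probability
p_A_given_B = p_B_given_A * p_A / p_B                  # Bayes' rule
print(p_A_given_B)         # 2/1001 ~ 0.002
```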
9. DISCRETE Random Variables
a. PMF: The probability distribution or probability mass function (pmf) of a discrete rv is defined for every number x by p(x) = P(X = x) = P(all s ∈ S: X(s) = x).
b. Bernoulli: Any random variable whose possible values are only 0 and 1 is called a Bernoulli random variable.
c. CDF: The cumulative distribution function (cdf) F(x) of a discrete rv X with pmf p(x) is defined for every number x by F(x) = P(X ≤ x) = ∑ p(y), summing over all possible values y ≤ x
i. Its graph is a step function
d. The expected value of a random variable describes its theoretical long-run average value.
i. E(aX + b) = a • E(X) + b
e. Variance of a rv X is a measure describing the spread/variability/dispersion of its distribution. Let X have pmf p(x) and expected value μ. Then the variance of X, denoted by Var(X) or σX² or just σ², is
i. Var(X) = ∑(x − μ)²·p(x) = E[(X − μ)²]
ii. Var(X) = σ² = [∑x²·p(x)] − μ² = E(X²) − [E(X)]²
iii. Var(aX + b) = a²·Var(X) (see the sketch below)
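A short Python sketch of these rules, using a made-up pmf:

```python
# E(X) and Var(X) straight from a pmf, plus the linearity rules.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mu = sum(x * p for x, p in pmf.items())                 # E(X) = 1.1
ex2 = sum(x ** 2 * p for x, p in pmf.items())           # E(X^2) = 1.7
var = ex2 - mu ** 2                                     # shortcut: 0.49
print(mu, var)

a, b = 3, 5
mu_y = sum((a * x + b) * p for x, p in pmf.items())
print(mu_y, a * mu + b)                                 # E(aX+b) = a*E(X)+b
var_y = sum((a * x + b - mu_y) ** 2 * p for x, p in pmf.items())
print(var_y, a ** 2 * var)                              # Var(aX+b) = a^2*Var(X)
```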
10. Bernoulli trials
a. The n trials are identical.
b. The trials are independent (the outcome of any particular trial does not influence the outcome of any other trial).
c. Each trial has two possible outcomes: success or failure.
d. The probability of success, denoted by p, is the same for each trial.
e. If in Bernoulli trials the number of trials n is fixed in advance of the experiment, the experiment is called a binomial experiment.
11. Binomial
a. cdf: P(X ≤ x) = B(x; n, p) = ∑ b(y; n, p), summing from y = 0 to x
b. pmf: b(x; n, p) = nCx · p^x · (1 − p)^(n−x), for x = 0, 1, ..., n
c. E(X) = np, Var(X) = np(1 − p) (see the sketch below)
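A sketch of the pmf/cdf formulas in Python (the parameters n = 10, p = 0.3 are made up for illustration):

```python
# Binomial pmf b(x; n, p) = C(n,x) p^x (1-p)^(n-x) and its cdf.
from math import comb

def b(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def B(x, n, p):                      # cdf: sum the pmf from 0 to x
    return sum(b(y, n, p) for y in range(x + 1))

n, p = 10, 0.3
print(B(3, n, p))                    # P(X <= 3) ~ 0.6496
print(n * p, n * p * (1 - p))        # E(X) = 3.0, Var(X) = 2.1
```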
12. Hypergeometric Distribution: If X is the number of successes in a completely random sample of size n drawn from a population consisting of M successes and (N − M) failures, then the distribution of X is given by
a. h(x; n, M, N) = [MCx · (N−M)C(n−x)] / NCn, for max(0, n − N + M) ≤ x ≤ min(n, M)
b. n = sample size (10 in the socks example).
c. M = total number of successes in the population (34 in the socks example).
d. N = total number of individuals in the population (50 in the socks example).
e. We wish to obtain P(X = x) = h(x; n, M, N); see the sketch below.
f. E(X) = n(M/N)
g. Var(X) = [(N − n)/(N − 1)] · n · (M/N) · (1 − M/N)
h. Connection to the Binomial: let M/N = p
i. Notice that if we fix n and let N be sufficiently large, Var(X) → np(1 − p), which is the variance of a binomial rv. This is why we can use a binomial model to approximate the hypergeometric when the population is large.
ii. (N − n)/(N − 1) is often called the finite population correction factor
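A sketch of the socks example’s numbers (x = 7 is an arbitrary illustration value):

```python
# Socks example: N = 50 socks, M = 34 "successes", sample n = 10
# drawn without replacement.
from math import comb

def h(x, n, M, N):
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

n, M, N = 10, 34, 50
print(h(7, n, M, N))                                   # P(X = 7), for instance
print(n * M / N)                                       # E(X) = 6.8
print((N - n) / (N - 1) * n * (M / N) * (1 - M / N))   # Var(X) with fpc
```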
13. Geometric Model
a. A series of independent Bernoulli trials is performed until we get a success for the first time; let X be the number of failures that precede the first success.
b. P(X = n) = (1 − p)^n · p, for n = 0, 1, 2, ...
14. Negative Binomial: a negative binomial rv X has two parameters r and p, so we denote its pmf by nb(x; r, p).
a. The geometric model is a simple model that can be used to model many things, such as the first time we observe a head in coin tosses, the first time a girl is born in a family, etc.
b. We can further generalize the model by letting X be the number of failures that precede the rth success. X can be any number in the set {0, 1, 2, ...}.
c. nb(x; r, p) = (x+r−1)C(r−1) · p^r · (1 − p)^x, for x = 0, 1, 2, ...
d. E(X) = r(1 − p)/p, Var(X) = r(1 − p)/p² (see the sketch below)
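A sketch tying the two formulas together (p = 0.5 and r = 3 are made-up values); the geometric model is the r = 1 special case of nb(x; r, p):

```python
# Negative binomial pmf: x failures before the r-th success.
from math import comb

def nb(x, r, p):
    return comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x

p = 0.5
print(nb(2, 1, p))                            # geometric: P(2 tails before first head) = 0.125
print(sum(nb(x, 3, p) for x in range(200)))   # pmf sums to ~1
print(3 * (1 - p) / p)                        # E(X) = r(1-p)/p = 3.0
```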
15. Poisson distribution
a. A very important application of the Poisson distribution arises in connection with the occurrence of events of some type over time. For instance, visits to a
particular website; some kind of pulses recorded by a counter; accidents in an industrial facility; customers going to a particular ATM machine; customers
coming to a particular store.

b. pmf: p(x; λ) = e^(−λ) · λ^x / x!, for x = 0, 1, 2, ...
c. E(X) = Var(X) = λ
d. The number of events during a time interval of length t is a Poisson rv with λ=αt, if the following three assumptions are satisfied:
i. The probability that exactly one event occurs in a short time interval is proportional to the length of the interval, with a fixed (constant) rate.
ii. The probability that more than one event (2 or more) occurs in a very short time interval is essentially 0.
iii. The number of events received during any time interval is independent of the number received prior to this time interval.
e. A typesetter, on average, makes one error in every 500 words typeset. A typical page contains 300 words. What is the probability that there will be no more than two errors in five pages? Use the Poisson approximation with λ = np = 1500/500 = 3.
f. P(X ≤ 2) = e^(−3)·(3⁰/0! + 3¹/1! + 3²/2!) = 8.5·e^(−3) ≈ 0.423 (checked in the sketch below)
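A sketch checking the typesetter answer:

```python
# Poisson approximation with lambda = np = 1500/500 = 3.
from math import exp, factorial

def pois(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

lam = 1500 / 500
print(sum(pois(x, lam) for x in range(3)))   # P(X <= 2) ~ 0.4232
```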
16. Continuous Random Variables
a. P(X > a) = 1 - F(a)
b. P(a ≤ X ≤ b) = F(b) - F(a)
17. Uniform Distribution

a. pdf: f(x; A, B) = 1/(B − A) for A ≤ x ≤ B, and 0 otherwise
b. Suppose a bus arrives equally likely at any time between 7:00 – 7:05 AM. What is the probability it arrives sometime between 7:00 – 7:02 AM?

c. With X = minutes after 7:00, X is uniform on [0, 5], so P(0 ≤ X ≤ 2) = 2/5 = 0.4
d. E(X) = (B + A)/2, Var(X) = (B - A)2/12
18. PDF: The probability density function (pdf) of a continuous rv X is a function f(x) such that for any two numbers a and b with a ≤ b, P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from a to b
19. Properties of the density curve
a. The graph of f(x) is often referred to as the density curve.
b. Note that f(x) ≥ 0 for all x.
c. Roughly speaking, f(x)dx can be treated as P(X = x); more precisely it is P(x ≤ X ≤ x + dx).
d. The total area under the curve must be 1

e. Mean: E(X) = ∫ x·f(x) dx, integrating over the range of X
f. Variance: Var(X) = E(X²) − [E(X)]², where E(X²) = ∫ x²·f(x) dx
20. The Exponential Distribution

a. pdf: f(x; λ) = λ·e^(−λx) for x ≥ 0, and 0 otherwise
b. E(X) = 1/λ, Var(X) = 1/λ2
c. Memorylessness (e.g., waiting for a bus): P(X ≥ s + t | X ≥ s) = P[(X ≥ s + t) ∩ (X ≥ s)] / P(X ≥ s) = P(X ≥ s + t) / P(X ≥ s) = e^(−λt), which equals P(X ≥ t) (see the sketch below)
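A numeric sketch of the memoryless property (the rate and times are made up):

```python
# For the exponential, P(X >= s+t | X >= s) = P(X >= t).
from math import exp

lam, s, t = 0.5, 2.0, 3.0

def tail(x):                           # P(X >= x) = e^(-lambda*x)
    return exp(-lam * x)

print(tail(s + t) / tail(s))           # conditional tail, given X >= s
print(tail(t))                         # same value: e^(-lambda*t)
```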
21. The Normal Distribution
a. Normal distribution is a bell-shaped, single peaked and symmetric distribution.
b. All normal models have the same shape: each has the same area within any given number of standard deviations of its mean.
c. pdf: f(x; μ, σ) = [1/(σ√(2π))] · e^(−(x−μ)²/(2σ²)), for −∞ < x < ∞
d. 68% of the scores fall within μ ± σ
e. 95% of the scores fall within μ ± 2σ
f. 99.7% of the scores fall within μ ± 3σ
g. Z = (X – μ) / σ
h. Binomial approximation: as n becomes larger and larger, the pmf of the binomial becomes more bell-shaped and more symmetric, so a normal distribution can be used to approximate the binomial when n is large.
i. The “+0.5” adjustment is called the continuity correction (see the sketch below).
j. zα denotes the value on the z axis for which area α under the z curve lies to the right of zα.
k. The remaining area 1 − α lies to the left, so zα is the 100(1 − α)th percentile of the standard normal distribution.
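A sketch of the normal approximation with the continuity correction, writing the standard normal cdf Φ via math.erf (n = 100, p = 0.5, x = 55 are made-up values):

```python
# Normal approximation to the binomial with the "+0.5" correction.
from math import erf, sqrt

def Phi(z):                             # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
x = 55
print(Phi((x + 0.5 - mu) / sigma))      # ~0.8643, close to exact P(X <= 55)
```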
22. Joint Distribution
a. The joint probability mass function p(x, y) is defined for each pair of numbers (x, y) by p(x, y) = P(X=x, Y=y).
b. We must have p(x, y) ≥ 0 and ∑x∑y p(x, y) = 1
c. Remarks
i. In the continuous case, roughly speaking, f(x, y)dxdy can be treated as P(X=x,Y=y).
ii. As in the discrete case, fX(x) and fY(y) calculated from the joint distribution are automatically proper pdf’s.
iii. Marginal distributions are, in fact, the distributions of the marginal random variables when they are treated as univariate random variables.
iv. Note that different joint distributions may have the same marginal distributions.
v. We say two random variables X and Y are independent if and only if P(X=x, Y=y) = P(X=x) P(Y=y), for any x and y.
vi. More specifically, two random variables X and Y are said to be independent if for every pair x and y values,
1. p(x, y) = pX(x) pY(y), when X and Y are discrete;
2. f(x, y) = fX(x) fY(y), when X and Y are continuous.
vii. The random variables X1, X2, ..., Xn, are said to be independent if for every subset Xi1, Xi2, ..., Xik, of the variables (each pair, each triple, and so
on), the joint pmf or pdf of the subset is equal to the product of the marginal pmf’s or pdf’s.
d. Conditional
i. Using the marginal distributions, one can calculate the conditional distribution of one rv given the other.
ii. Let X and Y be two continuous rv’s with joint pdf f(x, y) and marginal X pdf fX(x). Then for any X value x for which fX(x) > 0, the conditional probability density function of Y given that X = x is fY|X(y|x) = f(x, y)/fX(x)
iii. If X and Y are discrete, replacing pdf’s by pmf’s in this definition gives the conditional probability mass function of Y when X = x (see the sketch below).
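A sketch computing marginals from a joint pmf and checking independence (the joint table is made up):

```python
# Marginals and an independence check for a discrete joint pmf.
joint = {(0, 0): 0.10, (0, 1): 0.30,
         (1, 0): 0.15, (1, 1): 0.45}

pX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}
print(pX, pY)          # marginals: {0: 0.4, 1: 0.6} and {0: 0.25, 1: 0.75}

independent = all(abs(joint[(x, y)] - pX[x] * pY[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(independent)     # True: p(x, y) = pX(x)*pY(y) everywhere
```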
23. Covariance
a. A popular measurement to characterize the dependence of two rv’s is correlation. To calculate the correlation of two rv’s, we first have to calculate their covariance.
b. The covariance between two rv’s X and Y is Cov(X, Y) = E[(X − μX)(Y − μY)]
c. Shortcut formula: Cov(X, Y) = E(XY) − E(X)E(Y)
d. Correlation Coefficient: ρX,Y = Cov(X, Y) / (σX·σY)
e. If X and Y are independent, then ρX,Y = 0 (why?). But ρX,Y = 0 does NOT imply independence (see the sketch below).
i. ρX,Y = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.
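A sketch of the shortcut formula, and of a zero-correlation-but-dependent pair (X uniform on {−1, 0, 1} and Y = X²):

```python
# Cov(X, Y) = E(XY) - E(X)E(Y); here rho = 0 even though Y = X^2.
pmf = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}

ex = sum(x * p for (x, y), p in pmf.items())       # E(X) = 0
ey = sum(y * p for (x, y), p in pmf.items())       # E(Y) = 2/3
exy = sum(x * y * p for (x, y), p in pmf.items())  # E(XY) = 0
print(exy - ex * ey)    # Cov(X, Y) = 0, yet Y is a function of X
```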
