• We can use parametric statistics on data that follow a specific distribution;
otherwise we need to use nonparametric statistics.
Normal Distribution:
– Also called the bell curve, Gaussian distribution, or standard curve
– Most important distribution in statistics:
• It arises when many small, independent random errors are additive.
– We will refer to this curve a lot
– It is characterized by its mean (μ) and standard deviation (σ). It is a continuous function given by the equation:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
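A minimal Python sketch of the normal density f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)), assuming hypothetical parameters μ = 10 and σ = 2. It checks three features of the bell curve numerically: the peak at the mean, symmetry about the mean, and roughly 68% of the area lying within one standard deviation. The trapezoidal `integrate` helper is a hypothetical illustration, not a library function.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, a: float, b: float, n: int = 10_000) -> float:
    """Trapezoidal-rule approximation of the area under f between a and b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


mu, sigma = 10.0, 2.0  # hypothetical parameters
# Peak height at the mean: 1 / (sigma * sqrt(2*pi)), about 0.1995 here.
print(normal_pdf(mu, mu, sigma))
# The curve is symmetric about the mean.
print(normal_pdf(mu - 3, mu, sigma) == normal_pdf(mu + 3, mu, sigma))
# About 68% of the area lies within one standard deviation of the mean.
print(round(integrate(lambda x: normal_pdf(x, mu, sigma), mu - sigma, mu + sigma), 4))
```

The same helper can be pointed at wider intervals (e.g. μ ± 2σ) to explore how quickly the tails thin out.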
• Examples of normal distributions:
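Log-normal Distribution:
– Arises when random effects are multiplicative rather than additive: the logarithm of the variable is normally distributed.
– For example, the long-term return rate on a stock investment can be considered to be the product of the daily return rates.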
• Log-normal distributions are also particularly common when mean values are low,
variances large, and values cannot be negative
– E.g. species abundances
– Distribution of mineral resources in the earth’s crust
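A small sketch of why multiplicative effects produce log-normal data, using hypothetical daily growth factors drawn with Python's `random` module (the mean 0.0005 and spread 0.01 are made-up values). Taking logarithms turns the product of daily factors into a sum, and sums of many small independent terms tend toward a normal distribution, so the long-term factor itself is approximately log-normal and, as a product of positive factors, cannot be negative.

```python
import math
import random


def compound_factor(daily_factors):
    """Long-term growth factor as the product of daily factors (1.01 means +1%)."""
    total = 1.0
    for f in daily_factors:
        total *= f
    return total


rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical daily growth factors: small random fluctuations around 1.
days = [1.0 + rng.gauss(0.0005, 0.01) for _ in range(250)]

growth = compound_factor(days)
log_sum = sum(math.log(f) for f in days)

# A product of positive factors is always positive ...
print(growth > 0)
# ... and its logarithm is a sum of many small terms, which tends toward normal.
print(math.isclose(math.log(growth), log_sum, rel_tol=1e-9, abs_tol=1e-9))
```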
• Examples of log-normal distributions:
Poisson Distribution:
– A discrete distribution for events that occur rarely and at random, e.g. radioactive decays per hour, wells per square mile, etc.
• The probability of r events is
$$P(r) = \frac{X^{r} e^{-X}}{r!}, \qquad r = 0, 1, 2, \ldots$$
where P(r) is the probability of r events occurring if the average number of events per unit of time or area is X.
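The Poisson probability P(r) = X^r e^(−X) / r! can be sketched in a few lines of Python. `poisson_pmf` is a hypothetical helper name, and X = 2 wells per square mile is a made-up average used only for illustration.

```python
import math


def poisson_pmf(r: int, X: float) -> float:
    """P(r) = X**r * exp(-X) / r!: probability of exactly r events when the
    average number of events per unit of time or area is X (named as in the text)."""
    if r < 0:
        return 0.0
    return X ** r * math.exp(-X) / math.factorial(r)


X = 2.0  # hypothetical average: 2 wells per square mile
for r in range(6):
    print(r, round(poisson_pmf(r, X), 4))
```

For X = 2, one and two events turn out to be equally likely, since X^1/1! = X^2/2! when X = 2.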
• The Poisson distribution expresses the probability of a number of events occurring in a fixed
period of time if these events occur with a known average rate and independently of the time since
the last event.
• The Poisson distribution can also be used for the number of events in other specified intervals
such as distance, area or volume.
• The Poisson distribution can be applied to systems with a large number of possible events, each of
which is rare.
– A classic example is the nuclear decay of atoms.
• Properties:
– The expected value of a Poisson-distributed random variable is equal to X and so is its variance.
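To check this property numerically, the sketch below computes the mean and variance directly from the Poisson probabilities, truncating the infinite sum at r = 100 (an assumption that is harmless for the small, hypothetical X values used). Both come out equal to X.

```python
import math


def poisson_pmf(r: int, X: float) -> float:
    return X ** r * math.exp(-X) / math.factorial(r)


def poisson_mean_and_variance(X: float, r_max: int = 100):
    """Mean and variance computed from the pmf, truncating the sum at r_max."""
    probs = [poisson_pmf(r, X) for r in range(r_max + 1)]
    mean = sum(r * p for r, p in enumerate(probs))
    variance = sum((r - mean) ** 2 * p for r, p in enumerate(probs))
    return mean, variance


for X in (0.5, 3.0, 10.0):  # hypothetical averages
    mean, variance = poisson_mean_and_variance(X)
    print(X, round(mean, 6), round(variance, 6))  # both equal X
```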
• The Poisson distribution is sometimes called the law of small numbers because it is the probability
distribution of the number of occurrences of an event that happens rarely but has very many
opportunities to happen
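The "law of small numbers" can be illustrated by comparing a binomial distribution with n opportunities, each with a small probability p = X/n, against the Poisson distribution with the same average X. This is a sketch with a hypothetical X = 2; `max_gap` is an illustrative helper that reports the largest pointwise difference between the two, which shrinks as n grows.

```python
import math


def binomial_pmf(r: int, n: int, p: float) -> float:
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson_pmf(r: int, X: float) -> float:
    return X ** r * math.exp(-X) / math.factorial(r)


def max_gap(n: int, X: float = 2.0, r_max: int = 10) -> float:
    """Largest difference between Binomial(n, X/n) and Poisson(X) for r < r_max."""
    p = X / n  # many opportunities, each individually rare
    return max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, X)) for r in range(r_max))


for n in (10, 100, 10_000):
    print(n, f"{max_gap(n):.2e}")  # the gap shrinks as n grows
```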
• Examples of Poisson distribution:
• Note how all of these are discrete and usually rare events.
Statistics and Statistical Parameters
• A sample is a subset of a population.
• We collect a sample and calculate statistics from it to infer parameters of the population.
• Generally we want to know:
– central tendency, dispersion (or spread), symmetry and shape
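• Moments: these four properties are described by the first four moments of a distribution: the mean (central tendency), the variance (2nd central moment, dispersion), skewness (based on the 3rd central moment, symmetry), and kurtosis (based on the 4th central moment, shape).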
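A minimal sketch of these four summary statistics for a hypothetical sample. It uses the simple divide-by-n moment formulas; many texts and libraries divide the variance by n − 1 instead when estimating a population parameter from a sample, so their results can differ slightly.

```python
def moments(data):
    """Return (mean, variance, skewness, kurtosis) using divide-by-n formulas."""
    n = len(data)
    mean = sum(data) / n                          # central tendency
    m2 = sum((x - mean) ** 2 for x in data) / n   # 2nd central moment: dispersion
    m3 = sum((x - mean) ** 3 for x in data) / n   # 3rd central moment: symmetry
    m4 = sum((x - mean) ** 4 for x in data) / n   # 4th central moment: shape (tails)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2                       # equals 3 for a normal distribution
    return mean, m2, skewness, kurtosis


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
mean, variance, skewness, kurtosis = moments(sample)
print(mean, variance, skewness, kurtosis)
```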
Measures of Central Tendency
Other types of means:
– For all data sets of positive values containing at least one pair of unequal values, the harmonic mean is always the smallest of the three means, the arithmetic mean is always the largest, and the geometric mean is always in between.
– (If all values in a nonempty dataset are equal, the three means are always equal to one
another; e.g. the harmonic, geometric, and arithmetic means of {2, 2, 2} are all 2.)
– Since the harmonic mean of a list of numbers tends strongly toward the least elements
of the list, it tends (compared to the arithmetic mean) to mitigate the impact of large
outliers and aggravate the impact of small ones.
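Python's standard `statistics` module provides all three means (`mean`, `geometric_mean`, `harmonic_mean`; `geometric_mean` needs Python 3.8 or later). The hypothetical data set below includes one large outlier to show both the ordering HM < GM < AM for positive values and how much less the harmonic mean reacts to that outlier; `three_means` is an illustrative helper.

```python
import math
import statistics


def three_means(values):
    """Arithmetic, geometric and harmonic means of a list of positive numbers."""
    return (statistics.mean(values),
            statistics.geometric_mean(values),
            statistics.harmonic_mean(values))


data = [1, 2, 4, 8, 100]  # hypothetical data; 100 is a large outlier
am, gm, hm = three_means(data)
print(f"AM={am:.3f}  GM={gm:.3f}  HM={hm:.3f}")  # HM < GM < AM

# Dropping the outlier moves the arithmetic mean far more than the harmonic mean.
am2, gm2, hm2 = three_means(data[:-1])
print(f"AM shift={am - am2:.3f}  HM shift={hm - hm2:.3f}")

# With all values equal, the three means coincide (up to float rounding).
print(all(math.isclose(m, 2) for m in three_means([2, 2, 2])))
```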
Skewness
Skewness is a measure of the asymmetry of a distribution:
– Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed.
• Typically the mean is lower than the median, which in turn is lower than the mode (i.e., mean < median < mode).
• The skewness coefficient is less than zero.
– Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed.
• Typically the mean is greater than the median, which is greater than the mode (i.e., mean > median > mode).
• The skewness coefficient is greater than zero.
– In a skewed (unbalanced, lopsided) distribution, the mean lies farther out in the long tail than the median does. If there is no skewness, i.e. the distribution is symmetric and unimodal like the bell-shaped normal curve, then mean = median = mode.
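A sketch of the moment coefficient of skewness (third central moment divided by σ³) applied to three small hypothetical data sets: one with a long right tail, its mirror image, and a symmetric one. The data were chosen so that the usual mean/median/mode ordering holds; real data do not always follow that ordering.

```python
import statistics


def skewness(data):
    """Moment coefficient of skewness: third central moment divided by sigma**3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


right_tail = [1, 2, 2, 2, 3, 3, 4, 10]  # hypothetical: long right tail
left_tail = [-x for x in right_tail]    # mirror image: long left tail
symmetric = [1, 2, 3, 3, 4, 5]          # hypothetical symmetric data

for name, d in [("right", right_tail), ("left", left_tail), ("symmetric", symmetric)]:
    print(name, round(skewness(d), 3),
          statistics.mean(d), statistics.median(d), statistics.mode(d))
```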
Distribution Types
Probability
• To understand why data are distributed the way they are and why they fall into specific distribution types, a review of probability is necessary.
• Basics of Probability:
– Independent sampling: each sample chosen is unrelated to the previous ones (true for coin flipping and dice rolling; not true for drawing marbles out of a bag without replacement).
– If there are N equally likely ways for an event to occur and there are M desired events, then the
probability (p) of the desired events occurring is M/N.
– For a coin, there are 2 (N = 2) possible outcomes. If one of the outcomes is desired (e.g. heads)
then M = 1, and the probability of getting that outcome is M/N = 1⁄2.
– For a die, there are 6 possible outcomes. If the desired outcome is a 5 or 6 (M = 2), then the
probability of getting those numbers is M/N = 2/6 = 1/3.
– The probability of something happening (p) is always between 0 and 1: $0 \le p \le 1$.
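The counting rule p = M/N can be written with Python's `fractions.Fraction` so that the coin and die results come out as exact fractions; `probability` is a hypothetical helper name that also enforces 0 ≤ p ≤ 1 by rejecting M > N.

```python
from fractions import Fraction


def probability(desired: int, possible: int) -> Fraction:
    """p = M/N for M desired outcomes out of N equally likely outcomes."""
    if possible <= 0 or not 0 <= desired <= possible:
        raise ValueError("need N > 0 and 0 <= M <= N")
    return Fraction(desired, possible)


print(probability(1, 2))  # coin, heads: 1/2
print(probability(2, 6))  # die, a 5 or a 6: 1/3
```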
Basics of Probability:
– Again, for a coin: $p_{\text{heads}} + p_{\text{tails}} = 1$, or for a die: $p_1 + p_2 + p_3 + p_4 + p_5 + p_6 = 1$
– If the probability of an event occurring is p, then the probability of that event not
occurring is 1 – p = q
– Let’s look at an unbiased sample of a manufacturing process:
• In this process, for every 5 parts made, one is defective.
• The probability of an acceptable part (A) is 4/5, the probability of a defective part (D) is 1/5.
• If I select a part at random, the probability that I get an A is 4/5, and for D is 1/5
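A short sketch of the complement rule q = 1 − p for the manufacturing example, again using exact fractions, together with the check that the six outcomes of a fair die sum to 1.

```python
from fractions import Fraction

# Manufacturing example from the text: 1 defective part in every 5 made.
p_defective = Fraction(1, 5)
p_acceptable = 1 - p_defective    # q = 1 - p
print(p_acceptable, p_defective)  # 4/5 1/5

# The outcomes of a single draw are exhaustive, so their probabilities sum to 1,
# just as for a coin or a die.
die = [Fraction(1, 6)] * 6
print(sum(die))
```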
Basics of Probability:
– How do probabilities change with a second draw?
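One way to answer the question, assuming the draws are independent (as in the independent sampling defined earlier): the probability of a sequence of draws is the product of the single-draw probabilities. Drawing from a small, fixed batch without replacement would change the second-draw probabilities and is not modeled in this sketch.

```python
from fractions import Fraction
from itertools import product

# Single-draw probabilities from the manufacturing example.
p = {"A": Fraction(4, 5), "D": Fraction(1, 5)}

# With independent draws, the probability of a sequence is the product of the
# single-draw probabilities (the multiplication rule).
two_draws = {a + b: p[a] * p[b] for a, b in product("AD", repeat=2)}
for outcome, prob in two_draws.items():
    print(outcome, prob)  # AA 16/25, AD 4/25, DA 4/25, DD 1/25
```

Adding AD and DA gives 8/25 for "exactly one defective part in two draws", since either order counts.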
Probability – Binomial Equation
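A sketch of the standard binomial formula in this document's p and q notation, P(r) = C(n, r) p^r q^(n−r) with q = 1 − p, giving the probability of exactly r defective parts in a sample of n from the hypothetical 1-in-5 process. `binomial_probability` is an illustrative helper name. For n = 2 it reproduces the two-draw probabilities 16/25, 8/25 and 1/25 obtained by multiplying single-draw probabilities.

```python
from fractions import Fraction
from math import comb


def binomial_probability(r: int, n: int, p: Fraction) -> Fraction:
    """P(r) = C(n, r) * p**r * q**(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


p_defective = Fraction(1, 5)
for r in range(3):
    print(r, binomial_probability(r, 2, p_defective))  # 16/25, 8/25, 1/25
```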