
Course: Business Statistics


Professor: Rocco Mosconi
UNIT 1: Probability
This version: October 22, 2019

Table of Contents
1. Random variable
2. Probability
3. Probability distribution
4. Expected value of a random variable
5. Variance of a random variable
6. The binomial distribution
7. Probability density for continuous random variables
8. The Normal distribution
9. An illustration of the normal distribution
10. The Exponential distribution
11. The Lognormal distribution

1. Random variable
This unit illustrates some basic concepts of probability theory. Probability theory is the branch of mathematics that deals with mathematical models of uncertainty. In particular, we will focus on
the concepts of random variable and probability distribution.
In essence, a Random Variable is a variable that can take on different values according to the
outcome of a random event. In other words, it can be thought of as the result of an experiment that
is associated with uncertainty. Random variables may be divided into two groups:
• Discrete random variables: these variables can take on a countable number of distinct values
• Continuous random variables: these variables can take on an uncountably infinite number of possible outcomes
The most classic examples of discrete random variables are
• tossing a coin (two possible outcomes: head or tail)
• throwing a six-sided die (six possible outcomes: 1, 2, 3, 4, 5, 6)
In economics we observe many events characterized by uncertainty, whose outcomes can
therefore be considered as random variables. Let's look at some examples of discrete variables:
• a real estate agent proposes a property to a potential customer (two possible outcomes: success, i.e. the customer buys, or failure, i.e. the customer does not buy)
• a car dealer receives 5 customers every morning: how many of them will buy a car? (six possible outcomes: 0, 1, 2, 3, 4, 5)
Now let's consider an example of a continuous random variable:
• an investor buys an asset with the purpose of selling it after one year: what will be the return? (possible outcomes: infinitely many, represented by the set of real numbers; a possible outcome is for example −0.0235, that is a loss of 2.35%)

2. Probability
Considering a random event, it is interesting to evaluate the probability associated with
different outcomes. The concept of probability is not easily defined, and there are different schools
of thought (objectivist, subjectivist, axiomatic).
The objectivist perspective considers probability as an intrinsic property of the random
event, which is independent of the observer. For example, it is reasonable to assume that the
probability of a head when throwing a fair coin is equal to 50% (classical definition); if instead the
coin is unfair, the probability could be determined by throwing the coin many times and evaluating
the percentage of successes (frequentist definition).
The subjectivist perspective stems from the remark that often the experiment cannot be
repeated under the same conditions, and therefore what we call probability simply reflects the state of our information (subjectivist definition). For example, the probability that a certain team wins a certain match can be evaluated differently by subjects with different information regarding the form of the players, and would be completely different for someone who paid the referee to favor one of the two teams.

Whichever point of view you assume about probability, from a mathematical viewpoint
there is complete agreement regarding the axioms that probability must fulfill (axiomatic
definition):
• The probability of an elementary event (an outcome of the experiment) must be a non-negative real number
• The probability that at least one of the elementary events occurs (the certain event) must be equal to 1
• The probability of the union of two mutually exclusive events must be equal to the sum of the probabilities of the single events

3. Probability distribution
Let’s now define the concept of probability distribution (or probability function). We define
as a “probability distribution” a rule that associates a probability to every possible value of the
random variable.
This rule can be represented by a table or by a mathematical formula. For example, suppose
you have a binary random variable (only two outcomes): for example, the sex of the first person I
meet when I open the door. Without loss of generality, I could code the two elementary events as
“0”, equal to male, and “1”, equal to female. Denoting the random variable with the capital letter X, and the value that it can assume with the corresponding lowercase letter, we can denote by “p of x” the probability that the random variable “capital X” takes on the value “lowercase x”. For example, if we subjectively considered that the probability of meeting a female is equal to 70%, we could represent the probability distribution through the following table:

x      p(x)
0      0.3
1      0.7
In the first column we report the two values that the random variable can take, while in the
second column we report the corresponding probabilities, that is: 70% is the probability to meet a
woman and 30% is the probability to meet a man. Equivalently, instead of the table we could use a
mathematical formula:

p(x) = 0.7^x · (1 − 0.7)^(1−x),   x = 0, 1

that is, “p of x” is equal to “0.7 to the power x” times “(1 − 0.7) to the power (1 − x)”, where x can be equal to 0 or 1.
Applying this formula for x = 0 and x = 1, given that any number raised to the power of 0 is
equal to 1, we obtain that the probability of 0 is equal to “0.7 to the power 0” times “0.3 to the
power 1”, that is 0.3, while the probability of 1 is equal to “0.7 to the power 1” times “0.3 to the
power 0”, that is 0.7.

The formula and the table are therefore equivalent. The formula is definitely more
convenient when there are many possible results, even more so when they are infinitely many.
In the previous example we assumed that the probability that X is equal to 1 is equal to 70%.
To generalize, we could associate the probability with a symbol rather than a number, whose value
can be specified depending on the application. For example, we could define the probability that X
is equal to 1 by π, where π represents any real number between 0 and 1.
We can now represent our probabilistic model by this table

x      p(x)
0      1 − π
1      π
where the first column reports all possible values and the second column reports the
associated probabilities. Again, instead of the table, we can use a formula

p(x) = π^x · (1 − π)^(1−x),   x = 0, 1

where “p of x” is equal to “π to the power x” times “(1 − π) to the power (1 − x)”; x is 0 or 1,


and π is a real number between 0 and 1. This mathematical model is called Bernoulli Distribution;
π is a "parameter", and it can assume different values, corresponding to different situations:
in essence, the model represents a family of probability distributions, whose individual elements
correspond to the infinite possible values of the parameter π.
In general, any probability function contains some parameters, whose specific value varies
from time to time depending on the application of the model.
The value of the parameters is sometimes fixed on the basis of a subjective evaluation, but
sometimes it is based on an analysis of empirical data related to the phenomenon of interest.

4. Expected value of a random variable


Given a random variable and its probability distribution, we can introduce some interesting
concepts, for example the concept of "expected value".

The expected value is, in essence, a sort of average of all the values that the random variable could assume, where each value is weighted by its probability. Therefore, the higher the probability of a result, the more it counts in the weighted average. If we denote by “capital X”
the random variable, the expected value is usually indicated with “E of X”, where E stands for
Expected Value.
For discrete random variables, the expected value is calculated as follows:

E(X) = Σ x · p(x)

where the symbol Σ (sigma) indicates a sum over all the values that the random variable can assume. Each value is multiplied by its probability, and therefore the formula represents a weighted average. For example, if there were two possible values, each with 50% probability, the expected value would be, trivially, the average of the two possible values.
Referring to the Bernoulli distribution, the application of this formula gives the following
result:

E(X) = 0 · (1 − π) + 1 · π = π
therefore, if we consider a binary variable, the expected value is equal to π, i.e. the
probability associated with the value 1, which we sometimes call "success".
Note that the expected value is not necessarily one of the admissible values of the random variable. In fact, this variable can only take on the values 0 and 1; nevertheless, the expected value is π. If, for example, π were 0.7, the expected value would be 0.7, which of course does not mean that I expect the person I meet to be 70% woman and 30% man: I just expect to meet a woman with higher probability, and a man with lower probability.
The term "expected value" is sometimes replaced by the term "average", which actually
would not be appropriate. However, the term “average” suggests the idea, in fact correct, that if I
replicate this experiment many times, and calculate the average of the results, I would expect to
find 0.7, i.e. 70 percent of women and 30 percent of men. Therefore, the expected value and
average are fairly related concepts.
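As a quick numerical check, this weighted average can be computed in Excel with SUMPRODUCT; a minimal sketch for the Bernoulli case with π = 0.7:

    =SUMPRODUCT({0;1}, {0.3;0.7})       returns 0.7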

5. Variance of a random variable


Let's now define another indicator of the probability distribution: the variance. Referring to a random variable, the variance is the expected value of the squared distance from the expected value, i.e.:

Var(X) = E[(X − E(X))²]
So, for any value that X can take, we calculate the distance from the expected value E(X), and we square it in order to remove the sign. In this way we obtain a new random variable, (X − E(X))². The probabilities associated with this random variable are the same as those of the random variable X.
We then calculate its expected value by summing all the possible values, weighted by the respective probabilities. The variance is therefore an expected squared distance from the expected value, i.e. a weighted average of the squared distances. The greater the distances (regardless of their sign), the greater the variance.
Let's try to apply the definition of variance to the Bernoulli distribution.

Recall that for the Bernoulli distribution, the expected value “E of X” is equal to π, the
probability of 0, “p of 0”, is equal to (1‐π), while the probability of 1 is equal to π, therefore we
obtain this result:

Var(X) = (0 − π)² · (1 − π) + (1 − π)² · π
Starting from this formula, with a few steps it is easily obtained that the variance of X is equal to π times (1 − π):

Var(X) = π · (1 − π)
This result is quite easy to interpret.


If the probability of success π were zero, applying the formula we would obtain a variance equal to zero: in fact, if π were zero, performing the experiment we would always get 0, i.e. there would be no variability: it is therefore natural that the variance is zero.
Likewise, if the probability of success π were equal to one, applying the formula we would again obtain a variance equal to zero: in fact, performing the experiment we would always get 1, so even in this case there would be no variability in the results.
Following this reasoning, if π were very close to zero (or very close to 1), the variability of the results would be low, and therefore the variance would be close to zero.
The maximum variability of the results occurs when π is equal to 1/2.

The graph shows how the variance of a binary random variable varies as a function of π.
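The same check in Excel for π = 0.7, computing the weighted average of squared distances directly (SUMPRODUCT evaluates the array arithmetic):

    =SUMPRODUCT(({0;1}-0.7)^2, {0.3;0.7})       returns 0.21, i.e. 0.7 × 0.3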

6. The binomial distribution


We now illustrate the Binomial probability distribution. The Binomial distribution is appropriate to represent the probabilities associated with a specific type of random variable: the number of successes when a Bernoulli experiment is replicated n times.
As an economic example of reference, we could think of a car dealer who, in one day,
proposes the purchase of a car to 10 potential buyers (n is equal to 10): how many cars will he be
able to sell?
We denote by Xᵢ the random variable “outcome of the i-th proposal”:

Xᵢ = 1 if the i-th sale is successful, 0 otherwise

The outcome of each proposal is random and binary, so we can represent Xᵢ as a Bernoulli random variable. Assume that the probability of success is the same for all potential customers, and denote it by π. We also assume that the n
random variables are independent, i.e. that the outcome of the i‐th proposal does not change the
probability of success of the next (so, a customer is not influenced by seeing that the previous
customer has bought or not).
The sum of the outcomes can be written as follows:

Y = X₁ + X₂ + … + Xₙ
Since the terms of the summation are random variables, the result is also a random variable.
How is Y distributed? It can be shown that the probability distribution of Y, which is called the
Binomial Distribution, is as follows:

p(y) = (n over y) · π^y · (1 − π)^(n−y),   y = 0, 1, …, n,   where (n over y) = n! / (y! · (n − y)!)
The formula might seem complex, but it is available in a user friendly way in many programs, including Excel: its practical use is therefore very easy.
As we can see, the formula of the Binomial distribution is quite similar to that of the Bernoulli
distribution. The binomial distribution has two parameters: π (real number between 0 and 1) and n
(positive integer). Clearly, the variable Y can take n+1 values (in our example 11): 0, 1, 2, up to n.

The symbol "n over y" is called binomial coefficient. It is a function of n and y, and is needed here
to ensure that sum of the probabilities is equal to 1. “n exclamation mark” reads " factorial n", and
represents the product of all the integers from 1 to n.
The hypotheses underlying this distribution are:
• the n individual Bernoulli experiments have the same success probability
• they are independent
Knowing π, the formula of “p of y” allows us to calculate, for example, the probability that the dealer sells 3 cars:

p(3) = (10 over 3) · π³ · (1 − π)⁷
First we need to calculate the value of the binomial coefficient. Once this is done, the
calculation of the probability is simple. If, for example, the probability of selling a car to a generic
customer is 10%, the probability of selling 3 cars when we meet 10 customers is 5.74%.
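In Excel, the same number can be obtained directly with the function BINOM.DIST; the last argument set to 0 (FALSE) asks for the probability of exactly y successes:

    =BINOM.DIST(3, 10, 0.1, FALSE)       returns 0.0574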
To compute the expected value and the variance of the Binomial distribution, we might use
the formulae discussed above:

E(Y) = Σ y · p(y)        Var(Y) = Σ (y − E(Y))² · p(y)
One might use Excel to implement these formulae. However, for the Binomial distribution, it is not difficult to prove that the formulae can be simplified:

E(Y) = n · π        Var(Y) = n · π · (1 − π)
The expected value is therefore equal to n times the probability of success, while the
variance is equal to n times π times (1‐π).
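A quick worked example for the car dealer (n = 10, π = 0.1): E(Y) = 10 × 0.1 = 1 expected sale per day, and Var(Y) = 10 × 0.1 × 0.9 = 0.9.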

7. Probability density for continuous random variables


Let’s now illustrate some general concepts on continuous random variables. A continuous
random variable has this characteristic: given any two values that are admissible for the variable, in
between one can find infinitely many others.
An example? The weight of a randomly picked person. Two possible weights could be 70 kg
and 71 kg. Between these two weights we can find infinitely many weights, for example 70.235 kg.
And of course, with a suitable measuring instrument, we could measure weight with even greater
precision.
Coming to something economic, an example of a continuous random variable could be the
turnover of a randomly picked firm. Strictly speaking, the turnover of a company would be a discrete
variable. In fact, the highest possible precision in this case is the euro cent. Thus, between two
possible values of turnover, for example € 100,000 and € 110,000, strictly speaking there would be
a finite number of intermediate values. However, when the elementary unit is very small as
compared to the variability, the variable can be considered, to all practical purposes, as a continuous
variable. This is why we can consider almost all the economic variables, measured in monetary
terms, as continuous variables.
In general, for a continuous variable, it doesn’t make sense to consider the probability of a
single value. For example, although people whose weight is about 70 kg are not unusual, the
probability of meeting a person whose weight is exactly 70 kg (i.e. 70 point 0 0 0 0 0 0 ..., with
infinitely many zeroes after the dot) is zero.
For the same reason, the probability of meeting a person whose weight is exactly 400 kg is also zero. The two events therefore seem to have the same probability. However, this seems in contrast with the common sense that it is more likely to meet a person whose weight is 70 kg than a person whose weight is 400 kg.
To clarify the difference between these two situations, we introduce the concept of “probability density”. Indeed, a weight of exactly 70 kg has zero probability, as does a weight of exactly 400 kg. However, the probability of meeting a person whose weight is around 70 kg (for example between 60 and 80 kg) is much higher than the probability of meeting a person whose weight is around 400 kg (say, between 390 and 410). Technically, we say that the probability density is higher at the value 70 than at the value 400.
The probability density is mathematically represented by a continuous function, usually
denoted by “f of x”, as shown in this figure

[Figure: probability density function f(x) of body weight]
This function describes the probabilities associated with the weight of people (more
precisely, it associates a probability density with each possible weight). The graph shows that the
probability density corresponding to 70 kg is higher than the probability density corresponding to
400 kg, and that the latter is very close to zero. Technically, we say that on a single point there is no
“probability mass”, but there is “probability density”. Starting from the probability density function,
we can compute probabilities by calculating the subtended area between two points.

[Figure: density f(x) with the area between 60 and 80 kg shaded]
The shaded area represents the probability that the weight is in the range between 60 and
80 kg. Technically, the area under a continuous curve is called an “integral”. Thus, we say that the probability that the random variable X takes on values between x1 and x2 is equal to the integral between x1 and x2 of the probability density function. One does not need a perfect working knowledge of the concept of integral to calculate the probabilities, given that many programs (including Excel) can compute these integrals in a very user friendly way. The important thing is to
have a clear sense of this graphic illustration, which, without analytical details, highlights the
relationship between density and probability.
The axioms of probability that we mentioned before (i.e. that the probability is a real number between 0 and 1, and that the probability of the certain event is equal to 1) imply that any density function must necessarily have some properties:
• it must always be positive (more precisely, non-negative: it could be zero at some points)
• its integral from negative infinity to infinity must be equal to 1
Depending on the type of random variable that is to be represented, the shape of the density
function may vary. An important feature is symmetry versus asymmetry:

[Figure: three densities: symmetric, right-skewed, left-skewed]
The first graph shows a symmetric distribution: there is an axis of symmetry with respect to
which the probability density behaves specularly to the right and to the left.

The second graph shows a distribution that has a right (or positive) skewness: in this case
the right tail of the distribution is very dispersed, while the left tail is very concentrated.
The third graph shows a distribution with left (or negative) skewness: in this case the left tail
of the distribution is very dispersed, while the right tail is very concentrated.
Many economic variables are right skewed (income, consumption, production levels of
firms, ...).
Another important feature is unimodality versus multimodality:

[Figure: unimodal, bimodal, and multimodal densities]
The first graph shows a unimodal distribution: this means that the density has a single peak.
The second graph shows a bimodal distribution, where the density has two peaks (not
necessarily with the same height).
The third graph shows a multimodal distribution, where the density has several peaks (three in this case).
Multimodality is usually associated with the presence of individuals with different
characteristics: for example, it is reasonable to think that the weight of a person taken at random
from a population composed of both men and women is bimodal.
There are many probabilistic models for continuous variables: in the following we will examine a model suitable for variables whose distribution is symmetric and unimodal: the normal distribution.

8. The Normal distribution


We will now illustrate what is perhaps the most widely used model in practical applications,
the so‐called "Normal" (or "Gaussian") distribution. The name itself suggests that this model is well
suited to represent the uncertainty in many real‐life situations. The model is the following
f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²))
The density is bell‐shaped, symmetrical and unimodal. The mathematical formula of the
normal density is reported for completeness, but it is not necessary to memorize it. Conversely, it is
important to have a clear understanding of the plot. The Normal distribution has two parameters.
The central value is usually indicated with the Greek letter µ (mu), and it is the value with the highest
probability density. µ can be any real number (positive, negative or zero). We then denote by the
Greek letter σ (sigma) the distance between mu and the inflection point of the curve (the point at
which the curvature of the bell changes, from concave to convex). Sigma must be a positive real
number. Of course, the parameters mu and sigma are replaced by appropriate numerical values in
the applications.

[Figure: Normal density for the weight of adult men, with µ = 75 kg and σ = 10 kg; the area between 55 and 95 kg is highlighted]
Considering for example the weight of adult men measured in kg, we might assume a value
of mu around 75, and a value of sigma around 10. Considering newborns instead, obviously the
appropriate values for the parameters mu and sigma describing the uncertainty associated with
weight would be very different. For example, mu could be 3.2 kg and sigma could be 0.3 kg.
The area highlighted in yellow shows that if the distribution of the weight of adult men was
normal, and if mu and sigma had the assumed values, the probability of meeting a person whose
weight is between 55 and 95 kg would be quite high, even if less than 100%.
Many packages, including Excel, have user friendly functions allowing us to compute this probability, without the need to solve an integral.
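With the assumed parameters µ = 75 and σ = 10, a minimal sketch of this computation in Excel (NORM.DIST with the last argument set to 1/TRUE returns the area to the left of its first argument):

    =NORM.DIST(95, 75, 10, TRUE) - NORM.DIST(55, 75, 10, TRUE)       returns about 0.954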
We will now discuss the expected value and the variance of a normal random variable. As
we have seen, for discrete random variables the expected value is the weighted average of the
possible values, with weights given by the probabilities. The variance is instead a weighted average
of the quadratic distances from the expected value, with weights given by the probabilities.
Clearly these definitions cannot be applied for a continuous random variable, both because
the possible values are an uncountable infinity, and because there is no probability mass on the
single values.
The correct definition is however conceptually identical: just substitute “sum” with
“integral”, and “probability” with “probability density”:

E(X) = ∫ x · f(x) dx        Var(X) = ∫ (x − E(X))² · f(x) dx        (integrals from −∞ to +∞)
Each value is therefore weighted by its density, and then these products are, in double quotes, “summed in the continuum”, that is, “integrated”.
Replacing the Normal density in the previous equations, and applying simple rules of integral
calculus, one realizes that there are many simplifications. The final result of the integrals is
surprisingly simple: the expected value is equal to mu and the variance is equal to sigma square (so
the standard deviation is equal to sigma):

E(X) = µ        Var(X) = σ²
It is worth noting that for any symmetric distribution (not just for normal), the expected
value is equal to the central value. Furthermore, in symmetric distributions the central value is also
the median of the distribution, since it leaves a 50% probability to the left and to the right.
For the normal distribution, there are some well known results, for example,
if we want to construct a symmetric interval around µ containing 95 percent of the probability, this is the interval between µ − 1.96σ and µ + 1.96σ:

P(µ − 1.96σ ≤ X ≤ µ + 1.96σ) = 95%

Increasing the width of the interval, the probability also increases: the probability that a normal random variable takes on values between µ − 2.57σ and µ + 2.57σ is 99%.
Obviously, reducing the width of the interval, the probability is also reduced: the probability that a normal random variable takes on values between µ − 1.64σ and µ + 1.64σ is equal to 90%.
The three values 1.64, 1.96 and 2.57 are very useful in practice, and should be memorized.
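These values can be verified in Excel with NORM.S.DIST, the cumulative distribution function of the standard normal; a minimal check for the value 1.96:

    =NORM.S.DIST(1.96, TRUE) - NORM.S.DIST(-1.96, TRUE)       returns 0.95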
The “standard normal” is of particular interest: this is a normal in which µ = 0 and σ = 1. By replacing these values in the previous results, we obtain some useful probabilities associated with the standard normal.
Any normal random variable can be “standardized”, i.e. transformed into a standard normal:

Z = (X − µ) / σ

To standardize, we subtract the expected value µ (this shifts the distribution, so that it is centered on zero), and the result is divided by the standard deviation σ (this changes the scale, making the distribution assume unit standard deviation).
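For example, with the adult-weight parameters assumed earlier (µ = 75 kg, σ = 10 kg), a weight of 95 kg standardizes to z = (95 − 75)/10 = 2, i.e. two standard deviations above the expected value.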

9. An illustration of the normal distribution


Let’s now illustrate how the normal distribution can be used in practice to compute
interesting probabilities. As an illustrative example, we consider the return on a financial
investment: purchase an ALFA share today to sell it tomorrow. The return is random, since it could
be positive or negative, high or low. Thus, from a probabilistic viewpoint, it makes sense to consider
the return as a random variable. Let's denote the return on our investment by X.
This graph shows the returns of the ALFA share, recorded every day over the last 5 years.

[Figure: histogram of daily ALFA returns over the last 5 years]
The shape of the histogram looks very similar to the normal distribution. We therefore assume that the normal distribution, which seems to fit past returns fairly well, may be appropriate to describe the uncertainty associated with future returns.
We assume that the expected value is in line with what was observed in the past. So we fix
mu at 0.1% (i.e. 0.001). The key parameter to describe the uncertainty is the standard deviation,
sigma. We assume also in this case that the uncertainty in the future is in line with the variability
observed in the past, and then we set sigma at 2% (i.e. 0.02).
The graphic representation of our model for the distribution of returns between today and
tomorrow, inspired by the statistical analysis of historical data, would therefore be as follows:

[Figure: Normal density with µ = 0.1% and σ = 2%; the area below zero is shaded in red]
The graph shows that, although the expected value of the return is positive, being equal to
0.1%, the probability of loss, highlighted in the plot with the shaded area in red, is not negligible.
Taking into account the fact that the area underlying the entire curve is equal to 100%, we
could evaluate the probability of experiencing a loss (i.e. the area in red) as slightly less than 50%.
Having a probability model (the Normal distribution in this case) allows us to compute the
probabilities precisely using one of the many packages implementing it, for example Excel.
To compute the probability that the variable X turns out to be negative, i.e. the probability
of a negative return, we use the function NORM.DIST. This function has 4 parameters: the first is
the value whose probability we want to compute, the second is the value of the parameter mu, the
third is the value of the parameter sigma, and finally, by setting “cumulative” to 1, we are asking to
compute the integral to the left of the value of interest (0 in this case). The result of this computation
is 0.48: this means that the probability of a loss is 48%. If instead we want to compute the probability of a positive value, i.e. the probability that the return is larger than zero, we can exploit the property that the integral from negative infinity to infinity is equal to 1: this allows us to compute the probability as 1 minus the integral to the left of zero. This way, we obtain that the probability of a positive return is 52%.
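A minimal sketch of these two computations in Excel, with µ = 0.001 and σ = 0.02 as above:

    =NORM.DIST(0, 0.001, 0.02, TRUE)         returns 0.48  (probability of a loss)
    =1 - NORM.DIST(0, 0.001, 0.02, TRUE)     returns 0.52  (probability of a gain)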
If we want to compute the probability that the return will be in the range from negative 2%
to 2%, we can obtain the result using the function NORM.DIST twice: the first time we compute the
integral to the left of 2%, and then we subtract the integral to the left of negative 2%. This difference
corresponds to the probability in the range from negative 2% to 2%. Clearly, if instead we want to
compute the probability that the return will be larger than 2% in absolute value, we just have to
compute 1 minus the probability that the return will be less than 2% in absolute value.
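In Excel the two steps reduce to a single formula; with the parameters above it gives a probability of about 68% for a return between −2% and +2%:

    =NORM.DIST(0.02, 0.001, 0.02, TRUE) - NORM.DIST(-0.02, 0.001, 0.02, TRUE)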
Another useful function of Excel allows us to compute the percentiles of the normal
distribution. Let's see what percentile means through a graphical illustration.

[Figure: Normal density with the 15th and 97.5th percentiles marked]

This figure illustrates, for example, the 15th percentile and the 97.5th percentile. The 15th percentile is the value leaving a probability equal to 15% to the left (therefore the probability to the right is equal to 85%), while the 97.5th percentile leaves a probability equal to 97.5% to the left (and therefore the probability to the right is equal to 2.5%). Clearly the numerical value of the percentiles depends on µ and σ: increasing µ (keeping σ fixed), all percentiles shift to the right, while increasing σ (keeping µ fixed), the percentiles below the 50th shift to the left and the percentiles above the 50th shift to the right.
The twenty‐fifth percentile is also called “first quartile” (because it leaves a probability equal
to ¼ to the left), the fiftieth percentile is also called “second quartile” (or also “median”), while the
seventy‐fifth percentile is also called “third quartile”.
Knowing mu and sigma, percentiles can be easily calculated in Excel using the function that
in the English version is called NORM.INV. If we want to compute the first percentile we use
NORM.INV this way: as the first parameter we write the probability we want to the left, as a second
parameter we write the value of mu, and as a third parameter we write the value of sigma. By doing
so, we obtain that the value ‐0.046 leaves a probability of 1% to the left. In a similar way we can
compute any percentile, for example the 5th percentile: the probability to the left is 5%, mu and
sigma are as before, and we obtain that the 5th percentile corresponds to ‐0.032 (i.e. a loss of 3.2%).
With the same logic we can compute any percentile we are interested in. Notice that the rightmost percentiles can alternatively be computed using a result illustrated previously: for example, the probability to the left of µ + 1.64σ is equal to 95%. Therefore, we can use either the function NORM.INV or this well-known result.
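A minimal sketch of these percentile computations in Excel, again with µ = 0.001 and σ = 0.02:

    =NORM.INV(0.01, 0.001, 0.02)       returns -0.046  (1st percentile)
    =NORM.INV(0.05, 0.001, 0.02)       returns -0.032  (5th percentile)
    =NORM.INV(0.95, 0.001, 0.02)       returns 0.034   (95th percentile, i.e. µ + 1.64σ)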
Summarizing: we have shown that having an appropriate model and software, it is very easy
to compute precisely the probabilities of interesting events (the probability of loss, the probability
of gaining or losing much, etc.). Probabilistic models, even if at first sight might appear complex, are
in fact useful to make probability calculations in a very simple way. It is however important that an
appropriate model is chosen, otherwise the calculated probabilities will be unreliable.
In the example shown above, we have chosen the normal distribution because the historical
data seem to conform well enough to that distribution. If that is not the case, we have to choose
another model. Once the model has been chosen, to calculate the probabilities we have to define the value of the parameters: here too, historical data can be useful. As we have seen in the case of financial returns, we set the parameters to values which proved correct in the past. There is therefore a link between data analysis and the probabilistic modeling of events.

Sometimes we do not have data: in this case we can use other studies related to the same problem to support the choice of the model. It is however always important that the distribution we decide to use is coherent with the characteristics of symmetry/asymmetry and unimodality/multimodality of the economic phenomenon under scrutiny.

10. The Exponential distribution


The Normal distribution is an appropriate model if our experience suggests a continuous,
symmetric, unimodal, bell‐shaped distribution. There are however other models, which can be
appropriate to represent phenomena with asymmetric distribution. Let’s discuss another model
that can be useful in different application areas.
Suppose we want to study the delivery times of a certain product. Delivery time is a fairly
important performance indicator, which is often associated with customer satisfaction. What
distribution could we use to answer questions such as: "What is the probability of having a delivery
time exceeding seven days"?
The following graph shows data on delivery times observed in the past.

[Figure: histogram of observed delivery times, in days]
In this, as in most practical applications, delivery delays have a strongly asymmetric distribution: obviously, negative delays are impossible; short delays are frequent, and the frequency decreases regularly as the delay increases. Note that the mean is above the median, a typical feature of right skewed distributions, and that the standard deviation is very similar to the mean.
To model data with these characteristics we might consider the exponential distribution.
f(x) = λ · e^(−λx),   x ≥ 0
It is not important to memorize the formula of this distribution: like the normal distribution,
the exponential is also implemented in Excel and many other packages. It is important to have a
clear understanding of the shape, which decreases in a monotonic and regular way. The exponential
distribution has one single parameter, usually denoted by lambda. The expected value is equal to
one over lambda, and is equal to the standard deviation. The median (that is, the value that leaves
a 50% probability to the left) is equal to the logarithm of 2 (i.e. 0.69) divided by lambda.
Based on the observed historical data, we could fix λ = 0.5: this corresponds to an expected value, a median and a standard deviation which are quite similar to the observed values.
To make probabilistic calculations using the exponential distribution, we can use Excel. For example, to compute the probability that the delivery time is at most one day, we can use the function called, in the English version, EXPON.DIST. The function requires as first parameter the delivery time; the second parameter is the value of lambda, and the third is “cumulative”: 1 if we want the probability to the left (or 0 for the density). We obtain that the probability that the delivery time is 1 day or less is 39.3%. Instead, the probability that the delivery time does not exceed 7 days is 97%, so the probability that delivery requires more than 7 days is 3%. In principle we can also compute the percentiles. Unfortunately, the function EXPON.INV is not available in Excel, but the formula is not difficult,¹ and one can write it directly: using it, we obtain that 95% of the delivery
times are below 5.99 days. It is interesting to observe that the distance between the 5th percentile
and the median is much smaller than the distance between the 95th percentile and the median: this
is typical in the right skewed distributions.
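These computations in Excel, with λ = 0.5 (the third argument of EXPON.DIST set to 1/TRUE asks for the cumulative probability; the last line implements the percentile formula of the footnote):

    =EXPON.DIST(1, 0.5, TRUE)       returns 0.393
    =EXPON.DIST(7, 0.5, TRUE)       returns 0.970
    =-LN(1 - 0.95) / 0.5            returns 5.99  (95th percentile)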

11. The Lognormal distribution


Some economic variables present a right skewed distribution, but the shape is somewhat
different from the one seen previously. For example, consider the gross annual income of 1500
individuals.

[Figure: histogram of the gross annual income of 1,500 individuals]
¹ For example, to find the 5th percentile, set α = 0.05 in the formula −ln(1 − α)/λ, where λ = 0.5 in our case. For the 95th percentile, α = 0.95, and so on.

Right skewness is apparent, but the exponential distribution does not seem to fit well, since the frequency does not decay monotonically: it initially increases sharply and then gradually declines. If, for example, we want to evaluate the probability that a randomly picked individual earns over 100,000 euros, the normal distribution would not be appropriate either. Notice also that a normal random variable with an expected value of about 37,000 and a standard deviation of about 25,000 would have a non-negligible probability of assuming negative values.
It is interesting to observe how the histogram is modified by transforming the variable into
logarithms.

[Figure: histogram of the logarithms of the income data]
This graph shows the histogram of the logarithms of the income data seen previously. The shape of the distribution becomes symmetric, unimodal and bell-shaped, very similar to the normal.
When a variable has a right-skewed distribution similar to the distribution of income, and transforming it into logarithms yields a distribution which resembles the normal, we can consider the so-called Lognormal distribution as a reasonable model.

A variable has Lognormal distribution if its logarithm has Normal distribution. The Lognormal
distribution presents right skewness, and the shape is very similar to the one we saw for income.
The distribution has two parameters, µ and σ, which basically represent the expected value and the standard deviation of the logarithm of the variable. The formulae for the expected value, the median and the standard deviation of the Lognormal are somewhat complex, but it is interesting to observe that, by inserting in these formulae the values of µ and σ based on the analysis of the logarithms, we obtain values which are very similar to those observed for incomes. If we want to compute probabilities using
the lognormal distribution (for example the probability that income is less than 20,000), we can use
the function LOGNORM.DIST. We consider the point 20,000, we introduce the parameters mu and
sigma, and as usual we set “cumulative” to 1, to compute the integral to the left of 20,000. This way,
we obtain that the probability that income is less than 20,000 is equal to 25.1%. In a similar way, we
can compute the probability that income exceeds 100,000 euros: exploiting the property that the
area below the entire curve is 1, this probability “to the right” may be computed as 1 minus the
probability to the left of 100,000. We obtain that the probability that income of a randomly picked
person exceeds 100,000 is equal to 2.7%, quite low.
It is also possible to compute the percentiles. For example, the 2.5th percentile can be computed using the function LOGNORM.INV, specifying the probability to the left, that is 2.5%, and then the values of µ and σ. This way, we obtain that 2.5% of incomes are below 8,998 Euros. In a similar way we can use LOGNORM.INV to obtain the median, just indicating a probability of 50%: we conclude that income is below 30,333 Euros with probability 50%. The 97.5th percentile is computed the same way, and it is equal to 102,250 Euros: in other words, only 2.5% of incomes exceed 102,250 Euros. Also in this case, the distance between the median and the 2.5th percentile is smaller than the distance between the median and the 97.5th percentile, which is related to the fact that the distribution is right skewed.
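A minimal sketch of these calls in Excel. The unit does not report the values of µ and σ; the values below (µ ≈ 10.32, σ ≈ 0.62) are back-solved from the quoted results, e.g. µ = ln(30,333) ≈ 10.32, and are therefore assumptions:

    =LOGNORM.DIST(20000, 10.32, 0.62, TRUE)         returns about 0.251
    =1 - LOGNORM.DIST(100000, 10.32, 0.62, TRUE)    returns about 0.027
    =LOGNORM.INV(0.025, 10.32, 0.62)                returns about 9,000   (2.5th percentile)
    =LOGNORM.INV(0.5, 10.32, 0.62)                  returns about 30,330  (median)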
