
RIOJA, ANNA MILCA V.

BSHM 3-2

PRACTICAL DATA SCIENCE


ACTIVITY #1

A. Outcomes probability and events 


Basic concepts of probability
Probability deals with random (or unpredictable) phenomena. When one of several
things can happen, we often must resort to attempting to assign some measurement of
the likelihood of each of the possible eventualities. Probability theory provides us with
the language for doing this, as well as the methodology.
Events
The algebra of events will be a Boolean algebra, just like the algebra of propositions. In particular, we will have the notions:
• A ∧ B is the event that both A and B will occur
• A ∨ B is the event that at least one of A and B will occur
• Aᶜ is the event that A will not occur
By defining A ⊗ B = A ∧ B and A ⊕ B = (A ∨ B) ∧ (A ∧ B)ᶜ, we require that the algebraic system obtained is a Boolean algebra. For completeness, we will also require the notion of an event Ω, which is the union of all possible events, and φ, the intersection of all possible events. (Since (∨ᵢAᵢ)ᶜ = ∧ᵢAᵢᶜ, we can see that Ωᶜ = φ.)
Events and Outcomes
• The result of an experiment is called an outcome.
• An event is any particular outcome or group of outcomes.
• A simple event is an event that cannot be broken down further.
• The sample space is the set of all possible simple events.
Example 1

If we roll a standard 6-sided die, describe the sample space and some simple events.
The sample space is the set of all possible simple events: {1,2,3,4,5,6}
Some examples of simple events:
• We roll a 1
• We roll a 5
Some compound events:
• We roll a number bigger than 4
• We roll an even number
Basic Probability

Given that all outcomes are equally likely, we can compute the probability of an event
E using this formula:

P(E) = (number of outcomes in E) / (total number of outcomes in the sample space)
Certain and Impossible events


• An impossible event has a probability of 0.
• A certain event has a probability of 1.
• The probability of any event must be between 0 and 1.
If you compute a probability and get an answer that is negative or greater than 1, you
have made a mistake and should check your work.
Complement of an Event
The complement of an event E is the event “E doesn’t happen.”
The notation Ē is used for the complement of event E.
We can compute the probability of the complement using
P(Ē) = 1 − P(E)
Independent Events
Events A and B are independent events if the probability of Event B occurring is the
same whether or not Event A occurs.
When two events are independent, the probability of both occurring is the product of
the probabilities of the individual events.
P(A and B) for Independent Events
If events A and B are independent, then the probability of both A and B occurring is
P(A and B) = P(A) × P(B)
where P(A and B) is the probability of events A and B both occurring, P(A) is the
probability of event A occurring, and P(B) is the probability of event B occurring.
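The product rule can be checked directly by enumeration. A minimal sketch, using two independent die rolls as the example events (this particular scenario is an illustration, not taken from the text):

```python
from fractions import Fraction

# A: first die shows 6; B: second die shows 6 (independent rolls)
p_a = Fraction(1, 6)
p_b = Fraction(1, 6)

# For independent events, P(A and B) = P(A) x P(B)
p_both = p_a * p_b
print(p_both)  # 1/36

# Cross-check by enumerating all 36 equally likely outcome pairs
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
favourable = [o for o in outcomes if o[0] == 6 and o[1] == 6]
assert Fraction(len(favourable), len(outcomes)) == p_both
```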
B. The laws of probability
To each event A we assign a probability P(A) which is a real number between 0 and 1
called the probability of (occurrence of) the event A.
This satisfies the following laws:
• The probability of the universal event is 1, i.e. P(Ω) = 1.
• If A ⊂ B then P(A) ≤ P(B).
• The “excluded middle” law: P(A) + P(Aᶜ) = 1.
• The multiplication law: if A and B are independent events, then P(A ∩ B) = P(A) × P(B). In
words, ‘The probability of independent events A and B occurring is the product of the
probabilities of the events occurring separately.’
• The addition law: P(A ∨ B) = P(A) + P(B) − P(A ∧ B).
We have Ω = φᶜ. It follows that P(φ) = 0. Moreover, since any A is a subset of Ω, we
obtain an important rule (which we should never forget!): 0 ≤ P(A) ≤ 1. Any calculation
that purports to give a probability where the answer does not satisfy this is obviously
wrong! We note that (A ∧ B) ∧ (A ∧ Bᶜ) = φ and (A ∧ B) ∨ (A ∧ Bᶜ) = A, and so it follows
that P(A) = P(A ∧ B) + P(A ∧ Bᶜ).
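These laws can all be verified mechanically on a small sample space. A quick sketch, treating events as sets of die outcomes (the specific events A and B are illustrative choices):

```python
from fractions import Fraction

# One roll of a fair die; events are subsets of the sample space
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll an even number
B = {4, 5, 6}   # roll a number bigger than 3

def p(event):
    # Equally likely outcomes: P(E) = |E| / |sample space|
    return Fraction(len(event), len(sample_space))

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
assert p(A | B) == p(A) + p(B) - p(A & B)

# Excluded middle: P(A) + P(A complement) = 1
assert p(A) + p(sample_space - A) == 1

# Decomposition: P(A) = P(A and B) + P(A and B complement)
assert p(A) == p(A & B) + p(A & (sample_space - B))
print("all laws hold")
```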

C. Conditional probability
Each of the probabilities computed in the previous section (e.g., P(boy), P(7 years of
age)) is an unconditional probability, because the denominator for each is the total
population size (N=5,290) reflecting the fact that everyone in the entire population is
eligible to be selected. However, sometimes it is of interest to focus on a particular
subset of the population (e.g., a sub-population). For example, suppose we are
interested just in the girls and ask the question, what is the probability of selecting a 9
year old from the sub-population of girls? There is a total of NG=2,730 girls (here NG
refers to the population of girls), and the probability of selecting a 9 year old from the
sub-population of girls is written as follows:

P(9 year old | girls) = (# of girls who are 9 years of age) / NG

where | girls indicates that we are conditioning the question to a specific subgroup, i.e.,
the subgroup specified to the right of the vertical line.

The conditional probability is computed using the same approach we used to compute
unconditional probabilities. In this case:

P(9 year old | girls) = 461/2,730 = 0.169.

This also means that 16.9% of the girls are 9 years of age. Note that this is not the same
as the probability of selecting a 9-year old girl from the overall population, which is P(girl
who is 9 years of age) = 461/5,290 = 0.087.
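The contrast between the conditional and unconditional probabilities can be reproduced directly from the counts given in the text:

```python
# Counts from the text: N = 5,290 children in total, of whom
# NG = 2,730 are girls and 461 are 9-year-old girls.
n_total = 5290
n_girls = 2730
n_girls_age_9 = 461

# Conditional probability: the denominator is the subgroup size
p_9_given_girl = n_girls_age_9 / n_girls   # restrict to girls

# Unconditional probability: the denominator is the whole population
p_girl_age_9 = n_girls_age_9 / n_total

print(round(p_9_given_girl, 3))  # 0.169
print(round(p_girl_age_9, 3))    # 0.087
```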

D. Random variables
A random variable is a numerical description of the outcome of a statistical experiment.
A random variable that may assume only a finite number or an infinite sequence of
values is said to be discrete; one that may assume any value in some interval on the real
number line is said to be continuous. For instance, a random variable representing the
number of automobiles sold at a particular dealership on one day would be discrete,
while a random variable representing the weight of a person in kilograms (or pounds)
would be continuous.
A continuous random variable may assume any value in an interval on the real number
line or in a collection of intervals. Since there is an infinite number of values in any
interval, it is not meaningful to talk about the probability that the random variable will
take on a specific value; instead, the probability that a continuous random variable will
lie within a given interval is considered.
The expected value, or mean, of a random variable—denoted by E(x) or μ—is a weighted
average of the values the random variable may assume. In the discrete case the weights
are given by the probability mass function, and in the continuous case the weights are
given by the probability density function. The formulas for computing the expected
values of discrete and continuous random variables are given by equations 2 and 3,
respectively.
E(x) = Σxf(x) (2)
E(x) = ∫xf(x)dx (3)
The variance of a random variable, denoted by Var(x) or σ2, is a weighted average of the
squared deviations from the mean. In the discrete case the weights are given by the
probability mass function, and in the continuous case the weights are given by the
probability density function. The formulas for computing the variances of discrete and
continuous random variables are given by equations 4 and 5, respectively. The standard
deviation, denoted σ, is the positive square root of the variance. Since the standard
deviation is measured in the same units as the random variable and the variance is
measured in squared units, the standard deviation is often the preferred measure.
Var(x) = σ2 = Σ(x − μ)2f(x) (4)
Var(x) = σ2 = ∫(x − μ)2f(x)dx (5)
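Equations 2 and 4 can be applied to a simple discrete case. A sketch using one fair die roll as the random variable (this example is an illustration; exact fractions are used to avoid rounding noise):

```python
from fractions import Fraction

# Discrete random variable: x = value of one fair die roll, f(x) = 1/6 each
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Equation 2: E(x) = sum of x * f(x)
mean = sum(x * f for x, f in pmf.items())

# Equation 4: Var(x) = sum of (x - mu)^2 * f(x)
variance = sum((x - mean) ** 2 * f for x, f in pmf.items())

print(mean)      # 7/2
print(variance)  # 35/12
```

The standard deviation would then be the positive square root of 35/12, about 1.708, in the same units as the die values.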

E. Probability distributions
In probability theory and statistics, a probability distribution is the mathematical
function that gives the probabilities of occurrence of different possible outcomes for an
experiment. It is a mathematical description of a random phenomenon in terms of its
sample space and the probabilities of events. The probability distribution for a random
variable describes how the probabilities are distributed over the values of the random
variable. For a discrete random variable, x, the probability distribution is defined by
a probability mass function, denoted by f(x). This function provides the probability for
each value of the random variable. In the development of the probability function for a
discrete random variable, two conditions must be satisfied: (1) f(x) must be nonnegative
for each value of the random variable, and (2) the sum of the probabilities for each
value of the random variable must equal one.
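The two conditions on a probability mass function are easy to check programmatically. A minimal sketch (the helper name `is_valid_pmf` and the example pmfs are my own, not from the text):

```python
from fractions import Fraction

def is_valid_pmf(f):
    """Check the two conditions for a discrete probability mass function."""
    nonnegative = all(p >= 0 for p in f.values())  # (1) f(x) >= 0 for each x
    sums_to_one = sum(f.values()) == 1             # (2) probabilities sum to 1
    return nonnegative and sums_to_one

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
broken = {1: Fraction(1, 2), 2: Fraction(1, 4)}    # sums to 3/4, not a pmf

print(is_valid_pmf(fair_die))  # True
print(is_valid_pmf(broken))    # False
```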
Special probability distributions
The binomial distribution
Two of the most widely used discrete probability distributions are the binomial and
Poisson. The binomial probability mass function (equation 6) provides the probability
that x successes will occur in n trials of a binomial experiment.

f(x) = [n! / (x!(n − x)!)] pˣ(1 − p)ⁿ⁻ˣ (6)

A binomial experiment has four properties: (1) it consists of a sequence of n identical
trials; (2) two outcomes, success or failure, are possible on each trial; (3) the probability
of success on any trial, denoted p, does not change from trial to trial; and (4) the trials
are independent. For instance, suppose that it is known that 10 percent of the owners
of two-year old automobiles have had problems with their automobile’s electrical
system. To compute the probability of finding exactly 2 owners that have had electrical
system problems out of a group of 10 owners, the binomial probability mass function
can be used by setting n = 10, x = 2, and p = 0.1 in equation 6; for this case, the
probability is 0.1937.
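The electrical-system example can be verified by evaluating the binomial probability mass function directly with the standard library:

```python
from math import comb

def binomial_pmf(x, n, p):
    # Equation 6: C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example from the text: n = 10 owners, x = 2 with problems, p = 0.1
print(round(binomial_pmf(2, 10, 0.1), 4))  # 0.1937
```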
The Poisson distribution
The Poisson probability distribution is often used as a model of the number of arrivals at
a facility within a given period of time. For instance, a random variable might be defined
as the number of telephone calls coming into an airline reservation system during a
period of 15 minutes. If the mean number of arrivals during a 15-minute interval is
known, the Poisson probability mass function given by equation 7 can be used to
compute the probability of x arrivals.

f(x) = (μˣe⁻ᵘ) / x! (7)
For example, suppose that the mean number of calls arriving in a 15-minute period is
10. To compute the probability that 5 calls come in within the next 15 minutes, μ = 10
and x = 5 are substituted in equation 7, giving a probability of 0.0378.
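The reservation-system example can likewise be checked by evaluating the Poisson probability mass function:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    # Equation 7: mu^x * e^(-mu) / x!
    return mu**x * exp(-mu) / factorial(x)

# Example from the text: mu = 10 calls per 15 minutes, x = 5 calls
print(round(poisson_pmf(5, 10), 4))  # 0.0378
```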
The normal distribution
The most widely used continuous probability distribution in statistics is the normal
probability distribution. The graph corresponding to a normal probability density
function with a mean of μ = 50 and a standard deviation of σ = 5 is shown in Figure 3.
Like all normal distribution graphs, it is a bell-shaped curve. Probabilities for the normal
probability distribution can be computed using statistical tables for the standard normal
probability distribution, which is a normal probability distribution with a mean of zero
and a standard deviation of one. A simple mathematical formula is used to convert any
value from a normal probability distribution with mean μ and a standard deviation σ
into a corresponding value for a standard normal distribution.
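The conversion formula referred to here is the z-score, z = (x − μ)/σ. A sketch that applies it to the distribution in this section (μ = 50, σ = 5) and then evaluates the standard normal cumulative probability via the error function; the choice of x = 55 is an illustrative assumption, not an example from the text:

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    # Convert a value from a normal distribution with mean mu and
    # standard deviation sigma to the standard normal scale
    return (x - mu) / sigma

def standard_normal_cdf(z):
    # P(Z <= z) for the standard normal, computed via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Distribution from this section: mu = 50, sigma = 5. What is P(X <= 55)?
z = z_score(55, 50, 5)                    # z = 1.0
print(round(standard_normal_cdf(z), 4))   # 0.8413
```

In practice such probabilities are read from a standard normal table, as the text describes; the error-function formula is simply the same table in closed form.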
