
• If two events A and B are independent, then

P(A ∩ B) = P(A) · P(B)

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Since A and B are independent,

P(A ∪ B) = P(A) + P(B) - P(A) · P(B)

Problems will be given on the above formulas.
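As a quick illustration, here is a minimal Python sketch (the two-dice events and the helper name prob are my own, purely for illustration) that verifies both formulas by direct enumeration:

```python
from itertools import product
from fractions import Fraction

# Sample space: all ordered pairs from two fair six-sided dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability = favourable outcomes / total outcomes.
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] % 2 == 0   # event A: first die is even
B = lambda w: w[1] > 4        # event B: second die shows 5 or 6

p_A, p_B = prob(A), prob(B)
p_and = prob(lambda w: A(w) and B(w))  # P(A ∩ B)
p_or = prob(lambda w: A(w) or B(w))    # P(A ∪ B)

# A and B concern different dice, so they are independent:
assert p_and == p_A * p_B              # P(A ∩ B) = P(A) · P(B)
assert p_or == p_A + p_B - p_A * p_B   # P(A ∪ B) = P(A) + P(B) - P(A) · P(B)
print(p_A, p_B, p_and, p_or)           # 1/2 1/3 1/6 2/3
```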

Random Variable Definition


A random variable is a rule that assigns a numerical value to each outcome in a sample space.
Random variables may be either discrete or continuous. A random variable is said to be discrete
if it assumes only specified values in an interval; otherwise, it is continuous. We generally denote
random variables with capital letters such as X and Y. When X takes only the values 1, 2, 3, …, it is
said to be a discrete random variable.

As a function, a random variable is required to be measurable, which allows probabilities to be
assigned to sets of its potential values. Clearly, the results depend on physical variables that are
not predictable. Say, when we toss a fair coin, whether the final result is heads or tails depends
on the physical conditions; we cannot predict which outcome will be observed. Although other
outcomes are conceivable, such as the coin breaking or being lost, such considerations are set aside.
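To make the definition concrete, here is a minimal sketch (the two-coin-toss setup and the name X are illustrative choices, not from the text) of a random variable as a rule assigning a number to each outcome:

```python
from itertools import product

# Sample space for two tosses of a fair coin.
sample_space = list(product(["H", "T"], repeat=2))

# Random variable X: the rule "count the heads in the outcome".
def X(outcome):
    return outcome.count("H")

# X assumes only the specified values 0, 1, 2, so it is discrete.
for x in range(3):
    p = sum(1 for w in sample_space if X(w) == x) / len(sample_space)
    print(f"P(X = {x}) = {p}")   # 0.25, 0.5, 0.25
```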
What is a Hypothesis?
Ever make bets with your family and friends about sports, singing or dancing competitions?
You can think of each of these bets as a hypothesis. A hypothesis is defined as a proposed
explanation for an event based on previous facts. A hypothesis is little more than an educated
guess.

This is why, when you make a bet on a winning singer in a talent competition, you are making
a hypothesis. You are using previous information in the form of past performances to
propose an explanation for the future.

Take a look at the table below for an example of hypotheses in various fields.

Field               Hypothesis
Psychology          Twins have the same IQ
Biology             Fertilizer increases average height growth for plants
Political Science   Countries that received a loan are better off now than those that didn't

What is Hypothesis Testing?


Now that you understand the definition of a hypothesis, you can start to delve into the basics of
hypothesis testing. Hypothesis testing is exactly what it sounds like - you conduct a hypothesis
test in order to test whether or not your hypothesis is probable.

Using the probabilities in your data, you can determine whether your hypothesis is likely to be
true or simply the result of chance. Hypotheses have a special notation, which is introduced
below.

H₀: Null hypothesis
H₁: Alternative hypothesis
μ: Population mean
σ: Population standard deviation
x̄: Sample mean
s: Sample SD
Let’s focus on the first two lines - the rest will be explained further on. The null and alternative
hypotheses are the two types of hypotheses you can make about your data. The definitions
of each are summarized in the table below.

Null hypothesis (H₀): states that a parameter is equal to a hypothesized value.
Alternative hypothesis (H₁): states that a parameter is greater than, less than, or not equal to the hypothesized value.

Hypothesis tests can be conducted by finding the probability distribution of your data.
The frequency distribution of your data set is simply the frequency at which each value in the
data set occurs.

Probability distributions, on the other hand, use the mean and standard deviation of your data
set to produce the likelihood of certain values occurring.
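To make the distinction concrete, here is a minimal sketch (the nine data points are made up for illustration, and SciPy is assumed to be available) contrasting a frequency distribution with a normal probability distribution built from the same data:

```python
from collections import Counter
from statistics import mean, stdev
from scipy.stats import norm

data = [4, 5, 5, 6, 6, 6, 7, 7, 8]

# Frequency distribution: how often each value occurs in the data set.
print(Counter(data))   # Counter({6: 3, 5: 2, 7: 2, 4: 1, 8: 1})

# Probability distribution: a normal model built from the data's mean
# and standard deviation produces the likelihood of values occurring.
mu, sigma = mean(data), stdev(data)
print(norm.pdf(6, mu, sigma))   # likelihood (density) of the value 6
```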

Types of Hypothesis Testing


Building on the previous section, you now know that we can make hypothesis tests based
on a data set’s probability distribution. In fact, most of the time you can make hypothesis
tests without the entire data set, simply using the mean and standard deviation information.
Hypothesis tests can be made using probability distributions because you can see whether or
not your hypothesized value is likely to occur. Recall, however, that there are many types of
distributions. Take a look at the table below, which summarizes each hypothesis test, its
required distribution and its test parameters.

Test Type    Distribution              Test Parameters
Z-test       Normal                    Mean
T-test       Student-t                 Mean
ANOVA        F distribution            Means
Chi-Square   Chi-squared distribution  Association between two categorical variables

As you can see, while the first three are used for the comparison of means, the last test is used
to assess the association between two categorical variables.
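As a rough sketch of what running each of these tests looks like in Python (the sample arrays and the 2x2 count table are invented for illustration; scipy and statsmodels are assumed to be installed, since the z-test helper lives in statsmodels rather than scipy):

```python
from scipy import stats
from statsmodels.stats.weightstats import ztest

a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [4.8, 5.0, 4.7, 4.9, 5.1]

print(ztest(a, value=5.0))                # Z-test: normal distribution, mean
print(stats.ttest_1samp(a, popmean=5.0))  # T-test: Student-t distribution, mean
print(stats.f_oneway(a, b))               # ANOVA: F distribution, compares means

# Chi-square: association between two categorical variables.
table = [[10, 20], [20, 10]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)
```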

Each hypothesis test uses these basic principles.

Element                              Example      Description
Hypothesis with hypothesized value   H₁: μ > 45   The hypothesis here is that the population mean is greater than 45
Test value                           45           This is used as a benchmark to test how likely a mean of 45 is, given the population mean and SD
Confidence interval                  95%          At the 95% confidence level (α = 1 - 0.95 = 0.05), we can be certain the procedure gets the true answer 95% of the time
Test statistic                       z            The test statistic gives you the standardized value of your test value on the test distribution
P-value                              0.06         The p-value is the calculated probability of your value occurring under the null hypothesis

In hypothesis testing, the following rules are used to either reject or fail to reject the null
hypothesis, given an α of 0.05. Keep in mind that if you were to have an α of 0.1, your results
would be given with 90% confidence, and the example above, with a p-value of 0.06, would reject H₀.

P-value < 0.05   Region of rejection    Reject H₀

P-value > 0.05   Region of acceptance   Fail to reject H₀
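The decision rule itself is a one-line comparison; here is a minimal sketch (the function name decide is my own):

```python
def decide(p_value, alpha=0.05):
    # Reject H0 when the p-value falls in the region of rejection.
    return "Reject H0" if p_value < alpha else "Fail to reject H0"

print(decide(0.06))              # Fail to reject H0 (95% confidence, alpha = 0.05)
print(decide(0.06, alpha=0.10))  # Reject H0 (90% confidence, alpha = 0.10)
```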

Type 1 and Type 2 Error


To understand type 1 and type 2 error, let’s take a closer look at the confidence level. Take
as an example a scenario in which you are conducting an experiment on average plant growth
in one year as a result of a specific fertilizer. Recall that a population is the thing or idea that
you want to study - here, it would be the plant we are studying.

Because it is impossible for us to test our hypothesis on all of the plants in the world, we take
samples in order to make estimates about the population. Let’s assume we’ve taken 100
different samples from this plant population.

A confidence level is defined as the percentage of all possible samples that we can expect to
include the true population parameter. A 95% confidence level would therefore be the same
as saying that we expect 95% of all 100 samples to include the true population parameter.

A confidence level can be set at a ton of different values. In fact, the DNA tests that people
take nowadays to find out where their ancestors come from include varying confidence
levels you can set. The table below summarizes common confidence levels with their
corresponding α, or alpha, value.

Confidence Level   Alpha
99%                0.01
95%                0.05
90%                0.10
85%                0.15
75%                0.25

Type 1 and 2 errors occur when we reject or accept our null hypothesis when, in reality, we
shouldn’t have. This happens because, while statistics is powerful, there is a certain chance
that you may be wrong. The table below summarizes these types of errors.

                           Accept H₀                                               Reject H₀
In reality, H₀ is true     Correct: H₀ is true and the statistical test accepts    Incorrect: Type 1 error - H₀ is true and the statistical test rejects
In reality, H₀ is false    Incorrect: Type 2 error - H₀ is false and the statistical test accepts    Correct: H₀ is false and the statistical test rejects
Now that you’re familiar with hypothesis testing, let’s look at an example:

The mean population IQ for adults is 100 with an SD of 15. You want to see whether those
born prematurely have a lower IQ. To test this, you obtain a sample of the IQs of adults who
were born prematurely, with a sample mean of 95. Your hypothesis is that prematurely born
people do not have lower IQs.

Because we know the population mean and standard deviation, as well as the distribution (IQ’s
are generally normally distributed), we can use a z-test.

Null Hypothesis (H₀): an IQ of 95 or above is normal

Alternative Hypothesis (H₁): an IQ of 95 is not normal

First, we find the z-score: z = (x̄ - μ) / σ = (95 - 100) / 15 ≈ -0.33

Next, we look up this z-score on a z-table for negative values.

Z      0.00      0.01      0.02      0.03
-0.2   0.42074   0.41683   0.41294   0.40905
-0.3   0.38209   0.37828   0.37448   0.37070

With a p-value of 0.3707, which is greater than 0.05, we fail to reject the null hypothesis.
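Here is the same z-test sketched in Python (assuming SciPy; the exact p-value differs slightly from the table because the table rounds z to two decimal places):

```python
from scipy.stats import norm

mu, sigma = 100, 15   # population mean and SD of IQ
x = 95                # sample mean for prematurely born adults

z = (x - mu) / sigma  # z-score: (95 - 100) / 15 ≈ -0.33
p = norm.cdf(z)       # lower-tail p-value from the normal distribution

print(round(z, 2), round(p, 4))  # -0.33 0.3694 (the table's rounded z gives 0.3707)
# p > 0.05, so we fail to reject the null hypothesis.
```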

Type 1 Error Example


In the example above, you can see that we have chosen our confidence level to be at 95%.
This gives us an alpha of 0.05. As explained above, a type 1 error occurs when our statistical
test rejects the null hypothesis when, in reality, the null hypothesis is true.

The main question is, how do we know when a type 1 error has occurred? The only way we
could know for certain would be if we had all population values, which we don't. Luckily, we
can use the same logic as we do for the confidence level. If we are 95% certain of something
occurring, the probability that it really didn't occur lies in the tail end of our distribution, the
rejection region. Therefore, the probability of a type 1 error is calculated simply as 1 minus
the confidence level, which is our α of 0.05.
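To see the role of α directly, here is a minimal simulation sketch (a two-sided z-test with an invented sample size of 25; all numbers are illustrative): when H₀ is in fact true, a test at α = 0.05 commits a type 1 error on roughly 5% of samples.

```python
import random
from statistics import mean
from scipy.stats import norm

random.seed(0)
mu, sigma, n, alpha = 100, 15, 25, 0.05
rejections = 0

for _ in range(10_000):
    # Draw a sample from a population where H0 (mu = 100) is TRUE.
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    z = (mean(sample) - mu) / (sigma / n ** 0.5)
    p = 2 * norm.cdf(-abs(z))  # two-sided p-value
    if p < alpha:
        rejections += 1        # type 1 error: a true H0 was rejected

print(rejections / 10_000)     # close to alpha = 0.05
```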

• Differentiate between the Probability Density Function (PDF) and the Probability Mass Function (PMF)

Difference Between PDF and PMF (With Table)

In order to understand the difference between PDF and PMF, it is important to understand what random
variables are. A random variable is a variable whose value is not known in advance; in other words, the value
depends on the result of the experiment. For instance, when flipping a coin, the value, i.e. heads or tails, depends
upon the outcome.

PDF vs PMF

The difference between PDF and PMF is in terms of random variables: PDF is relevant for continuous random
variables, while PMF is relevant for discrete random variables.

Both terms, PDF and PMF, belong to physics, statistics, calculus, and higher math. PDF (Probability Density
Function) gives the likelihood of a continuous random variable falling within a range of values. On the other hand,
PMF (Probability Mass Function) gives the likelihood of a discrete random variable taking each of its distinct values.

Comparison Table Between PDF and PMF

Parameter of Comparison   PDF                                             PMF
Full form                 Probability Density Function                    Probability Mass Function
Use                       Used to find probabilities over a range of     Used to find probabilities for the values of
                          continuous random variables                     discrete random variables
Random Variables          PDF uses continuous random variables            PMF uses discrete random variables
Formula                   P(a < X < b) = ∫ₐᵇ f(x) dx                      p(x) = P(X = x)
Solution                  The solution falls in a range of values of a    The solution falls on distinct values of a
                          continuous random variable                      discrete random variable

What is PDF?

The Probability Density Function (PDF) depicts the probability of a continuous random variable taking values
within a given range.

It is also known as a probability distribution function or a probability function. It is denoted by f(x).

The PDF is essentially the density of the variable over a given range. It is non-negative at any given point on the
graph, and the total area under the PDF is always equal to one.

For a continuous random variable X, the probability of X taking any single given value x is always 0, so
P(X = x) does not work.

In such a situation, we need to calculate the probability of X resting in an interval (a, b), i.e. P(a < X < b),
which can be done using the PDF.

The probability distribution function formula is defined as P(a < X < b) = ∫ₐᵇ f(x) dx, with f(x) ≥ 0.
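A minimal sketch of this formula in Python (the standard normal is my choice of example distribution; SciPy is assumed):

```python
from scipy.stats import norm
from scipy.integrate import quad

X = norm(loc=0, scale=1)   # a continuous random variable

# P(X = x) is 0 for any single value x; f(x) is a density, not a probability.
print(X.pdf(0))

# So we integrate the PDF over an interval instead:
p, _ = quad(X.pdf, -1, 1)       # P(-1 < X < 1) = ∫ f(x) dx over (-1, 1)
print(p, X.cdf(1) - X.cdf(-1))  # both ≈ 0.6827
```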

Some instances where Probability distribution function can work are:

Temperature, rainfall and overall weather

Time a computer takes to process input and give output

And many more.

Various applications of the probability density function (PDF) are:

The PDF is used in modelling the yearly temporal concentration of atmospheric NOx.

It is used to model diesel engine combustion.

It is used to work with the probabilities attached to continuous random variables in statistics.


What is PMF?

The Probability Mass Function gives the probability of each value that a discrete random variable can take. It is
zero for any value that X cannot take, and positive for each value x that X can take.

The PMF plays an important role in defining a discrete probability distribution and produces distinct outcomes.
The formula of the PMF is p(x) = P(X = x), i.e. the probability of (x) = the probability that (X = one specific x).

As it gives distinct values, the PMF is very useful in computer programming and the shaping of statistics.

In simpler terms, the probability mass function, or PMF, is a function associated with discrete events, i.e. the
probabilities related to those events occurring.

The word “mass“ explains that the probabilities are concentrated on discrete events.

Some of the applications of the probability mass function (PMF) are:

The probability mass function (PMF) has a main role in statistics, as it helps in defining the probabilities for
discrete random variables.

The PMF is used to find the mean and variance of a discrete distribution.

The PMF is used in the binomial and Poisson distributions, where discrete values are used.

Some instances where Probability mass function can work are:

Number of students in a class

Numbers on a die

Sides of a coin

And many more.
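A minimal sketch of a PMF for one of these instances, the numbers on a fair die, plus the binomial case mentioned above (SciPy assumed for the binomial part):

```python
from scipy.stats import binom

# PMF of a fair six-sided die: p(x) = P(X = x) = 1/6 for x in 1..6.
die_pmf = {x: 1 / 6 for x in range(1, 7)}
assert abs(sum(die_pmf.values()) - 1) < 1e-12  # probabilities sum to one

# Binomial PMF: probability of exactly 3 heads in 10 fair coin flips.
print(binom.pmf(3, n=10, p=0.5))   # ≈ 0.1172
```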

Main Differences Between PDF and PMF

The full form of PDF is Probability Density Function whereas the full form of PMF is Probability Mass Function

PMF is used when there is a need to find probabilities for the values of discrete random variables, whereas PDF
is used when there is a need to find probabilities over a range of continuous random variables.

PDF uses continuous random variables whereas PMF uses discrete random variables.

The PDF formula is P(a < X < b) = ∫ₐᵇ f(x) dx, whereas the PMF formula is p(x) = P(X = x).

The solutions of the PDF fall in a range of values of a continuous random variable, whereas the solutions of the
PMF fall on distinct values of a discrete random variable.
Conclusion

When it comes to PDF and PMF, people often confuse the two. The main difference is in terms of the random
variables used by each.

PDF, on the one hand, depends on continuous random variables, whereas PMF depends on discrete random variables.
Both of them are used in fields like physics, statistics, calculus, and higher math.

The discrete distributions whose probabilities are found using PMFs include the Binomial, Hypergeometric,
Poisson, Geometric, and Negative Binomial distributions, whereas the continuous distributions whose probabilities
are found using PDFs include the Exponential, Gamma, Pareto, Normal, Lognormal, Student’s t, and F distributions.
