This is why, when you make a bet on a winning singer in a talent competition, you are making
a hypothesis. You are using previous information in the form of past performances to
propose an explanation for the future.
Take a look at the table below for an example of hypotheses in various fields.
Field               Hypothesis
Psychology          Twins have the same IQ
Biology             Fertilizer increases average height growth for plants
Political Science   Countries that received a loan are better off now than those that didn't
Using the probabilities in your data, you can determine how likely it is that your hypothesis is
true, or whether the observed result is simply due to chance. Hypotheses have a special
notation, which is introduced below.
Symbol   Meaning
H0       Null hypothesis
Ha       Alternative hypothesis
μ        Population mean
σ        Population standard deviation
x̄        Sample mean
s        Sample standard deviation
Let’s focus on the first two lines - the rest will be explained later on. The null and alternative
hypotheses are the two types of hypotheses you can make about your data. The definitions
of each are summarized in the table below.

Null hypothesis (H0)          States that a population parameter is equal to a hypothesized value
Alternative hypothesis (Ha)   States that a population parameter is less than, greater than, or different from the hypothesized value
Hypothesis tests can be conducted by finding the probability distribution of your data.
The frequency distribution of your data set is simply the frequency at which each value in the
data set occurs.
Probability distributions, on the other hand, use the mean and standard deviation of your data
set to produce the likelihood of certain values occurring.
Common hypothesis tests include the z-test, t-test, and ANOVA, which are used for the
comparison of means, while the chi-square test is used to compare the association between
two categorical variables.
In hypothesis testing, the following rules are used to either reject or accept the hypothesis
given an α of 0.05. Keep in mind that if you were to have an α of 0.1, your results would be
given with 90% confidence, and the example above, with a p-value of 0.06, would reject H0.
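The decision rule above is just a comparison of the p-value against α. A minimal sketch (the function name `decide` is a hypothetical helper, not from the text):

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# The example from the text: a p-value of 0.06
print(decide(0.06, alpha=0.05))  # fail to reject H0 (95% confidence)
print(decide(0.06, alpha=0.10))  # reject H0 (90% confidence)
```

The same p-value leads to opposite conclusions at the two confidence levels, which is exactly the point made above.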
Because it is impossible for us to test our hypothesis on all of the plants in the world, we take
samples in order to make estimates about the population. Let’s assume we’ve taken 100
different samples from this plant population.
A confidence level is defined as the percentage of all possible samples that we can expect to
include the true population parameter. A 95% confidence level would therefore be the same
as saying that we expect 95% of all 100 samples to include the true population parameter.
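This interpretation can be checked with a quick simulation. A minimal sketch, assuming a hypothetical plant population with known mean 50 and SD 10 (values chosen for illustration), drawing 100 samples and counting how many 95% intervals cover the true mean:

```python
import random
from statistics import mean

random.seed(42)
TRUE_MEAN, SD, N = 50, 10, 30   # hypothetical population parameters

contains = 0
for _ in range(100):            # 100 samples, as in the text
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    margin = 1.96 * SD / N ** 0.5          # 95% z-interval, known sigma
    lo, hi = mean(sample) - margin, mean(sample) + margin
    contains += lo <= TRUE_MEAN <= hi

print(contains)  # roughly 95 of the 100 intervals cover the true mean
```

Any single interval either contains the true mean or it doesn't; the 95% refers to the long-run proportion of intervals that do.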
A confidence level can be set at many different values. In fact, the DNA tests that people
take nowadays to find out where their ancestors come from let you set varying confidence
levels. The table below summarizes common confidence levels with their
corresponding α, or alpha, value.
Type 1 and 2 errors occur when we reject or accept our null hypothesis when, in reality, we
shouldn't have. This happens because, while statistics is powerful, there is a certain chance
that you may be wrong. The table below summarizes these types of errors.

                           Accept H0                                                    Reject H0
In reality, H0 is true     Correct: H0 is true and the test accepts it                  Incorrect: Type 1 error - H0 is true and the test rejects it
In reality, H0 is false    Incorrect: Type 2 error - H0 is false and the test accepts it    Correct: H0 is false and the test rejects it
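The type 1 error rate can be seen in a quick simulation. A minimal sketch, assuming hypothetical population parameters where H0 is true by construction, counting how often a two-sided z-test at α = 0.05 wrongly rejects:

```python
import random
from statistics import mean

random.seed(0)
ALPHA, N, TRIALS = 0.05, 30, 2000
MU0, SD = 100, 15   # H0 is true: the population mean really is 100

false_rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU0, SD) for _ in range(N)]
    z = (mean(sample) - MU0) / (SD / N ** 0.5)
    if abs(z) > 1.96:               # two-sided test at alpha = 0.05
        false_rejections += 1

print(false_rejections / TRIALS)    # close to 0.05, the type 1 error rate
```

Even though the null hypothesis is true in every trial, the test still rejects it about 5% of the time, which is exactly what α controls.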
Now that you’re familiar with hypothesis testing, let’s look at an example:
The mean population IQ for adults is 100 with an SD of 15. You want to see whether those
born prematurely have a lower IQ. To test this, you attain a sample of the IQs of adults who
were born prematurely, with a sample mean of 95. Your null hypothesis is that prematurely
born people do not have lower IQs.
Because we know the population mean and standard deviation, as well as the distribution (IQs
are generally normally distributed), we can use a z-test.
The main question is, how do we know when a type 1 error has occurred? The only way we
could know for certain would be if we had all population values, which we don't. Luckily, we
can use the same logic as we do for the confidence level: if we are 95% confident in our
decision, the probability that we are wrong is the 5% sitting in the tail end of our rejection
region, so the chance of a type 1 error is capped at α. In our example, z = (95 − 100) / 15 ≈ −0.33,
which gives a p-value of 0.3707; since this is far above α = 0.05, we fail to reject H0.
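The p-value for the IQ example can be computed directly. A minimal sketch; note the text gives no sample size, so this treats the sample mean of 95 as a single draw against the population SD (consistent with the 0.37 p-value quoted), whereas a real z-test would divide by the standard error SD/√n:

```python
import math

POP_MEAN, POP_SD = 100, 15
sample_mean = 95

z = (sample_mean - POP_MEAN) / POP_SD              # z-score of the sample mean
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # P(Z < z), lower tail

print(round(z, 2))        # -0.33
print(round(p_value, 4))  # ~0.3694; the text's 0.3707 comes from a z-table at z = -0.33
```

Since 0.37 is far above any common α, the data gives no evidence that prematurely born adults have lower IQs.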
In order to understand the difference between PDF and PMF, it is important to understand what random
variables are. A random variable is a variable whose value is not known in advance; in other words, the value
depends on the result of an experiment. For instance, when flipping a coin, the value, i.e. heads or tails, depends
upon the outcome.
PDF vs PMF
The difference between PDF and PMF is in terms of random variables. PDF is relevant for continuous random
variables, while PMF is relevant for discrete random variables.
Both terms, PDF and PMF, appear in physics, statistics, calculus, and higher math. PDF (Probability Density
Function) gives the likelihood of a random variable across a range of continuous values. PMF (Probability Mass
Function), on the other hand, gives the probability of a discrete random variable taking each particular value.
Parameter of Comparison   PDF                                                      PMF
Use                       Used to find probabilities over a range of continuous    Used to find probabilities for discrete
                          random variables.                                        random variables.
Random Variables          Uses continuous random variables.                        Uses discrete random variables.
Solution                  The solution falls in a continuous range of values.      The solution falls on distinct numbers (discrete values).
What is PDF?
The Probability Density Function (PDF) describes the probability of a continuous random variable taking
values within a given range.
The PDF is essentially the density of the variable over a given range. It is non-negative at every point in the
graph, and the total area under the PDF always equals one.
For a continuous random variable, the probability of X taking any single value x is always 0, so P(X = x) is not
useful.
Instead, we calculate the probability of X falling in an interval (a, b), i.e. P(a < X < b), which is where the PDF
comes in.
The probability distribution function formula is defined as P(a < X < b) = ∫ₐᵇ f(x) dx.
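The integral can be checked numerically. A minimal sketch using the standard normal density (an illustrative choice) and a simple midpoint rule:

```python
import math

def f(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob(a, b, steps=10_000):
    """P(a < X < b) = integral of f from a to b, via the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(prob(-1, 1), 4))  # 0.6827 -- the familiar 68% within one SD
print(prob(1, 1))             # 0.0 -- P(X = x) is zero for any single point
```

The second call makes the point from the paragraph above concrete: an interval of zero width carries zero probability.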
PDFs are used, for example, to model the yearly temporal concentration of atmospheric NOx.
What is PMF?
The Probability Mass Function gives the probability that a discrete random variable X equals a specific value x.
The PMF is zero at values X cannot take, and positive at each value x that X can take.
The PMF plays an important role in defining a discrete probability distribution and produces distinct outcomes.
The formula of PMF is p(x) = P(X = x), i.e. the probability that the random variable X equals one specific value x.
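The formula can be made concrete with the simplest discrete example, a fair six-sided die (an illustrative choice, and `die_pmf` is a hypothetical helper name):

```python
from fractions import Fraction

def die_pmf(x):
    """p(x) = P(X = x) for one roll of a fair six-sided die."""
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

print(die_pmf(3))                            # 1/6
print(die_pmf(7))                            # 0 -- a value the die cannot take
print(sum(die_pmf(x) for x in range(1, 7)))  # 1 -- the masses sum to one
```

Unlike a density, each value of a PMF is itself a probability, which is why the values must sum to exactly one.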
As it gives distinct values, PMF is very useful in computer programming and statistical modeling.
In simpler terms, a probability mass function, or PMF, is a function associated with discrete events, i.e. the
probabilities of those events occurring.
The word "mass" reflects that the probabilities are concentrated on discrete events.
Probability mass function (PMF) has a main role in statistics as it helps in defining the probabilities for discrete
random variables.
PMF is used to find the mean and variance of a discrete distribution.
PMF is used in binomial and Poisson distribution where discrete values are used.
Examples of discrete random variables include:
- Numbers on a die
- Sides of a coin
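The binomial distribution mentioned above combines both examples: counting heads across repeated coin flips. A minimal sketch of its PMF (`binom_pmf` is a hypothetical helper name):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binom_pmf(5, 10, 0.5))  # 0.24609375 -- exactly 5 heads in 10 fair flips
```

Because the outcomes are discrete, summing the PMF over k = 0..10 gives exactly 1.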
Main Differences Between PDF and PMF
The full form of PDF is Probability Density Function, whereas the full form of PMF is Probability Mass Function.
PMF is used when there is a need to find probabilities over a range of discrete random variables, whereas PDF is
used when there is a need to find probabilities over a range of continuous random variables.
PDF uses continuous random variables, whereas PMF uses discrete random variables.
The PDF formula is P(a < X < b) = ∫ₐᵇ f(x) dx, whereas the PMF formula is p(x) = P(X = x).
The solutions of a PDF fall in a continuous range of values, whereas the solutions of a PMF fall on distinct
numbers (discrete values).
Conclusion
When it comes to PDF and PMF, people often confuse the two. The main difference lies in the type of random
variables each uses.
PDF, on the one hand, depends on continuous random variables, whereas PMF depends on discrete random variables.
Both of them are used in fields like physics, statistics, calculus, and higher math.
Discrete distributions whose probabilities are found using PMFs include the Binomial, Hypergeometric, Poisson,
Geometric, and Negative Binomial, whereas continuous distributions whose probabilities are found using PDFs
include the Exponential, Gamma, Pareto, Normal, Lognormal, Student's t, and F distributions.