
Cooper Energy Investor Series

Cumulative Probability P90, P50, P10


The terms P90, P50 and P10 are occasionally used when people discuss
volumes of hydrocarbons. But what do these terms actually mean?
Contrary to what you may have heard or understood, P90 does NOT mean that the volume
estimate under discussion has a 90% chance of occurring. Similarly, P50 and P10 do NOT
mean that those volume estimates have a 50% or 10% chance of occurring respectively. So
what do they mean?
To understand what these terms mean you have to understand statistical theory and how
hydrocarbon volumetric estimates are prepared.
Most high school graduates will be familiar with a normal frequency distribution. This is
shown diagrammatically below:

Figure 1: Normal Distribution f(x,0,1)

[Figure: bell-shaped curve. At x = 0 (P50) the curve peaks at f(x) ≈ 0.4; at x = -1.28 (P90) its height is f(x) ≈ 0.18.]

Figure 1 is very off-putting because it looks very mathematical.


The easiest way to understand a frequency distribution is this: imagine
a tree in your garden; if you had the time, and the inclination, you could
go out and measure the length of every leaf on the tree. Some leaves
will be long, some leaves will be short and some leaves will be medium.
As you are measuring the leaves you put them into five cups depending
on the size of the leaf. The first cup is for the smallest leaves; the last
cup is for the largest leaves. Once you have pulled all the leaves off the
tree the cups (or bins to use the correct statistical term) will look like the picture below:

Figure 2: Leaves on a tree example

[Figure: bar chart of the number of leaves of each size. From the smallest cup to the largest cup the counts are 50, 200, 500, 150 and 50.]

As you can see, the medium sized leaves are most common while the very small and very
large leaves are least common. In nature things tend to group around a central common size
or point.
What you have done is describe the uncertainty of the leaf sizes by a 5-bin frequency
distribution. If you look at Figure 2, you should be able to see that the shape at the top of the
green boxes is similar to the shape of the red line in Figure 1. Figure 1 is known as a
continuous distribution (the line flows continuously); think of it as a distribution with a very
large number of bins. Figure 2 is a discrete distribution (it has a discrete number of bins that
capture the number of leaves that fall within a certain size range).
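For readers who like to see this as a calculation, the cup-sorting exercise can be sketched in a few lines of Python. This is only an illustration: the leaf lengths are synthetic random numbers, and the cup boundaries in `CUP_EDGES` and the function name `count_leaves_per_cup` are made up for the example, not taken from any real measurement.

```python
import bisect
import random

# Hypothetical cup boundaries in centimetres: a leaf shorter than 5.75 cm goes
# in cup 1, one between 5.75 and 7.25 cm in cup 2, and so on up to cup 5.
CUP_EDGES = [5.75, 7.25, 8.75, 10.25]

def count_leaves_per_cup(lengths, edges=CUP_EDGES):
    """Return the number of leaves that fall into each cup (bin)."""
    counts = [0] * (len(edges) + 1)
    for length in lengths:
        # bisect_right gives the index of the cup whose size range holds this leaf.
        counts[bisect.bisect_right(edges, length)] += 1
    return counts

# Synthetic leaf lengths (assumed: 950 leaves, average 8 cm, spread 1.5 cm).
rng = random.Random(0)
leaf_lengths = [rng.gauss(8.0, 1.5) for _ in range(950)]

counts = count_leaves_per_cup(leaf_lengths)
for cup, count in enumerate(counts, start=1):
    print(f"cup {cup}: {count} leaves")
```

The middle cup should come out fullest and the two end cups emptiest, which is the same shape as the bars in Figure 2.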
We can do this exercise for every measurable thing and create a frequency distribution.
There are other (non-normal) frequency distribution shapes possible but these do not need
to be discussed here. If you understand a normal frequency distribution then that's all you
need to know for the time being.
Now that you understand frequency distributions, what's P90 etc?

With the frequency distribution that we just created we can add up all the numbers from one
end and create a CUMULATIVE frequency distribution. It's just another way of showing the
data. Using the leaf example, if we start adding up the leaves from the biggest end and work
our way to the smallest end we end up with the following:

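Figure 3: Leaves on a tree example

[Figure: bar chart of cumulative leaves, counted from the biggest cup to the smallest. Reading from the smallest cup to the largest cup the totals are 950, 900, 700, 200 and 50.]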

So how does that help us? Well, we can say things like "there are 900 leaves bigger than the
smallest leaves" or "there are 200 leaves bigger than the medium size leaves".
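The adding-up can be checked with a short Python sketch. It assumes the five cups hold 50, 200, 500, 150 and 50 leaves from smallest to largest, an ordering inferred from the cumulative totals in Figure 3; the variable names are only for illustration.

```python
from itertools import accumulate

# Leaves per cup, ordered from the smallest leaves to the largest leaves
# (assumed ordering, inferred from the cumulative totals in Figure 3).
leaves_per_cup = [50, 200, 500, 150, 50]

# Add up from the biggest end: reverse, take a running total, reverse back.
cumulative_from_biggest = list(accumulate(reversed(leaves_per_cup)))[::-1]
print(cumulative_from_biggest)  # [950, 900, 700, 200, 50]

# Leaves bigger than the smallest cup, and bigger than the medium cup.
bigger_than_smallest = cumulative_from_biggest[1]
bigger_than_medium = cumulative_from_biggest[3]
print(bigger_than_smallest, bigger_than_medium)  # 900 200
```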
We can do the same exercise with the continuous frequency distribution in Figure 1 and we
end up with the following continuous cumulative frequency distribution:

Figure 4: Upper Cumulative Distribution Q(x,0,1)

[Figure: smooth curve falling from 100% to 0%. Q(x) = 0.9 at x = -1.28 (P90) and Q(x) = 0.5 at x = 0 (P50).]

Although this looks terribly mathematical, it's similar to the graph you have just produced
with the leaf example. The main difference is that the numbers on the Y-axis (or vertical axis)
have been divided by the biggest number at the end thereby normalising the axis to 100%.
You should be able to see that the shape described by the top of the green boxes in Figure
3 looks very similar to the shape of the red line in Figure 4. Figure 4 looks smoother than
Figure 3 because Figure 4 was created from the smooth continuous distribution in Figure 1.
Since the Y-axis in Figure 4 has been normalised to 100% we can read off the estimates that
correspond to the 90%, 50% and 10% cumulative frequency. These estimates are usually
termed the P90, P50 and P10 confidence levels.
Using Figure 4, the estimate at the P90 confidence level is -1.28 and the estimate at the P50
confidence level is 0. The negative number is just the way the scale is presented: it has been
normalised to zero at the middle.
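The values read off Figure 4 can be reproduced with Python's standard library. This is a minimal sketch for the standard normal curve only (mean 0, spread 1); the helper names `upper_cumulative` and `estimate_at` are invented for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def upper_cumulative(x, dist=standard_normal):
    """Q(x): the fraction of outcomes expected to be bigger than x."""
    return 1.0 - dist.cdf(x)

def estimate_at(confidence, dist=standard_normal):
    """The value that a fraction `confidence` of outcomes is expected to exceed."""
    return dist.inv_cdf(1.0 - confidence)

p90 = estimate_at(0.90)
p50 = estimate_at(0.50)
p10 = estimate_at(0.10)
print(f"P90 = {p90:.2f}, P50 = {p50:.2f}, P10 = {p10:.2f}")
print(f"Q(P90) = {upper_cumulative(p90):.2f}")
```

Note that P90 is the smallest of the three values and P10 the largest, because the curve is accumulated from the biggest end.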
As per the leaves example, P90 means that 90% of the estimates (or outcomes) are
expected to be bigger than this estimate. P50 means that 50% of the estimates (or
outcomes) are expected to be bigger than this estimate. This is NOT the same as the
chance of that estimate occurring.
The chance of a single estimate occurring can be read off Figure 1. Ask the question a
different way: from Figure 4, which occurs more frequently, P90 or P50? You can't
actually answer the question from the cumulative frequency distribution (Figure 4);
you need to jump from the cumulative frequency curve back to
the frequency distribution (Figure 1).
An easier way to understand the question would be to use the leaf example. Assume P90 is
the same as the small leaves and P50 is the same as the medium leaves. So the question
becomes: which is more likely to occur, the small leaves or the medium leaves? The
medium leaves are more likely to occur, of course. So in a normal distribution, the P50 value
is more likely to occur than the P90 value.
In simple general terms, that is why P50 is sometimes also known as the "best estimate":
it's the estimate that occurs more frequently.
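The comparison can be made concrete by reading the height of the Figure 1 curve at the two values. The sketch below does this for the standard normal curve. Strictly, for a continuous curve the height gives the relative frequency of outcomes near a value, not the chance of hitting it exactly, so treat the numbers as a comparison only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Height of the frequency curve (Figure 1) at the P50 and P90 values.
density_at_p50 = standard_normal.pdf(0.0)
density_at_p90 = standard_normal.pdf(-1.28)

print(f"f(P50) = {density_at_p50:.2f}")
print(f"f(P90) = {density_at_p90:.2f}")
print(f"outcomes near P50 are about {density_at_p50 / density_at_p90:.1f} times as frequent")
```

The two heights are the f(x) ≈ 0.4 and f(x) ≈ 0.18 values marked on Figure 1.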
So how does this help you to understand oil and gas estimates?
An oil or gas estimate is calculated by multiplying together a number of parameters, for
example:
Oil in place equals rock volume of the reservoir multiplied by porosity multiplied by oil
saturation (there are actually a lot more input variables but let us keep it simple for now).
Rock volume, porosity and oil saturation are measurable things. There is, however,
uncertainty surrounding the measurement of those parameters. To cater for this uncertainty
we describe the input parameters by continuous frequency distributions. If we then multiply
all the input frequency distributions together (a computer does this for us), the output, oil in
place, ends up as a frequency distribution. We can then take this oil in place frequency
distribution and create an oil in place cumulative frequency distribution. This is shown
diagrammatically as follows:

Figure 5: Oil in place calculation and estimation

[Figure: frequency distributions for Rock Volume, Porosity and Oil Saturation combine to give a frequency distribution for Oil in place. Multiply the frequency distributions together to obtain a frequency distribution and then create a cumulative frequency distribution. From the cumulative frequency distribution we can read off the P90, P50 and P10 confidence levels.]

In summary, to create an oil volume distribution:

Step 1: create continuous frequency distributions for each input parameter (rock volume,
porosity, oil saturation).
Step 2: multiply the input parameters together (using a computer) and create an oil in place
continuous frequency distribution.
Step 3: take the oil in place continuous frequency distribution and create an oil in place
continuous cumulative frequency distribution.
Step 4: from the oil in place continuous cumulative frequency distribution read off the
estimate sizes that correspond to the P90, P50 and P10 confidence levels.
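One common way for a computer to carry out these four steps is random sampling (often called Monte Carlo simulation): draw a value for each input, multiply them, repeat many times, then sort the results. The sketch below assumes that approach. Every distribution shape and number in it is invented for illustration and is not a Cooper Energy input, and the function names are hypothetical.

```python
import random

def sample_oil_in_place(rng):
    """Steps 1 and 2: draw one value per input and multiply them together."""
    # Hypothetical input distributions, chosen only to make the example run.
    rock_volume = rng.lognormvariate(4.0, 0.4)          # arbitrary volume units
    porosity = rng.triangular(0.10, 0.30, 0.20)         # low, high, most likely
    oil_saturation = rng.triangular(0.50, 0.80, 0.65)   # low, high, most likely
    return rock_volume * porosity * oil_saturation

def exceedance_estimate(sorted_values, confidence):
    """Step 4: the value that a fraction `confidence` of the samples exceeds."""
    index = int(round((1.0 - confidence) * (len(sorted_values) - 1)))
    return sorted_values[index]

rng = random.Random(42)
# Step 3: sorting the samples is what turns them into a cumulative distribution.
samples = sorted(sample_oil_in_place(rng) for _ in range(20_000))

p90 = exceedance_estimate(samples, 0.90)
p50 = exceedance_estimate(samples, 0.50)
p10 = exceedance_estimate(samples, 0.10)
print(f"P90 = {p90:.2f}, P50 = {p50:.2f}, P10 = {p10:.2f}")

share_above_p90 = sum(value > p90 for value in samples) / len(samples)
print(f"share of outcomes above P90: {share_above_p90:.0%}")
```

About 90% of the sampled outcomes sit above the P90 value, which is exactly what the P90 label means.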
So now that you understand frequency distributions, cumulative frequency distributions and
how we use them to create volumetric estimates you should be able to answer a few
questions:
Question 1: What does P90 mean?
Answer: It means that 90% of the calculated estimates are bigger than the P90 estimate.
Question 2: Is the P90 estimate or the P50 estimate more likely to occur?
Answer: P50 is more likely to occur because the estimate is expected to occur more often
than the P90 estimate in the frequency distribution.
Question 3: What's the most important number: P90, P50 or P10?
Answer: P50 is the most important number because it's the best estimate. P90 and P10 just
show the range in the uncertainty of the estimate.
Question 4: Am I more confident in the P90 estimate or the P50 estimate?
Answer: You are more confident in the P90 estimate. As 90% of the estimates are greater
than the P90 estimate, you would be more confident that the final actual outcome will be greater
than the P90 estimate than that it will be greater than the P50 estimate. Recall that only 50% of the
estimates are greater than the P50 estimate. This doesn't mean that the P90 estimate has a
higher chance of occurring; as explained above, all it means is that you have a higher
confidence in that estimate being exceeded by the actual outcome. This can be a difficult
concept to grasp. An easier way to think about it may be to say "I'm confident that the actual
outcome will be greater than my P90 estimate, but overall I expect that the final outcome will
be closest to my P50 estimate".
Question 5: Does everybody do frequency distribution (or probabilistic) calculations?
Answer: No. Some people just multiply single values (deterministic best estimates) together
to calculate a single output estimate, not a frequency distribution.
Question 5a: So how can you tell the confidence of that single output estimate?
Answer: You can't; it's a single best estimate. You just have to use it as it is calculated.

Note that in the example above we only calculated the oil in place. We can go one step
further and calculate the recoverable oil. Recoverable oil equals oil in place multiplied by a
recovery factor. For the recovery factor we can create a frequency distribution like all other
input parameters. Multiplying the oil in place frequency distribution by the recovery factor
frequency distribution we end up with a recoverable oil frequency distribution and then we
can convert this to a cumulative frequency distribution and read off the P90, P50 and P10
estimates. All the same concepts as discussed above apply.
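The extra step can be sketched the same way, again assuming random sampling is used to multiply the distributions. The oil in place and recovery factor distributions below are hypothetical stand-ins, not real field data.

```python
import random
from statistics import quantiles

rng = random.Random(7)
N = 20_000

# Hypothetical oil in place samples (arbitrary units), standing in for the
# output of an oil in place calculation.
oil_in_place = [rng.lognormvariate(2.0, 0.5) for _ in range(N)]
# Hypothetical recovery factor: between 20% and 50%, most likely 35%.
recovery_factor = [rng.triangular(0.20, 0.50, 0.35) for _ in range(N)]

# Recoverable oil equals oil in place multiplied by the recovery factor.
recoverable = [oip * rf for oip, rf in zip(oil_in_place, recovery_factor)]

# quantiles(..., n=10) returns the nine cut points at 10%, 20%, ..., 90% of the
# ascending data, so the first cut point is exceeded by about 90% of samples.
deciles = quantiles(recoverable, n=10)
p90, p50, p10 = deciles[0], deciles[4], deciles[8]
print(f"Recoverable oil: P90 = {p90:.2f}, P50 = {p50:.2f}, P10 = {p10:.2f}")
```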

Question 6: How do you create frequency distributions for all the variables?
Answer: You go to university for 4-5 years and become a geophysicist, geologist,
petrophysicist or reservoir engineer, you get a good job with an oil and gas company, you
get trained over 5-10 years, you gain a lot of experience and knowledge in earth sciences
and the physics of oil and gas moving through rocks and then you get to work on interesting
things like estimating recoverable oil and gas. But seriously, that question is taking us
outside the scope of this document as it involves knowledge, experience and the
measurement and analysis of the data that make up each of the individual input parameters.

For further detailed reading investors should consult the Recoverable Hydrocarbon
Guidelines in the policies section of Cooper Energy's website.
Cooper Energy Limited
For further information contact Cooper Energy via the website.
