
Empirical distribution function and Percentiles

Q1. Given Data = {-1, 0, 1}. What proportion of the data do not exceed $x = \frac{1}{2}$?

Ans. There are 3 observations in the data, out of which 2 do not exceed $\frac{1}{2}$ (we compare each data value to $\frac{1}{2}$ and count). Therefore the required proportion is equal to $\frac{2}{3}$.

Let us now generalize this question

Q2. Given the data $\{x_1, \ldots, x_n\}$, what proportion of the data is less than or equal to $x$?

Ans. We have to compute the number of observations in the data that are less than or equal to $x$. To get this number we compare each $x_i$ with $x$, $i = 1, \ldots, n$. If $x_i$ does not exceed $x$, we count it as one, and as zero otherwise. Hence

the number of $x_i$'s, $i = 1, \ldots, n$, not exceeding $x$ equals $\sum_{i=1}^{n} I(x_i \le x)$,

where $I(x_i \le x) = 1$ if $x_i \le x$ and zero otherwise.

Therefore the proportion of the data not exceeding $x$ equals $\frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$.
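The counting rule above can be sketched in a few lines of Python. The function name `proportion_not_exceeding` is an illustrative choice, not part of the notes, and exact fractions are used only so that the answer to Q1 prints as 2/3.

```python
from fractions import Fraction

def proportion_not_exceeding(data, x):
    """Return (1/n) * sum of I(x_i <= x) as an exact fraction."""
    # Each observation contributes 1 if it does not exceed x, else 0.
    count = sum(1 for xi in data if xi <= x)
    return Fraction(count, len(data))

# Q1: data {-1, 0, 1} and x = 1/2
print(proportion_not_exceeding([-1, 0, 1], Fraction(1, 2)))  # 2/3
```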

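Let us denote this proportion by $F_n(x)$. Given the data $\{x_1, \ldots, x_n\}$, we have a function $F_n$ as follows:

$$
F_n(x) =
\begin{cases}
0, & x < x_{(1)} \\
\frac{1}{n}, & x_{(1)} \le x < x_{(2)} \\
\frac{2}{n}, & x_{(2)} \le x < x_{(3)} \\
\ \vdots & \\
\frac{n-1}{n}, & x_{(n-1)} \le x < x_{(n)} \\
1, & x \ge x_{(n)}
\end{cases}
$$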

Note: $\{x_{(1)}, \ldots, x_{(n)}\}$ are the sorted data. That is, $x_{(1)}$ denotes the minimum and $x_{(n)}$ is the maximum. $x_{(k)}$ denotes the observation such that exactly $k - 1$ of the $x_i$'s, $i = 1, \ldots, n$, are less than $x_{(k)}$.
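As a minimal sketch of the same step function, the sorted data can be searched directly: the number of sorted values not exceeding a point is exactly what `bisect_right` from the Python standard library returns. The helper name `make_ecdf` and the sample data are assumptions made for illustration.

```python
from bisect import bisect_right

def make_ecdf(data):
    """Build the step function x -> (number of observations <= x) / n."""
    sorted_data = sorted(data)
    n = len(sorted_data)

    def ecdf(x):
        # bisect_right gives the count of sorted values that are <= x.
        return bisect_right(sorted_data, x) / n

    return ecdf

F = make_ecdf([3, 1, 4, 2])
print([F(x) for x in (0, 1, 2.5, 4, 10)])  # [0.0, 0.25, 0.5, 1.0, 1.0]
```

The printed values jump by 1/4 at each observation and stay flat in between, which matches the piecewise form.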

The function $F_n$ has a name, viz. the empirical distribution function.

Exercise: Plot the function $F_n$ and state its properties.

Q3. Given the data $\{x_1, \ldots, x_n\}$ and $0 < p < 1$, can you find a number $x$ such that $100p$ percent of the data do not exceed $x$?

Ans. Recall that $F_n(x)$ is the proportion of the data not exceeding $x$.

Therefore, to answer the above question, it is natural to solve the equation $F_n(x) = p$.

However, from the definition of $F_n(x)$ it is important to realize that there may not be any $x$ for which $F_n(x) = p$. (Why is that so? Well, $F_n(x)$ can only be equal to one of the $n + 1$ numbers $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}, 1\}$, and $p$ may not be equal to any one of these numbers.)

Moreover, if such an $x$ exists, it may not be unique. E.g. $F_4(x) = 0.5$ for every $x$ with $x_{(2)} \le x < x_{(3)}$.

However, our purpose is served if we can get an $x_p$ such that

1. $F_n(x_p) \ge p$, and

2. for any number $x < x_p$, $F_n(x) < p$.

Since $F_n$ is a non-negative, monotonically non-decreasing function increasing to 1, the set $\{x : F_n(x) \ge p\}$ is bounded below. Therefore we can define

$x_p = \inf\{x : F_n(x) \ge p\}$.

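Note: Such an $x_p$ satisfies 1 and 2. (Why? If $x < x_p$ and $F_n(x) \ge p$, then $x_p$ is not even a lower bound of the set $\{x : F_n(x) \ge p\}$. Therefore $x < x_p \Rightarrow F_n(x) < p$, which is condition 2. Condition 1 holds because $F_n$ is a right-continuous step function, so the infimum is attained.)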

The $x_p$ in the above definition is the $100p$ percentile.

Percentile: Given the data $\{x_1, \ldots, x_n\}$ and $0 < p < 1$, the $100p$ percentile is denoted by $x_p$, and is defined as

$x_p = \inf\{x : F_n(x) \ge p\}$.
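Because the empirical distribution function only jumps at the observations, the infimum in this definition is always one of the sorted data values: the first one at which the running proportion k/n reaches p. The sketch below scans for that value. The function name `percentile` and the sample data are illustrative assumptions, and the comparison uses ordinary floating-point arithmetic.

```python
def percentile(data, p):
    """Return inf{x : F_n(x) >= p} for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    sorted_data = sorted(data)
    n = len(sorted_data)
    for k, value in enumerate(sorted_data, start=1):
        # At least k of the n observations are <= value, so F_n(value) >= k/n.
        # The first k with k/n >= p therefore gives the infimum.
        if k / n >= p:
            return value

data = [7, 1, 5, 3]
print(percentile(data, 0.5))  # 3
print(percentile(data, 0.6))  # 5
```

Many library routines interpolate between neighbouring sorted values by default, so their output can differ from this definition.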

The percentiles divide the data into 100 equal parts.

Quartiles: We can divide the data into four parts using the 25th, 50th and 75th percentiles, viz. $x_p$, $p = 0.25, 0.50, 0.75$. These percentiles are called quartiles, denoted by $Q_1$, $Q_2$, $Q_3$.

Therefore, 25 percent of the data do not exceed $Q_1$, 50 percent of the data do not exceed $Q_2$, and 75 percent of the data do not exceed $Q_3$.

Ex1. What percent of the data are between $Q_1$ and $Q_3$?

Ex2. What percent of the data are between $Q_2$ and $Q_3$?

Ex3. What is the relation between $Q_2$ and the median?
