
HIVE ANALYTICAL FUNCTIONS: RANK, ROW_NUMBER, DENSE_RANK
----------------------------------------------------------
hive> create table ranktab(name string,id int,amt int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> stored as textfile ;
OK
Time taken: 0.083 seconds
hive> LOAD DATA LOCAL INPATH 'window_rankData.csv' INTO TABLE ranktab;
Loading data to table default.ranktab
Table default.ranktab stats: [numFiles=1, totalSize=143]
OK
Time taken: 0.287 seconds
hive>
----------------------------------------------------------------------------
RANK()

"RANK() returns the rank of each row within the result set defined by the
OVER clause. Rows with equal values receive the same rank, and the ranks
the tied rows would otherwise have taken are skipped for the next value."

hive> select name, volume as amount, rank() over (order by V.volume desc) as rank
> from (select name, sum(amt) as volume from ranktab group by name) V;

OUTPUT:

RAJU 154000 1
JAYA 110000 2
MAHI 89000 3
RAJA 56000 4
ADHI 46000 5
RAVI 45000 6
SOMA 12000 7
RAMYA 12000 7
RAMA 12000 7
RAKHI 10000 10
MUKTHA 8000 11

NOTE: In the above output, SOMA, RAMYA and RAMA share rank 7 because their
summed amounts are equal. Ranks 8 and 9 are skipped, so RAKHI gets rank 10.
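
The skipping behaviour can be checked outside Hive. The following is a minimal Python sketch, not Hive's implementation. It reuses the summed amounts from the output above; the helper name rank_desc is made up for illustration.

```python
# Summed amounts per name, copied from the RANK() output above.
totals = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

def rank_desc(rows):
    """Hypothetical helper: rank rows by amount, highest first, with gaps after ties."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    out = []
    for position, (name, amount) in enumerate(ordered, start=1):
        if out and out[-1][1] == amount:
            rank = out[-1][2]   # tie: reuse the rank of the previous row
        else:
            rank = position     # no tie: rank equals the 1-based row position
        out.append((name, amount, rank))
    return out

ranked = rank_desc(totals)
for name, amount, rank in ranked:
    print(name, amount, rank)
```

Because the rank of a non-tied row is its row position, the three rows tied at 7 push the next row straight to 10.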
------------------------------------------------------------------------------
ROW_NUMBER()

"ROW_NUMBER() returns a continuous sequence of numbers, starting at 1,
for the rows of the result set defined by the OVER clause."

hive> select name, volume as amount, row_number() over (order by V.volume desc) as rownumber
> from (select name, sum(amt) as volume from ranktab group by name) V;

OUTPUT:

RAJU 154000 1
JAYA 110000 2
MAHI 89000 3
RAJA 56000 4
ADHI 46000 5
RAVI 45000 6
SOMA 12000 7
RAMYA 12000 8
RAMA 12000 9
RAKHI 10000 10
MUKTHA 8000 11

NOTE: In the above output, each record gets a unique row number even though
some summed amounts are the same. The order among the tied rows is not guaranteed.
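
As a rough illustration, ROW_NUMBER() amounts to numbering the sorted rows. The Python sketch below is an assumption-laden stand-in, not Hive code: the helper name row_number_desc is hypothetical, and ties are broken by input order here, whereas Hive makes no such promise.

```python
# Summed amounts per name, copied from the ROW_NUMBER() output above.
totals = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

def row_number_desc(rows):
    """Hypothetical helper: number rows 1..n after sorting by amount, highest first."""
    # sorted() is stable, so tied rows keep their input order (an assumption
    # of this sketch only; Hive may order tied rows differently).
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(name, amount, n) for n, (name, amount) in enumerate(ordered, start=1)]

numbered = row_number_desc(totals)
```

To make the numbering repeatable in a real query, a second sort key such as the name can be added to the ORDER BY.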
--------------------------------------------------------------------------------
DENSE_RANK()

"DENSE_RANK() works like RANK(), except that no rank is skipped after
duplicate values: each distinct value gets the next rank in sequence."

hive> select name, volume as amount, dense_rank() over (order by V.volume desc) as denserank
> from (select name, sum(amt) as volume from ranktab group by name) V;

OUTPUT:

RAJU 154000 1
JAYA 110000 2
MAHI 89000 3
RAJA 56000 4
ADHI 46000 5
RAVI 45000 6
SOMA 12000 7
RAMYA 12000 7
RAMA 12000 7
RAKHI 10000 8
MUKTHA 8000 9

NOTE: In the above output, no rank is skipped: the tied rows share rank 7
and the next value gets rank 8.
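
The difference from RANK() is that the counter advances once per distinct value rather than once per row. A minimal Python sketch of that idea follows; the helper name dense_rank_desc is hypothetical, and the data is the summed amounts from the output above.

```python
# Summed amounts per name, copied from the DENSE_RANK() output above.
totals = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

def dense_rank_desc(rows):
    """Hypothetical helper: rank rows by amount, highest first, without gaps."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    out = []
    rank = 0
    previous = None
    for name, amount in ordered:
        if amount != previous:
            rank += 1           # a new distinct value takes the next rank
            previous = amount
        out.append((name, amount, rank))
    return out

dense = dense_rank_desc(totals)
```

The highest dense rank therefore equals the number of distinct amounts, 9 in this data set.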
-------------------------------------------------------------------------
PERCENT_RANK()

hive> select name, volume as amount, percent_rank() over (order by V.volume desc) as percentrank
> from (select name, sum(amt) as volume from ranktab group by name) V;

OUTPUT:

RAJU 154000 0.0
JAYA 110000 0.1
MAHI 89000 0.2
RAJA 56000 0.3
ADHI 46000 0.4
RAVI 45000 0.5
SOMA 12000 0.6
RAMYA 12000 0.6
RAMA 12000 0.6
RAKHI 10000 0.9
MUKTHA 8000 1.0
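
PERCENT_RANK() is computed as (rank - 1) / (number of rows - 1), where rank is the RANK() value of the row. With 11 rows the divisor is 10, which explains the steps of 0.1 above. A minimal Python sketch of the formula follows; the helper name percent_rank_desc is hypothetical and this is not Hive's own code.

```python
# Summed amounts from the PERCENT_RANK() output above.
amounts = [154000, 110000, 89000, 56000, 46000, 45000,
           12000, 12000, 12000, 10000, 8000]

def percent_rank_desc(values):
    """Hypothetical helper: (rank - 1) / (n - 1) for values sorted highest first."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    result = []
    for value in ordered:
        # RANK() with descending order: 1 + number of strictly larger values.
        rank = 1 + sum(1 for other in ordered if other > value)
        # A single-row result set gets 0.0 to avoid dividing by zero.
        result.append(0.0 if n == 1 else (rank - 1) / (n - 1))
    return result

percent_ranks = percent_rank_desc(amounts)
```

The three tied rows have rank 7, so each gets (7 - 1) / 10 = 0.6, and RAKHI with rank 10 gets 0.9.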
------------------------------------------------------------------------------
NTILE()

"NTILE(n) is an analytic function. It divides an ordered data set into n
buckets of as equal size as possible and assigns the bucket number,
starting at 1, to each row. When the rows do not divide evenly, the
earlier buckets get one extra row."

hive> select name, volume as amount, ntile(3) over (order by V.volume desc) as bucket
> from (select name, sum(amt) as volume from ranktab group by name) V;

OUTPUT:

RAJU 154000 1
JAYA 110000 1
MAHI 89000 1
RAJA 56000 1
ADHI 46000 2
RAVI 45000 2
SOMA 12000 2
RAMYA 12000 2
RAMA 12000 3
RAKHI 10000 3
MUKTHA 8000 3
-----------------------
hive> select name, volume as amount, ntile(4) over (order by V.volume desc) as bucket
> from (select name, sum(amt) as volume from ranktab group by name) V;

OUTPUT:

RAJU 154000 1
JAYA 110000 1
MAHI 89000 1
RAJA 56000 2
ADHI 46000 2
RAVI 45000 2
SOMA 12000 3
RAMYA 12000 3
RAMA 12000 3
RAKHI 10000 4
MUKTHA 8000 4
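
The bucket sizes in both outputs can be reproduced with a short calculation: every bucket gets the integer quotient of rows by buckets, and the remainder is handed out one row at a time to the first buckets. The Python sketch below assumes that rule, which matches the outputs above; the helper name ntile_labels is hypothetical.

```python
def ntile_labels(row_count, buckets):
    """Hypothetical helper: bucket number for each of row_count ordered rows."""
    base, extra = divmod(row_count, buckets)
    labels = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each take one additional row.
        size = base + (1 if bucket <= extra else 0)
        labels.extend([bucket] * size)
    return labels

three = ntile_labels(11, 3)   # sizes 4, 4, 3
four = ntile_labels(11, 4)    # sizes 3, 3, 3, 2
print(three)
print(four)
```

For 11 rows, ntile(3) gives buckets of 4, 4 and 3 rows, and ntile(4) gives 3, 3, 3 and 2 rows.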

=========================================================================
HIVE WINDOWING FUNCTIONS

FIRST_VALUE(<<column>>)

"It returns the value of the given column from the first row of the window."

hive> select name, first_value(amt) over (partition by name order by id)
> from ranktab;

OUTPUT:

ADHI 46000
JAYA 34000
JAYA 34000
MAHI 89000
MUKTHA 8000
RAJA 33000
RAJA 33000
RAJU 45000
RAJU 45000
RAJU 45000
RAKHI 10000
RAMA 12000
RAMYA 12000
RAVI 45000
SOMA 12000
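
A minimal Python sketch of what the query does: group the rows by name, sort each group by id, and repeat the first amt for every row of the group. The ids below are hypothetical, because the CSV contents are not shown; the amounts come from the outputs in these notes, with RAJU's middle amount of 54000 inferred from the total of 154000. Only four of the names are included to keep the sketch short.

```python
from itertools import groupby

# (name, id, amt); the ids are made up for illustration.
rows = [
    ("RAJU", 1, 45000), ("JAYA", 2, 34000), ("RAJA", 3, 33000),
    ("RAJU", 4, 54000), ("JAYA", 5, 76000), ("RAJA", 6, 23000),
    ("RAJU", 7, 55000), ("ADHI", 8, 46000),
]

def first_value_by_name(data):
    """Hypothetical helper: first amt per name, with rows ordered by id."""
    # Sorting by (name, id) emulates PARTITION BY name ORDER BY id.
    ordered = sorted(data, key=lambda r: (r[0], r[1]))
    out = []
    for name, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        first_amt = group[0][2]
        out.extend((name, first_amt) for _ in group)
    return out

first_values = first_value_by_name(rows)
```

Every row of a partition gets the same value, which is why JAYA, RAJA and RAJU repeat in the output above.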
-----------------------------------------------------------------------------
LAST_VALUE(<<column>>)

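It is the counterpart of FIRST_VALUE: it returns the value of the given column
from the last row of the window. The query below has no ORDER BY, so the window
is the whole partition and which row counts as last depends on the order in
which Hive reads the rows.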

hive> select name , last_value(amt) over ( partition by name) from ranktab;

OUTPUT:

ADHI 46000
JAYA 76000
JAYA 76000
MAHI 89000
MUKTHA 8000
RAJA 23000
RAJA 23000
RAJU 55000
RAJU 55000
RAJU 55000
RAKHI 10000
RAMA 12000
RAMYA 12000
RAVI 45000
SOMA 12000
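
LAST_VALUE is sensitive to the window frame. As far as the Hive windowing documentation describes it, adding ORDER BY without an explicit frame makes the window end at the current row, so last_value then returns the current row's own value; a frame such as "rows between unbounded preceding and unbounded following" is needed to see the whole partition. The Python sketch below contrasts the two cases. The ids are hypothetical, RAJU's middle amount of 54000 is inferred from the total of 154000, and the helper name last_value_by_name is made up.

```python
from itertools import groupby

# (name, id, amt); the ids are made up for illustration.
rows = [
    ("RAJU", 1, 45000), ("JAYA", 2, 34000), ("RAJA", 3, 33000),
    ("RAJU", 4, 54000), ("JAYA", 5, 76000), ("RAJA", 6, 23000),
    ("RAJU", 7, 55000), ("ADHI", 8, 46000),
]

def last_value_by_name(data, whole_partition=True):
    """Hypothetical helper: last amt of each row's frame, rows ordered by id."""
    ordered = sorted(data, key=lambda r: (r[0], r[1]))
    out = []
    for name, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for i in range(len(group)):
            # Whole partition: the frame is every row of the group.
            # Otherwise: the frame stops at the current row.
            frame = group if whole_partition else group[: i + 1]
            out.append((name, frame[-1][2]))
    return out

whole = last_value_by_name(rows, whole_partition=True)
running = last_value_by_name(rows, whole_partition=False)
```

With the whole-partition frame RAJU gets 55000 on all three rows, as in the output above; with the frame ending at the current row RAJU gets 45000, 54000 and 55000.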
