Hive-Analytical-Window Functions

HIVE ANALYTICAL FUNCTIONS: rank, row_number, dense_rank, percent_rank, ntile
----------------------------------------------------------
hive> create table ranktab(name string,id int,amt int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> stored as textfile ;
OK
Time taken: 0.083 seconds
hive> LOAD DATA LOCAL INPATH 'window_rankData.csv' INTO TABLE ranktab;
Loading data to table default.ranktab
Table default.ranktab stats: [numFiles=1, totalSize=143]
OK
Time taken: 0.287 seconds
hive>
----------------------------------------------------------------------------
RANK()
"RANK() returns the rank of each row according to the ordering in the OVER clause.
If two or more rows have the same value, they all receive the same rank,
and the ranks they would otherwise have occupied are skipped for the next value."
hive> select name, volume as amount, rank() over (order by V.volume desc) as rank
> from (select name, sum(amt) as volume from ranktab group by name) V;
OUTPUT:
RAJU 154000 1
JAYA 110000 2
MAHI 89000 3
RAJA 56000 4
ADHI 46000 5
RAVI 45000 6
SOMA 12000 7
RAMYA 12000 7
RAMA 12000 7
RAKHI 10000 10
MUKTHA 8000 11
NOTE: In the above output, three members share rank 7 because their summed amounts
are the same. Ranks 8 and 9 are skipped, so the next row gets rank 10.
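The tie handling described above can be re-computed outside Hive. The following is a minimal Python sketch, not Hive's implementation: the summed amounts are copied from the OUTPUT listing, and the helper name `rank_desc` is hypothetical.

```python
# Summed amounts per name, copied from the OUTPUT listing above.
volumes = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

def rank_desc(rows):
    """Return (name, amount, rank); ties share a rank and leave a gap after them."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked = []
    for position, (name, amount) in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == amount:
            rank = ranked[-1][2]   # tie: reuse the previous row's rank
        else:
            rank = position        # otherwise the rank is the 1-based row position
        ranked.append((name, amount, rank))
    return ranked

ranks = rank_desc(volumes)
for name, amount, rank in ranks:
    print(name, amount, rank)
```

Because a new rank is taken from the row position, the three rows tied at 7 push the next distinct amount to rank 10.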
------------------------------------------------------------------------------
ROW_NUMBER()
"ROW_NUMBER() returns a continuous sequence of numbers, starting at 1,
for the rows of the result set in the order given by the OVER clause."
hive> select name, volume as amount, row_number() over (order by V.volume desc) as rownumber
> from (select name, sum(amt) as volume from ranktab group by name) V;
OUTPUT:
RAJU 154000 1
JAYA 110000 2
MAHI 89000 3
RAJA 56000 4
ADHI 46000 5
RAVI 45000 6
SOMA 12000 7
RAMYA 12000 8
RAMA 12000 9
RAKHI 10000 10
MUKTHA 8000 11
NOTE: In the above output, each record gets a unique row number even where the
summed amounts are the same. The order among the tied rows is not guaranteed.
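For comparison with RANK(), row numbering can be sketched in a few lines of Python. This is an illustration only: the amounts are copied from the OUTPUT listing, and the order of the three tied rows simply follows the input list here, whereas Hive makes no promise about it.

```python
# Summed amounts per name, copied from the OUTPUT listing above.
volumes = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

# Sort by amount descending; Python's sort is stable, so tied rows keep input order.
ordered = sorted(volumes, key=lambda r: r[1], reverse=True)

# Number every row 1, 2, 3, ... regardless of ties.
numbered = [(name, amount, n) for n, (name, amount) in enumerate(ordered, start=1)]

for row in numbered:
    print(*row)
```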
--------------------------------------------------------------------------------
DENSE_RANK()
"DENSE_RANK() works like RANK(), except that no ranks are skipped after duplicate
values. Each distinct value gets the next rank in sequence."
hive> select name, volume as amount, dense_rank() over (order by V.volume desc) as denserank
> from (select name, sum(amt) as volume from ranktab group by name) V;
OUTPUT:
RAJU 154000 1
JAYA 110000 2
MAHI 89000 3
RAJA 56000 4
ADHI 46000 5
RAVI 45000 6
SOMA 12000 7
RAMYA 12000 7
RAMA 12000 7
RAKHI 10000 8
MUKTHA 8000 9
NOTE: In the above output, no rank is skipped: after the three rows tied at rank 7,
the next row gets rank 8.
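The no-gap behaviour can be sketched in Python by advancing the rank only when the amount changes. This is a minimal illustration with a hypothetical helper name (`dense_rank_desc`), using the amounts from the OUTPUT listing.

```python
# Summed amounts per name, copied from the OUTPUT listing above.
volumes = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

def dense_rank_desc(rows):
    """Return (name, amount, dense_rank); the rank grows by 1 per distinct amount."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    result = []
    rank = 0
    previous = None
    for name, amount in ordered:
        if amount != previous:   # a new distinct value takes the next rank
            rank += 1
            previous = amount
        result.append((name, amount, rank))
    return result

dense = dense_rank_desc(volumes)
for row in dense:
    print(*row)
```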
-------------------------------------------------------------------------
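PERCENT_RANK()
"PERCENT_RANK() returns the relative rank of each row, computed as
(rank - 1) / (number of rows - 1), so the values run from 0.0 to 1.0."
hive> select name, volume as amount, percent_rank() over (order by V.volume desc) as pctrank
> from (select name, sum(amt) as volume from ranktab group by name) V;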
OUTPUT:
RAJU 154000 0.0
JAYA 110000 0.1
MAHI 89000 0.2
RAJA 56000 0.3
ADHI 46000 0.4
RAVI 45000 0.5
SOMA 12000 0.6
RAMYA 12000 0.6
RAMA 12000 0.6
RAKHI 10000 0.9
MUKTHA 8000 1.0
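The values above are consistent with (rank - 1) / (rows - 1) over the 11 rows, where rank is the RANK() value. A short Python sketch of that arithmetic follows; the helper name `percent_rank_desc` is hypothetical, and returning 0.0 for a single-row set is an assumption made to avoid dividing by zero.

```python
# Summed amounts per name, copied from the OUTPUT listing above.
volumes = [
    ("RAJU", 154000), ("JAYA", 110000), ("MAHI", 89000), ("RAJA", 56000),
    ("ADHI", 46000), ("RAVI", 45000), ("SOMA", 12000), ("RAMYA", 12000),
    ("RAMA", 12000), ("RAKHI", 10000), ("MUKTHA", 8000),
]

def percent_rank_desc(rows):
    """Return (name, amount, (rank - 1) / (n - 1)) using RANK()-style ranks."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    n = len(ordered)
    result = []
    rank = 0
    for position, (name, amount) in enumerate(ordered, start=1):
        # Keep the previous rank on a tie, otherwise take the row position.
        if position == 1 or amount != ordered[position - 2][1]:
            rank = position
        # Assumption: a single-row set gets 0.0 rather than dividing by zero.
        pct = (rank - 1) / (n - 1) if n > 1 else 0.0
        result.append((name, amount, pct))
    return result

pct_ranks = percent_rank_desc(volumes)
for row in pct_ranks:
    print(*row)
```

The three tied rows have rank 7, giving (7 - 1) / 10 = 0.6, and the next row has rank 10, giving 0.9.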
------------------------------------------------------------------------------
NTILE()
"NTILE(n) is an analytic function. It divides an ordered data set into n buckets
and assigns the appropriate bucket number to each row. When the rows do not divide
evenly, the earlier buckets each receive one extra row."
hive> select name, volume as amount, ntile(3) over (order by V.volume desc) as bucketno
> from (select name, sum(amt) as volume from ranktab group by name) V;
OUTPUT:
RAJU 154000 1
JAYA 110000 1
MAHI 89000 1
RAJA 56000 1
ADHI 46000 2
RAVI 45000 2
SOMA 12000 2
RAMYA 12000 2
RAMA 12000 3
RAKHI 10000 3
MUKTHA 8000 3
-----------------------
hive> select name, volume as amount, ntile(4) over (order by V.volume desc) as bucketno
> from (select name, sum(amt) as volume from ranktab group by name) V;
OUTPUT:
RAJU 154000 1
JAYA 110000 1
MAHI 89000 1
RAJA 56000 2
ADHI 46000 2
RAVI 45000 2
SOMA 12000 3
RAMYA 12000 3
RAMA 12000 3
RAKHI 10000 4
MUKTHA 8000 4
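Both NTILE outputs above match a simple split in which every bucket gets the integer quotient of rows per bucket and the first few buckets absorb the remainder. The Python sketch below illustrates that arithmetic only; the helper name `ntile_assign` is hypothetical and it takes just the row count, since bucket numbers depend on row position rather than on the amounts.

```python
def ntile_assign(row_count, buckets):
    """Return the bucket number for each row position of an ordered result set."""
    base, extra = divmod(row_count, buckets)
    assignment = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each hold one row more than the rest.
        size = base + (1 if bucket <= extra else 0)
        assignment.extend([bucket] * size)
    return assignment

three = ntile_assign(11, 3)   # 11 rows into 3 buckets: sizes 4, 4, 3
four = ntile_assign(11, 4)    # 11 rows into 4 buckets: sizes 3, 3, 3, 2
print(three)
print(four)
```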
=========================================================================
HIVE WINDOWING FUNCTIONS
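FIRST_VALUE(<<column>>)
"It returns the value of the first row in the window. Here each name is its own
partition, ordered by id, so every row shows the amt of that name's lowest id."
hive> select name, first_value(amt) over (partition by name order by id) from ranktab;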
OUTPUT:
ADHI 46000
JAYA 34000
JAYA 34000
MAHI 89000
MUKTHA 8000
RAJA 33000
RAJA 33000
RAJU 45000
RAJU 45000
RAJU 45000
RAKHI 10000
RAMA 12000
RAMYA 12000
RAVI 45000
SOMA 12000
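As a small illustration of the partition-then-order logic, the Python sketch below reproduces the JAYA and RAJA lines of the output. The amounts are taken from the listings in this document; the id values are hypothetical, since the contents of the CSV file are not shown, and the helper name `first_value_by_name` is likewise an assumption.

```python
from itertools import groupby

# (name, id, amt). Amounts come from this document's outputs; ids are hypothetical.
rows = [
    ("JAYA", 2, 76000), ("RAJA", 1, 33000),
    ("JAYA", 1, 34000), ("RAJA", 2, 23000),
]

def first_value_by_name(rows):
    """For each row, return (name, amt of the lowest-id row with the same name)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))   # partition by name, order by id
    out = []
    for name, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        first_amt = group[0][2]                          # first row of the partition
        out.extend((name, first_amt) for _ in group)
    return out

firsts = first_value_by_name(rows)
for row in firsts:
    print(*row)
```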
-----------------------------------------------------------------------------
LAST_VALUE(<<column>>)
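It is the counterpart of FIRST_VALUE: it returns the value of the last row in the
window. The query below has no ORDER BY, so the window is the whole partition and
the row treated as last depends on the order in which Hive reads the rows.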
hive> select name , last_value(amt) over ( partition by name) from ranktab;
OUTPUT:
ADHI 46000
JAYA 76000
JAYA 76000
MAHI 89000
MUKTHA 8000
RAJA 23000
RAJA 23000
RAJU 55000
RAJU 55000
RAJU 55000
RAKHI 10000
RAMA 12000
RAMYA 12000
RAVI 45000
SOMA 12000
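The window frame matters for LAST_VALUE. In Hive, an OVER clause with ORDER BY but no explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, while a clause with neither covers the whole partition. The Python sketch below contrasts the two frames for the JAYA and RAJA rows. It is an illustration only: the id values are hypothetical, the rows are assumed to be read in id order, and the helper name `last_value` is not a Hive API.

```python
from itertools import groupby

# (name, id, amt). Amounts come from this document's outputs; ids are hypothetical.
rows = [
    ("JAYA", 1, 34000), ("JAYA", 2, 76000),
    ("RAJA", 1, 33000), ("RAJA", 2, 23000),
]

def last_value(rows, running):
    """Return (name, last amt in the frame) for each row.

    running=False: the frame is the whole partition.
    running=True:  the frame ends at the current row (ids are unique here,
                   so the current row has no peers to extend the frame).
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    out = []
    for name, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for i in range(len(group)):
            frame = group[: i + 1] if running else group
            out.append((name, frame[-1][2]))
    return out

whole = last_value(rows, running=False)
running = last_value(rows, running=True)
print(whole)
print(running)
```

With the running frame, each row simply returns its own amt, which is why adding ORDER BY to the query without also widening the frame would not produce the per-name last value shown above.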
