
Partition-Wise Joins

Partition-wise joins reduce query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This significantly reduces response time and improves the use of both CPU and memory resources. In Oracle Real Application Clusters (RAC) environments, partition-wise joins also avoid or at least limit the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations.
THE THEORY

A partition-wise join is a join between (for simplicity) two tables that are partitioned on the same column with the same partitioning scheme. In a shared-nothing system this is effectively hard partitioning, which places data on a specific node/storage combination. In Oracle it is logical partitioning. If you now join the two tables on that partitioned column, you can break the join up into smaller joins exactly along the partitions in the data. Since both tables are partitioned (grouped) into the same buckets, all values required to do the join live in the equivalent bucket on either side. No need to talk to anyone else, no need to redistribute data to anyone else... in short, the optimal join method for parallel processing of two large data sets.
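To make the idea concrete, here is a minimal in-memory sketch in Python. It assumes both tables are lists of dicts. Python's built-in hash() stands in for the database's hash partitioning function. The helpers hash_partition, join_bucket and partition_wise_join are hypothetical and are not Oracle APIs. The sketch shows that joining bucket i only with bucket i produces the same rows as joining the whole tables.

```python
from collections import defaultdict

def hash_partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the join key.

    Both tables must use this same function and the same n_parts,
    which is what makes them equipartitioned.
    """
    buckets = [[] for _ in range(n_parts)]
    for row in rows:
        buckets[hash(row[key]) % n_parts].append(row)
    return buckets

def join_bucket(left_rows, right_rows, key):
    """Hash-join one pair of matching buckets without looking at any other bucket."""
    index = defaultdict(list)
    for rrow in right_rows:
        index[rrow[key]].append(rrow)
    return [(lrow, rrow) for lrow in left_rows for rrow in index.get(lrow[key], [])]

def partition_wise_join(left, right, key, n_parts):
    left_parts = hash_partition(left, key, n_parts)
    right_parts = hash_partition(right, key, n_parts)
    result = []
    # Bucket i of one table only ever needs bucket i of the other table,
    # so each pair could be handed to a separate parallel process.
    for left_bucket, right_bucket in zip(left_parts, right_parts):
        result.extend(join_bucket(left_bucket, right_bucket, key))
    return result

# Hypothetical sample data: orders joined to customers on cust_id.
orders = [{"order_id": i, "cust_id": c} for i, c in enumerate([1, 2, 3, 4, 5, 2, 3])]
customers = [{"cust_id": c, "name": f"cust{c}"} for c in range(1, 6)]
pairs = partition_wise_join(orders, customers, "cust_id", n_parts=4)
print(len(pairs))  # 7: every order finds its customer in the matching bucket
```

Because both sides use the same function and bucket count, a matching key can never end up in two different buckets. That is why no data has to cross between bucket pairs.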
PWJs IN ORACLE

Since we do not hard partition the data across nodes in Oracle, we use the Partitioning option of the database to create the buckets, then set the Degree of Parallelism (DOP) and get our PWJs. The main questions always asked are:
1. How many partitions should I create?
2. What should my DOP be?

In a shared-nothing system the answer is, of course, as many partitions as there are nodes, which will also be your DOP. In Oracle we want you to look at the workload and concurrency first and, once you know those, to understand the following rules of thumb. Oracle has more ways of joining data, so it is important to understand some of the PWJ ideas and what it means if you have an uneven distribution across processes. Assume a simple scenario where we partition the data on a hash key, resulting in 4 hash partitions (H1 - H4). We have 2 parallel processes that have been tasked with reading these partitions (P1 - P2). Assuming the partitions are the same size, the work is evenly divided, with each process scanning two partitions, and we can scan all of them in time t1.

Now assume that we have changed the system and have a 5th partition, but still have our 2 workers P1 and P2. Assuming the 5th partition has the same size as the original H1 - H4 partitions, the scan actually takes 50% more time, because one worker now has to scan three partitions while the other scans only two.
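The arithmetic behind the 50% can be sketched in a few lines of Python. The sketch assumes equally sized partitions that are handed out whole, so a partition is never split between workers. Under that assumption the busiest worker sets the elapsed time. scan_time is a hypothetical helper. Note that the data grew by only 25% (5 partitions instead of 4), while the elapsed time grew by 50%.

```python
import math

def scan_time(n_partitions, n_workers, time_per_partition=1.0):
    """Elapsed scan time when equally sized partitions are assigned whole to workers.

    A partition is never split, so the busiest worker scans
    ceil(n_partitions / n_workers) partitions and determines the elapsed time.
    """
    return math.ceil(n_partitions / n_workers) * time_per_partition

t1 = scan_time(4, 2)  # H1-H4 on P1 and P2: two partitions each
t2 = scan_time(5, 2)  # H1-H5 on P1 and P2: one worker must scan three
print(t1, t2, t2 / t1)  # 2.0 3.0 1.5
```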

In other words, the time t2 it takes to scan these 5 partitions is not just a little more to account for the one extra partition. It is a lot more expensive, and some other join plans may now start to look exciting to the optimizer. Just to post the disclaimer: it is not as simple as I state it here, but you get the idea of how much more expensive this plan may now look...

Based on this little example, there are a few rules of thumb to follow to get partition-wise joins. First, choose a DOP that is a power of two, so always choose something like 2, 4, 8, 16, 32 and so on... Second, choose a number of partitions that is larger than or equal to 2 * DOP. Third, make sure the number of partitions is divisible by 2 without orphans; this is also known as an even number... Fourth, choose a stable partition count strategy, which is typically hash, and which can be a subpartitioning strategy rather than the main strategy (range-hash is a popular one). Fifth, make sure you do this on the join key between the two large tables you want to join (and this should be the obvious one...).

Translating this into an example: DOP = 8 (determined based on concurrency, or by using Auto DOP with a cap due to concurrency) says that the number of partitions should be >= 16. Number of hash (sub)partitions = 32, which gives each process four partitions to work on. This number is somewhat arbitrary and depends on your data and system. In this case my main reasoning is that if you get more room on the box, you can easily move the DOP for the query to 16 without repartitioning... and of course it makes for no leftovers on the table...
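The rules of thumb above can be turned into a quick sanity check. This is a sketch, not an Oracle feature, and check_pwj_layout is a hypothetical helper. In addition to the even-number rule, it checks that the partition count divides evenly by the DOP. That check is our reading of "no leftovers", not a rule stated in the text.

```python
def is_power_of_two(n):
    # A positive integer is a power of two when exactly one bit is set.
    return n > 0 and (n & (n - 1)) == 0

def check_pwj_layout(dop, n_partitions):
    """Return the rules of thumb a (DOP, partition count) pair breaks; [] means it passes."""
    problems = []
    if not is_power_of_two(dop):
        problems.append("DOP is not a power of two (2, 4, 8, 16, 32, ...)")
    if n_partitions < 2 * dop:
        problems.append("fewer than 2 * DOP partitions")
    if n_partitions % 2 != 0:
        problems.append("odd number of partitions")
    # Assumption: "no leftovers" read as an even split across all DOP processes.
    if n_partitions % dop != 0:
        problems.append("partitions do not split evenly across the parallel processes")
    return problems

# The example from the text: DOP = 8 with 32 hash (sub)partitions.
print(check_pwj_layout(8, 32))  # []
print(divmod(32, 8))            # (4, 0): four partitions per process, none left over
print(divmod(32, 16))           # (2, 0): DOP can grow to 16 without repartitioning
```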

Partition-wise joins can be full or partial. Oracle decides which type of join to use.

Full Partition-Wise Joins

A full partition-wise join divides a large join into smaller joins between a pair of partitions from the two joined tables. To use this feature, you must equipartition both tables on their join keys. Full partition-wise joins can occur if two tables that are co-partitioned on the same key are joined in a query. The tables can be co-partitioned at the partition level, at the subpartition level, or at a combination of partition and subpartition levels. Reference partitioning is an easy way to guarantee co-partitioning. Full partition-wise joins can be executed serially and in parallel.
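Here is a rough Python sketch of co-partitioning at a combination of levels. It assumes a hypothetical sales table that is range-partitioned by quarter and hash-subpartitioned on cust_id. It also assumes a hypothetical customers table that is hash-partitioned on cust_id with the same number of hash partitions. A simple modulo stands in for the database's hash function. Each unit of work joins customers partition h with hash subpartition h of every range partition, so no unit needs data from another unit.

```python
from collections import defaultdict

N_HASH = 4  # hypothetical: both tables use 4 hash (sub)partitions on cust_id

def hash_bucket(cust_id):
    # Stand-in for the database's hash function; identical for both tables.
    return cust_id % N_HASH

def quarter(month):
    # Range partitioning of sales by quarter: months 1-3 -> 0, 4-6 -> 1, ...
    return (month - 1) // 3

def partition_sales(sales):
    """Range-hash composite: key is (range partition, hash subpartition)."""
    parts = defaultdict(list)
    for row in sales:
        parts[(quarter(row["month"]), hash_bucket(row["cust_id"]))].append(row)
    return parts

def partition_customers(customers):
    """Single-level hash partitioning on the same key and partition count."""
    parts = defaultdict(list)
    for row in customers:
        parts[hash_bucket(row["cust_id"])].append(row)
    return parts

def full_pwj_composite(sales, customers):
    sales_parts = partition_sales(sales)
    cust_parts = partition_customers(customers)
    result = []
    for h in range(N_HASH):
        # One unit of work: customers partition h joined with hash subpartition h
        # of every range partition of sales. No other customers partition is read.
        # Assumes cust_id is unique in customers.
        names = {c["cust_id"]: c["name"] for c in cust_parts.get(h, [])}
        for (_, sub), rows in sales_parts.items():
            if sub != h:
                continue
            for s in rows:
                if s["cust_id"] in names:
                    result.append((s["sale_id"], names[s["cust_id"]]))
    return result

# Hypothetical sample data.
customers = [{"cust_id": c, "name": f"cust{c}"} for c in range(1, 9)]
sales = [{"sale_id": i, "cust_id": i % 8 + 1, "month": i % 12 + 1} for i in range(24)]
rows = full_pwj_composite(sales, customers)
print(len(rows))  # 24: every sale finds its customer
```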

Partial Partition-Wise Joins


Oracle can perform partial partition-wise joins only in parallel. Unlike full partition-wise joins, partial partition-wise joins require you to partition only one table on the join key, not both tables. The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partition-wise joins are more common than full partition-wise joins. To execute a partial partition-wise join, Oracle dynamically repartitions the other table based on the partitioning of the reference table. Once the other table is repartitioned, the execution is similar to a full partition-wise join.

The performance advantage that partial partition-wise joins have over joins in non-partitioned tables is that the reference table is not moved during the join operation. Parallel joins between non-partitioned tables require both input tables to be redistributed on the join key. This redistribution operation involves exchanging rows between parallel execution servers. This is a CPU-intensive operation that can lead to excessive interconnect traffic in Oracle Real Application Clusters environments. Partitioning large tables on a join key, either a foreign or primary key, prevents this redistribution every time the table is joined on that key. Of course, if you choose a foreign key to partition the table, which is the most common scenario, select a foreign key that is involved in many queries.
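Here is a minimal Python sketch of the partial case. It assumes the reference table is already hash-partitioned into a list of buckets, and that the other table is an unpartitioned list of dicts. The other table is redistributed on the fly with the reference table's own partitioning function, which is the hypothetical bucket_of here. The rows_moved counter simply counts the rows that pass through redistribution, to contrast with a join of two non-partitioned tables, where both inputs are redistributed.

```python
from collections import defaultdict

def bucket_of(value, n_parts):
    # Hypothetical stand-in for the reference table's hash partitioning function.
    return hash(value) % n_parts

def partial_pwj(reference_parts, other_rows, key):
    """Join a table already hash-partitioned on key with an unpartitioned table.

    reference_parts stays where it is; only other_rows is redistributed, using the
    reference table's own partitioning so bucket i meets bucket i.
    """
    n_parts = len(reference_parts)
    redistributed = [[] for _ in range(n_parts)]
    for row in other_rows:
        redistributed[bucket_of(row[key], n_parts)].append(row)
    rows_moved = len(other_rows)  # only the other table goes through redistribution

    result = []
    for ref_bucket, other_bucket in zip(reference_parts, redistributed):
        index = defaultdict(list)
        for ref_row in ref_bucket:
            index[ref_row[key]].append(ref_row)
        for other_row in other_bucket:
            for ref_row in index.get(other_row[key], []):
                result.append((ref_row, other_row))
    return result, rows_moved

# Hypothetical data: "orders" is the large reference table, partitioned on cust_id.
N_PARTS = 4
orders = [{"order_id": i, "cust_id": i % 10} for i in range(40)]
reference_parts = [[] for _ in range(N_PARTS)]
for o in orders:
    reference_parts[bucket_of(o["cust_id"], N_PARTS)].append(o)

customers = [{"cust_id": c, "name": f"cust{c}"} for c in range(10)]
pairs, moved = partial_pwj(reference_parts, customers, "cust_id")
print(len(pairs), moved)  # 40 10
# A join of two non-partitioned tables would redistribute both: 40 + 10 = 50 rows.
```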

Benefits of Partition-Wise Joins


Partition-wise joins offer the benefits described in the following topics:

Reduction of Communications Overhead
Reduction of Memory Requirements
