Data Warehousing - Partitioning Strategy
Need for partitioning

Partitioning enhances performance and facilitates easy management of data.
It also helps in balancing the various requirements of the system.
It optimizes hardware performance and simplifies the management of the
data warehouse by splitting each fact table into multiple separate
partitions.
Key benefits:
Effortless management
Backup/recovery of individual partitions
Partitioning strategy

Range partitioning (e.g., by time period)
List partitioning (e.g., by region)
Hash partitioning (uniform distribution of data)
Composite partitioning (combination of two or more methods)
Sub-partitioning
Replication partitioning (load balancing and parallel processing)
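
As a rough sketch, the strategies listed above can be thought of as functions that map a row to a partition. The function names, the month-level granularity, the region codes, and the use of CRC32 as the hash are all assumptions made for illustration; real database engines define partitions declaratively and use their own hash functions.

```python
import zlib
from datetime import date

# Hypothetical mapping of region codes to named partitions.
REGION_PARTITIONS = {"N": "north", "S": "south", "E": "east", "W": "west"}


def range_partition(sale_date: date) -> str:
    """Range partitioning: one partition per calendar month (assumed granularity)."""
    return f"sales_{sale_date.year}_{sale_date.month:02d}"


def list_partition(region: str) -> str:
    """List partitioning: each listed value belongs to one named partition."""
    return REGION_PARTITIONS[region]


def hash_partition(key: str, n_partitions: int = 4) -> int:
    """Hash partitioning: a stable hash spreads keys over n_partitions buckets."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def composite_partition(sale_date: date, key: str, n_sub: int = 4) -> str:
    """Composite partitioning: range by month, then a hash sub-partition."""
    return f"{range_partition(sale_date)}_h{hash_partition(key, n_sub)}"
```

For example, `range_partition(date(2013, 8, 3))` returns `"sales_2013_08"`, and `composite_partition` appends a hash bucket to that name, which is one way to picture sub-partitioning.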
The list of popular data partitioning techniques is as
follows:
Horizontal Partitioning
Vertical Partitioning
Horizontal Partitioning

In this technique, the dataset is divided based on rows
or records. Each partition contains a subset of rows,
and the partitions are typically distributed across
multiple servers or storage devices. Horizontal
partitioning is often used in distributed databases or
systems to improve parallelism and enable load
balancing.
Advantages:
• Greater scalability: By distributing data among several servers or storage devices,
horizontal partitioning makes it possible to process large datasets in parallel.
• Load balancing: By partitioning data, the workload can be distributed equally among
several nodes, avoiding bottlenecks and enhancing system performance.
• Data separation: Since each partition can be managed independently, data isolation and
fault tolerance are improved. The other partitions can carry on operating even if one
fails.
Disadvantages:
• Join operations: Horizontal partitioning can make join operations across multiple
partitions more complex and potentially slower, as data needs to be fetched from
different nodes.
• Data skew: If the distribution of data is uneven or if some partitions receive more
queries or updates than others, it can result in data skew, impacting performance and
load balancing.
• Distributed transaction management: Ensuring transactional consistency across
multiple partitions can be challenging, requiring additional coordination mechanisms.
Vertical Partitioning

Unlike horizontal partitioning, vertical partitioning divides the dataset based on columns or attributes.
In this technique, each partition contains a subset of columns for each row. Vertical partitioning is
useful when different columns have varying access patterns or when some columns are more frequently
accessed than others.
Advantages:
• Improved query performance: By placing frequently accessed columns in a separate partition,
vertical partitioning can enhance query performance by reducing the amount of data read from
storage.
• Efficient data retrieval: When a query only requires a subset of columns, vertical partitioning allows
retrieving only the necessary data, reducing I/O.
• Simplified schema management: With vertical partitioning, adding or removing columns becomes
easier, as the changes only affect the respective partitions.
Disadvantages:
• Increased complexity: Vertical partitioning can lead to more complex query execution plans, as
queries may need to access multiple partitions to gather all the required data.
• Joins across partitions: Joining data from different partitions can be more complex and potentially
slower, as it involves retrieving data from different partitions and combining them.
• Limited scalability: Vertical partitioning may not be as effective for datasets that continuously grow
in terms of the number of columns, as adding new columns may require restructuring the
partitions.
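
The trade-off above can be sketched in a few lines, assuming rows are dictionaries and every partition keeps a copy of the key column. The `hot`/`cold` grouping, the helper names, and the sample product rows are hypothetical and only illustrate the idea.

```python
def vertical_partition(rows, key, column_groups):
    """Split each row into one narrower row per column group.

    Every partition keeps the key column so rows can be matched up again.
    """
    partitions = {name: [] for name in column_groups}
    for row in rows:
        for name, columns in column_groups.items():
            narrow = {key: row[key]}
            narrow.update({c: row[c] for c in columns})
            partitions[name].append(narrow)
    return partitions


def rejoin(left, right, key):
    """Recombine two partitions on the shared key (the cross-partition join)."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


# Made-up rows: two small, frequently read columns and one large, rarely read one.
products = [
    {"product_id": 30, "name": "widget", "price": 3.67, "description": "long text A"},
    {"product_id": 35, "name": "gadget", "price": 5.33, "description": "long text B"},
]

parts = vertical_partition(
    products,
    "product_id",
    {"hot": ["name", "price"], "cold": ["description"]},
)
```

A query that needs only `name` and `price` reads `parts["hot"]` and never touches the wide `description` column, while a query that needs every column has to pay for `rejoin(parts["hot"], parts["cold"], "product_id")`.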
Vertical partitioning splits the data vertically, that is, by column. The following image depicts how
vertical partitioning is done.
Vertical partitioning can be performed in the following
two ways −
Normalization
Row Splitting
Normalization
Normalization is the standard relational method of
database organization. In this method, duplicate rows are
collapsed into a single row, which reduces space. Take
a look at the following tables that show how
normalization is performed.
Product_id  Qty  Value  sales_date  Store_id  Store_name  Location   Region
30          5    3.67   3-Aug-13    16        sunny       Bangalore  S
35          4    5.33   3-Sep-13    16        sunny       Bangalore  S
40          5    2.50   3-Sep-13    64        san         Mumbai     W
45          7    5.66   3-Sep-13    16        sunny       Bangalore  S
Tables after Normalization

Store_id  Store_name  Location   Region
16        sunny       Bangalore  S
64        san         Mumbai     W

Product_id  Quantity  Value  sales_date  Store_id
30          5         3.67   3-Aug-13    16
35          4         5.33   3-Sep-13    16
40          5         2.50   3-Sep-13    64
45          7         5.66   3-Sep-13    16
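
The normalization step can be sketched as follows, using the four rows of the original table. The tuple layout and the helper names `normalize` and `denormalize` are assumptions for illustration. A relational database would express the same thing as two tables linked by a `Store_id` foreign key.

```python
# Rows of the original table, in the column order:
# (product_id, qty, value, sales_date, store_id, store_name, location, region)
flat_rows = [
    (30, 5, 3.67, "3-Aug-13", 16, "sunny", "Bangalore", "S"),
    (35, 4, 5.33, "3-Sep-13", 16, "sunny", "Bangalore", "S"),
    (40, 5, 2.50, "3-Sep-13", 64, "san", "Mumbai", "W"),
    (45, 7, 5.66, "3-Sep-13", 16, "sunny", "Bangalore", "S"),
]


def normalize(rows):
    """Split the flat table into a store table and a sales table."""
    stores = {}  # store_id -> (store_name, location, region), stored once
    sales = []   # (product_id, qty, value, sales_date, store_id)
    for product_id, qty, value, sales_date, store_id, name, location, region in rows:
        stores[store_id] = (name, location, region)
        sales.append((product_id, qty, value, sales_date, store_id))
    return stores, sales


def denormalize(stores, sales):
    """Join the two tables back on store_id to rebuild the flat rows."""
    return [sale + stores[sale[4]] for sale in sales]


stores, sales = normalize(flat_rows)
```

The three repeated store 16 rows collapse into a single entry in `stores`, and `denormalize(stores, sales)` rebuilds the original rows, so no information is lost by the split.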
Row Splitting
Row splitting tends to leave a one-to-one mapping
between partitions. The motive of row splitting is to
speed up access to a large table by reducing its size.
Note − While using vertical partitioning, make sure
that there is no requirement to perform a major join
operation between two partitions.
