# 6. Partitioning (Sharding)
Partitioning is for achieving scalability. Each partition is a small database of its own, but the
database can support operations that involve multiple partitions at the same time.
## Summary
The partitioning scheme should depend on your data. The goal is to avoid hot spots (partitions
with disproportionately high load).
Partitioning approaches:
- Key range partitioning
  - partitions are rebalanced dynamically by splitting the range into 2 subranges and adding new
partitions
- Hash partitioning
  - usually the number of partitions is fixed, but dynamic partitioning can also be used
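
A minimal sketch of both approaches, to make the difference concrete. The names `RangePartitioner`, `hash_partition`, and the `max_keys_per_partition` threshold are hypothetical and used only for illustration. The sketch assumes string keys and splits on key count, whereas a real database would more likely split on partition size; mapping partitions to nodes is left out.

```python
import bisect
import hashlib


class RangePartitioner:
    """Key range partitioning: each partition owns a contiguous, sorted key range."""

    def __init__(self, max_keys_per_partition=4):
        # Hypothetical threshold; assumed here to keep the example small.
        self.max_keys = max_keys_per_partition
        # Partition i owns keys >= lower_bounds[i] and < lower_bounds[i + 1].
        self.lower_bounds = [""]
        self.partitions = [[]]  # each partition keeps its keys sorted

    def _index_for(self, key):
        # Find the last partition whose lower bound is <= key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key):
        i = self._index_for(key)
        part = self.partitions[i]
        pos = bisect.bisect_left(part, key)
        if pos < len(part) and part[pos] == key:
            return  # key already stored; keys are treated as unique
        part.insert(pos, key)
        if len(part) > self.max_keys:
            self._split(i)

    def _split(self, i):
        # Split the range into 2 subranges and add the upper half as a new partition.
        keys = self.partitions[i]
        mid = len(keys) // 2
        self.partitions[i] = keys[:mid]
        self.partitions.insert(i + 1, keys[mid:])
        self.lower_bounds.insert(i + 1, keys[mid])


def hash_partition(key, num_partitions=8):
    """Hash partitioning with a fixed number of partitions.

    MD5 is used only as a stable, well-distributed hash (not for security);
    Python's built-in hash() is randomized per process for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    rp = RangePartitioner(max_keys_per_partition=4)
    for k in "abcdefghij":
        rp.insert(k)
    print(rp.lower_bounds)
    print(rp.partitions)
    print(hash_partition("user:42"))
```

Inserting sequential keys into the range partitioner always lands on the last partition, which is the kind of hot spot the scheme should avoid. Hashing spreads such keys across partitions, at the cost of losing efficient range scans.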
# DISTRIBUTED DATA
There are various reasons why you might want to distribute a database across multiple
machines:
- Scalability
- Latency
- each user can be served from a datacenter that is geographically close to them

Data is commonly distributed across machines in two ways:
- Replication
- Partitioning (Sharding)
In distributed systems, part of the system can break in unpredictable ways while other parts
are still working. The situation may be nondeterministic: for the same actions we might get
different results (failure/success).
It's not about the ultimate design, but about my thought process and COMMUNICATION. Talk out
loud about every point: assumptions, tradeoffs, estimations.
Use the interviewer's guidance
1. Scope
- communicate the limitations of your design, so that the interviewer knows you are aware of
them
1. Ask questions
2. Pretend that the data can all fit on one machine and there are no memory limitations. How
would you solve the problem? The answer to this question will provide the general outline for your
solution.