
MongoDB Caching Internals
Hey, I am Uddeshya 👋
SDE-2 @ GTF (GoTo Finance), OSS Enthusiast, Occasional database
tinkerer, Manchester United Masochist

Twt - @uds5501
Github - @uds5501
Email - singhuddeshyaofficial@gmail.com
Agenda (Broadly)

● Honorary mention of MongoDB.
● Introduction to WiredTiger.
● Glimpse of WiredTiger architecture for a write transaction.
● How does the cache fill up?
● Deep dive into the cache eviction strategy.
This talk is not… ❌

● An endorsement of MongoDB
○ The choice of database is very subjective; choose yours as per
your use case.
● An exhaustive walkthrough of MongoDB features.
● A guide to MongoDB best practices.
MongoDB
Famous NoSQL database with highly flexible data modelling support
● Supports Transactions.
● Loosely supports ACID properties.
● Supports sharding and fault tolerance.
● Amazing database for side projects [personal opinion]
General databases?
WiredTiger Storage Engine

The Good
● Default MongoDB on-disk storage engine since MongoDB 3.2.
● Document level concurrency.
● Snapshot and Checkpoint durability.
● Supports Journaling.
● Supports on-disk compression algorithms.
● Good ol' plug and play.
WiredTiger Storage Engine

The Bad
● Can't pin documents in cache.
● No separation for reads and writes in cache.
● It doesn't allocate cache on a per-database or per-collection level.
Architecture Overview
Act - 1
Gateways to engine
APIs
Act - 2
The building blocks
Schema

● Defines data format for storage.
● Supports both row storage and column storage.
● Key-value pairs.
● Keys and values are raw byte arrays, each up to (4 GB - 512 B) in size.
Cursors

● Basic tool for in-memory manipulation of objects.
● Iterate / Get / Set / Update.
Metadata

● Tracks files, indexes and history files for a user's database.
dHandles

● Generic data structures that point to the storage data structures,
such as B-trees, LSM trees, Bloom filters and metadata pages.
Act - 3
In Memory Storage
B Trees
Act - 4
Writing something?
Transactions!

● Backbone of document-level concurrency.
● A single thread supports a single transaction.
● 3 levels of isolation:
○ Snapshot isolation
○ Read committed isolation
○ Read uncommitted isolation
Snapshots

● Copy of state at the start of a transaction.
● Stores
○ Maximum transaction ID
■ t_id >= max is invisible.
○ Concurrent transaction IDs
■ not committed, hence invisible.
○ Minimum transaction ID
■ t_id < min is committed, hence visible.
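The visibility rule on this slide can be sketched in a few lines. This is a minimal illustration, not WiredTiger's actual code: the `Snapshot` class and its field names (`snap_min`, `snap_max`, `concurrent`) are hypothetical, and it assumes IDs below the minimum are visible, IDs at or above the maximum are invisible, and IDs in between are visible unless they were still running when the snapshot was taken.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: the three pieces of state listed on the slide."""
    snap_min: int                      # every t_id below this has committed
    snap_max: int                      # every t_id at or above this started later
    concurrent: frozenset = field(default_factory=frozenset)  # running at snapshot time

    def is_visible(self, t_id: int) -> bool:
        # Started after the snapshot was taken: invisible.
        if t_id >= self.snap_max:
            return False
        # Older than every concurrent transaction: committed, visible.
        if t_id < self.snap_min:
            return True
        # In between: visible only if it was not running concurrently.
        return t_id not in self.concurrent
```

For example, with `Snapshot(10, 20, frozenset({12, 15}))`, transaction 5 is visible, 12 is not, 13 is, and 20 is not.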
Cache

● Holds copies of recently accessed and modified data.
● Cache loads B-tree pages into memory as required.
● The cache size is generally fixed.
○ Defaults to the larger of 50% of (RAM - 1 GB) and 256 MB.
○ Can be configured at runtime.
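As a worked example of the default sizing, MongoDB's documentation describes the WiredTiger internal cache default as the larger of 50% of (RAM - 1 GB) and 256 MB. The helper below is only a sketch of that arithmetic; the function name is made up for illustration and it treats GB and MB as binary units.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_cache_bytes(ram_bytes: int) -> int:
    """Sketch of the documented default: max(50% of (RAM - 1 GiB), 256 MiB)."""
    half_of_remainder = (ram_bytes - GIB) // 2
    return max(half_of_remainder, 256 * MIB)
```

On a 4 GiB machine this gives 1.5 GiB; on a 1 GiB machine the 256 MiB floor applies.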
Block Manager

● Responsible for reading data from and writing data to the disk.
● Page header
○ metadata of the page it was created from.
● Block header
● Block header
○ size
○ checksum
○ flags
○ padding
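To make the block header fields concrete, here is a rough sketch of packing and verifying a block with a size, checksum, flags and padding header. The layout (two 32-bit integers, one flag byte, three padding bytes) and the use of `zlib.crc32` are assumptions chosen for illustration, not the engine's on-disk format or its checksum routine.

```python
import struct
import zlib

# Assumed layout: size (u32), checksum (u32), flags (u8), 3 padding bytes.
HEADER = struct.Struct("<IIB3x")


def pack_block(payload: bytes, flags: int = 0) -> bytes:
    """Prefix the payload with a header whose checksum covers the whole block."""
    size = HEADER.size + len(payload)
    # The checksum is computed with the checksum field itself set to zero.
    zeroed = HEADER.pack(size, 0, flags) + payload
    return HEADER.pack(size, zlib.crc32(zeroed), flags) + payload


def verify_block(block: bytes) -> bytes:
    """Return the payload, or raise ValueError if size or checksum mismatch."""
    size, checksum, flags = HEADER.unpack_from(block)
    if size != len(block):
        raise ValueError("size mismatch")
    payload = block[HEADER.size:]
    if zlib.crc32(HEADER.pack(size, 0, flags) + payload) != checksum:
        raise ValueError("checksum mismatch")
    return payload
```

Flipping any byte of a packed block makes `verify_block` raise, which is the point of storing the checksum next to the data.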
Tracking the blocks
Why use skiplists?
● O(log N) average inserts, updates and deletes.
● Similar performance to AVL trees but with a simpler implementation.
● Higher memory overhead.
● 2 sets of skip lists are required for each checkpoint.
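For readers who have not met skip lists, the following is a generic textbook sketch, not the block manager's implementation. The class name, the level cap of 16 and the coin-flip probability of 0.5 are all illustrative choices. It shows why the code stays simple: every operation is "find the predecessors at each level, then splice".

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # one next-pointer per level


class SkipList:
    MAX_LEVEL = 16  # illustrative cap on tower height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 1  # highest level currently in use

    def _random_level(self):
        # Each extra level is kept with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        # For each level, the last node whose key is strictly less than `key`.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._predecessors(key)
        found = update[0].forward[0]
        if found is not None and found.key == key:
            found.value = value  # existing key: update in place
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key, default=None):
        found = self._predecessors(key)[0].forward[0]
        if found is not None and found.key == key:
            return found.value
        return default

    def delete(self, key):
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):
            update[i].forward[i] = target.forward[i]
        return True

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

The per-node `forward` array is where the "higher memory overhead" bullet comes from: each entry carries several pointers instead of the two or three a balanced tree node needs.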
When to use what?
● First fit
○ Used for root pages.
○ root pages are created for elements with no on-disk footprint.
● Best fit
○ Used by default for other pages.
○ Helps reduce disk fragmentation
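The difference between the two strategies is easy to see on a toy free list. In this sketch, free extents are plain `(offset, length)` tuples sorted by offset; the function names and the tuple representation are assumptions for illustration only.

```python
def first_fit(extents, size):
    """Return the first free extent (lowest offset) that is large enough."""
    for offset, length in extents:
        if length >= size:
            return offset, length
    return None


def best_fit(extents, size):
    """Return the smallest free extent that is large enough.

    Ties on length are broken by the lower offset.
    """
    candidates = [e for e in extents if e[1] >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (e[1], e[0]))
```

With free extents `[(0, 64), (100, 16), (200, 32)]` and a 16-byte request, first fit carves into the 64-byte extent at offset 0, while best fit takes the exact 16-byte hole at offset 100 and leaves the large extent intact, which is how it helps limit fragmentation.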
Enough of disk, back to memory.
Caching internals!
Cache structure tracks
● Clean bytes
● Dirty bytes
● Update bytes
● Flags
● Eviction progress
● Eviction slots
● Eviction queues
Eviction dance
How are pages selected for eviction?
How many slots are available?
Where do I evict from?
Which page do I select to evict?
But what if cache grows too large?
Application threads pulling up.

● Application threads start evicting pages when
○ the overall cache is >= 95% filled.
○ the cache is >= 20% filled with dirty bytes.
○ the cache is >= 10% filled with update bytes.
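The three triggers above can be expressed as a small check. This is a sketch under the thresholds quoted on the slide; the `CacheStats` class, its field names and the function name are hypothetical, and integer arithmetic is used to avoid floating-point edge cases at the boundaries.

```python
from dataclasses import dataclass

# Percent-of-cache thresholds taken from the slide.
TOTAL_TRIGGER_PCT = 95
DIRTY_TRIGGER_PCT = 20
UPDATES_TRIGGER_PCT = 10


@dataclass
class CacheStats:
    """Hypothetical counters, loosely mirroring the bytes a cache tracks."""
    max_bytes: int
    bytes_in_use: int
    dirty_bytes: int
    update_bytes: int


def app_eviction_reasons(stats: CacheStats) -> list:
    """Return which triggers (if any) would pull application threads into eviction."""
    reasons = []
    if stats.bytes_in_use * 100 >= TOTAL_TRIGGER_PCT * stats.max_bytes:
        reasons.append("total")
    if stats.dirty_bytes * 100 >= DIRTY_TRIGGER_PCT * stats.max_bytes:
        reasons.append("dirty")
    if stats.update_bytes * 100 >= UPDATES_TRIGGER_PCT * stats.max_bytes:
        reasons.append("updates")
    return reasons
```

An empty list means the background eviction threads are keeping up and application threads can stay on their transaction work.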
Optimization conclusions

● Block manager:
○ Skip lists usage
○ When to use first fit vs best fit.
● Eviction process:
○ Early exits from cache walks when using eviction threads.
○ Keeping a small thread pool for eviction.
○ Not interrupting application threads' transaction duties.
○ Prioritizing cache eviction only when in dire state.
Arigato!
