Cassandra Anti-Patterns (in 5m)

Matthew F. Dennis // @mdennis

Non-Sun (err, Non-Oracle) JVM


No OpenJDK

No Blackdown (anyone still use this?)

Etc, etc, etc; just use the Sun (Oracle) JVM

At least u22, but in general the latest release (unless you have specific reasons otherwise)

CommitLog+Data On The Same Disk

Don't put the commit log and data directories on the same set of spindles

The commit log gets a single spindle entirely to itself (standard consumer SATA disks easily sustain > 80 MB/s in sequential writes)

DOES NOT APPLY TO SSDs or EC2

SSDs have no seek time

EC2 ephemeral drives are still virtualized (but not the same as EBS)

On EC2 or SSDs: use one RAID set for both the commit log and data directories

EBS volumes on EC2

Sounds great, nice feature set, but

Not predictable; freezes are common

Throughput limited in many cases

Stripe them

Both commit log and data directory on the same RAID set

Use ephemeral drives instead


Oversized JVM heaps

6-8 GB is good (assuming sufficient RAM on your boxen)

10-12 GB is possible and in some circumstances correct

16 GB == max JVM heap size

> 16 GB => badness

JVM heap ~= boxen RAM => badness (always)

JVM heap size -v- GC suckage

(Figure: GC suckage plotted against JVM heap size, with heap sizes of ~6GB, ~10GB and ~16GB marked.)
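The heap-size rules of thumb above can be written down as a small check. This is only a sketch: the function name `rate_heap`, the labels it returns, and the `ram_fraction_limit` cutoff used to decide when the heap is "~= boxen RAM" are all assumptions made for illustration, not something the slides specify. Sizes that fall between the slide's ranges are assigned to the next band up.

```python
def rate_heap(heap_gb, ram_gb, ram_fraction_limit=0.75):
    """Rate a proposed JVM heap size against the slide's rules of thumb.

    ram_fraction_limit is a hypothetical cutoff for "heap ~= RAM";
    the slides give no number for it.
    """
    if heap_gb <= 0 or ram_gb <= 0:
        raise ValueError("sizes must be positive")
    # Heap close to total RAM is always bad, regardless of absolute size.
    if heap_gb >= ram_gb * ram_fraction_limit:
        return "bad"
    # Anything above 16 GB is bad.
    if heap_gb > 16:
        return "bad"
    # Up to 16 GB is the stated maximum.
    if heap_gb > 12:
        return "max"
    # 10-12 GB is possible in some circumstances (8-10 GB lumped in here).
    if heap_gb > 8:
        return "possible"
    # 6-8 GB is good (smaller heaps are also rated "good" in this sketch).
    return "good"
```

For example, `rate_heap(8, 32)` rates an 8 GB heap on a 32 GB box, while `rate_heap(8, 8)` flags a heap that is the same size as the box's RAM.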

Large batch mutations


(large in number of distinct rows)

Timeout / failure => entire mutation must be retried => wasted work

Larger mutations => higher likelihood of timeout

1000 mutations to perform? Do 100 batches of 10 in parallel instead of one batch of 1000

Exact number of rows/batch is variable depending on HW, network, load, etc; experiment! (10-100 is a good starting point)
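A minimal sketch of the "100 batches of 10 in parallel" idea, assuming a thread pool on the client side. `send_batch` is a hypothetical stand-in for whatever batch-mutate call your client library provides; the names `chunked`, `write_in_small_batches`, `parallelism` and `max_retries` are likewise made up for illustration. The point is that a timeout only costs a retry of one small batch rather than the whole mutation.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_batch(batch):
    """Hypothetical placeholder for a real client batch call.

    Returns the number of rows written.
    """
    return len(batch)


def write_in_small_batches(mutations, batch_size=10, parallelism=8,
                           max_retries=3, send=send_batch):
    """Send mutations as many small batches in parallel; return rows written."""

    def send_with_retry(batch):
        # Only the failed small batch is retried, not the whole mutation set.
        for attempt in range(max_retries):
            try:
                return send(batch)
            except TimeoutError:
                if attempt == max_retries - 1:
                    raise

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(send_with_retry, chunked(mutations, batch_size)))
```

With 1000 mutations and the default `batch_size=10`, this issues 100 batches. Tune `batch_size` and `parallelism` by experiment, as the slide suggests.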

OPP / BOP partitioner

You probably shouldn't use it

No really, you almost certainly shouldn't use it

Creates hot spots

Requires babysitting from ops

Not as well tested, nor is it widely deployed

C* auto selection of tokens


Always specify your initial token

Auto select doesn't do what you think it does, nor does it do what you want

loadbalance is even worse: it doesn't currently do what you think, what you want or what it claims; F#@* my cluster would be a much more apt name than loadbalance

Future (next?) release of OPSC will remove your balancing woes
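One common way to pick initial tokens by hand is to space them evenly around the ring. The sketch below assumes the RandomPartitioner, whose token space runs from 0 up to 2**127; the slides do not say which partitioner is in use, and the function name `initial_tokens` is made up for illustration. Each value would go into that node's `initial_token` setting.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def initial_tokens(node_count):
    """Return evenly spaced tokens, one per node, for a single-DC ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Node i gets the token i/node_count of the way around the ring.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(4)):
        print(f"node {node}: initial_token = {token}")
```

This only covers a fixed-size, single-datacenter ring. Adding nodes later means recomputing the spacing and moving existing nodes.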

Super Columns

10-15 percent performance penalty on reads and writes

Easier / better to use composite columns

0.8.x makes this a lot easier

Done manually in 0.7.x and is still better

Devs working in C* code despise (loathe?) them

API probably won't be deprecated, but the implementation will be replaced behind the scenes with composites (may be OK at that point to use them, but you should probably just use the composite API directly)

Cassandra and DataStax are committed to maintaining the API going forward, even if the implementation changes

Read Before Write


Race conditions

Abuses/thrashes cache (row, key and page)

Increases latency

Increases IO requirements (by a lot)

Increases size in the client

Winblows

Try to avoid it, you'll be happier

Not always possible? Then, I'm sorry for your pain

Run 'nix (in particular, probably Linux)


Easier to get help (IRC, email, meetups, etc)

C* performs better

Better tested

Cheaper

More widely deployed (by a lot)

Q?