4.1.1 MONGODB INTERNALS
• Client-server architecture involves a single server and multiple clients connecting to the server.
• In a sharded and replicated deployment, multiple servers, rather than just one, form the topology.
• In standalone mode, as well as in a clustered and sharded topology, data travels from client to
server and back, and between the nodes.
MONGODB WIRE PROTOCOL
• The wire protocol used for the communication is a simple request-response-based socket protocol.
• Values in each message are encoded in little-endian byte order, the same as in BSON.
• In a standard request-response model, a client sends a request to a server and the server responds
to the request.
• At the wire-protocol level, a request consists of a message header followed by a request payload.
• Requests and responses use very similar message header formats.
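The shared header described above can be sketched in a few lines of Python. The field names and opcode values below follow the public MongoDB wire-protocol documentation rather than this slide, and the helper names are made up for illustration.

```python
import struct

# Standard message header: four little-endian int32 fields.
HEADER = struct.Struct("<iiii")  # messageLength, requestID, responseTo, opCode

OP_REPLY = 1     # opcode values taken from the public wire-protocol documentation
OP_QUERY = 2004

def pack_header(payload_len, request_id, response_to, op_code):
    """Build a 16-byte header; messageLength counts the header itself."""
    return HEADER.pack(HEADER.size + payload_len, request_id, response_to, op_code)

def unpack_header(data):
    """Split the first 16 bytes of a message into named header fields."""
    length, request_id, response_to, op_code = HEADER.unpack_from(data)
    return {"messageLength": length, "requestID": request_id,
            "responseTo": response_to, "opCode": op_code}

# A request header and the header of the response that answers it.
request = pack_header(payload_len=0, request_id=7, response_to=0, op_code=OP_QUERY)
reply = pack_header(payload_len=0, request_id=99, response_to=7, op_code=OP_REPLY)
```

The response echoes the request's requestID in its responseTo field, which is how a client pairs the two.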
• Figure 13-1 depicts the basic request-response communication between a client and a MongoDB
server.
INSERTING A DOCUMENT
• When creating and inserting a new document, a client sends an OP_INSERT operation via a request that includes:
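The field list for this slide did not survive, so the following is only a sketch based on the public wire-protocol documentation for the legacy OP_INSERT message: a header, an int32 flags field, the full collection name as a NUL-terminated string, and one or more BSON documents. The function name and the sample collection name are hypothetical. Recent MongoDB releases have replaced this opcode with OP_MSG, so treat the sketch as illustrative only.

```python
import struct

OP_INSERT = 2002  # legacy opcode from the public wire-protocol documentation
EMPTY_BSON_DOC = b"\x05\x00\x00\x00\x00"  # int32 length (5) plus terminating NUL

def build_op_insert(request_id, full_collection_name, bson_docs, flags=0):
    """Assemble header + flags + collection name (C string) + BSON documents."""
    body = struct.pack("<i", flags)
    body += full_collection_name.encode("utf-8") + b"\x00"
    body += b"".join(bson_docs)
    # messageLength covers the 16-byte header and the body; responseTo is 0.
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_INSERT)
    return header + body

# "test.users" is a made-up "database.collection" name for the example.
message = build_op_insert(1, "test.users", [EMPTY_BSON_DOC])
```

A real client would pass documents produced by a BSON encoder instead of the empty placeholder document used here.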
QUERYING A COLLECTION
• When querying for documents in a collection, a client sends an OP_QUERY operation via a request.
• It receives a set of relevant documents via a database response that involves an OP_REPLY operation.
• In response to a client OP_QUERY operation request, a MongoDB database server responds with an
OP_REPLY. An OP_REPLY message from the server includes:
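The slide's own field list is missing here. As a hedged sketch, the fixed part of a legacy OP_REPLY, as given in the public wire-protocol documentation, can be unpacked as follows: after the header come responseFlags (int32), cursorID (int64), startingFrom (int32), and numberReturned (int32), followed by the BSON documents. The parser name and the synthetic sample message are assumptions for illustration.

```python
import struct

HEADER = struct.Struct("<iiii")        # messageLength, requestID, responseTo, opCode
REPLY_FIELDS = struct.Struct("<iqii")  # responseFlags, cursorID, startingFrom, numberReturned
OP_REPLY = 1

def parse_op_reply(data):
    """Decode the header and fixed OP_REPLY fields; leave documents as raw BSON bytes."""
    length, request_id, response_to, op_code = HEADER.unpack_from(data, 0)
    if op_code != OP_REPLY:
        raise ValueError("not an OP_REPLY message")
    flags, cursor_id, starting_from, n_returned = REPLY_FIELDS.unpack_from(data, HEADER.size)
    return {
        "responseTo": response_to,
        "responseFlags": flags,
        "cursorID": cursor_id,
        "startingFrom": starting_from,
        "numberReturned": n_returned,
        "documents": data[HEADER.size + REPLY_FIELDS.size:length],
    }

# Synthetic reply to request 7 that carries one empty BSON document.
empty_doc = b"\x05\x00\x00\x00\x00"
body = REPLY_FIELDS.pack(0, 0, 0, 1) + empty_doc
sample = HEADER.pack(HEADER.size + len(body), 42, 7, OP_REPLY) + body
parsed = parse_op_reply(sample)
```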
• MongoDB stores database and collection data in files that reside at a path specified by the --dbpath
option to the mongod server program.
• To query for the total size of the collection, that is, data, unallocated storage, and index storage, you can
query as follows:
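The original query is not reproduced on this slide. As a rough sketch of the arithmetic involved, assume a collection-statistics result with storageSize and totalIndexSize fields, the field names reported by the collStats command. The total is then storage (data plus unallocated space) plus index storage. The helper name and the numbers are made up.

```python
def total_size(stats):
    """Sum storage (data plus unallocated space) and index storage, in bytes."""
    return stats["storageSize"] + stats["totalIndexSize"]

# Hypothetical figures, shaped like a collection-stats result.
sample_stats = {"size": 1200, "storageSize": 4096, "totalIndexSize": 8192}
collection_total = total_size(sample_stats)
```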
• The relevant collections in the MongoDB instance for the current example are as follows:
• To view the index collection data size, storage size, and the total size, query as follows:
4.1.2 MEMBASE ARCHITECTURE
• Membase supports the Memcached protocol and so client applications that use Memcached can
easily include Membase in their application stack.
• Behind the scenes, though, Membase adds capabilities like persistence and replication that Memcached
does not support.
• Each Membase node runs an instance of ns_server, which acts as a node supervisor and manages
the activities on the node.
• Clients interact with the ns_server using the Memcached protocol or a REST interface.
• The REST interface is supported with the help of a component called Menelaus.
• Menelaus includes a robust jQuery layer that maps REST calls down to the server.
• Clients accessing Membase using the Memcached protocol reach the underlying data through a
proxy called Moxi.
• Moxi acts as an intermediary that, with the help of vBuckets, always routes clients to the appropriate
place.
• To understand how vBuckets route information correctly, you need to dig a bit deeper into the consistent
hashing used by vBuckets.
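The routing idea can be sketched as two steps: hash a key to a vBucket, then look up the vBucket in a map of servers. The vBucket count, the plain CRC32-modulo hash, the round-robin map, and the node names below are all assumptions chosen for illustration, not Membase's actual implementation.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed count for this sketch

def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a key to a vBucket id; the result never depends on the server list."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets

def build_vbucket_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign vBuckets to servers round-robin (hypothetical layout)."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]

def server_for(key, vbucket_map):
    """Route a key: key -> vBucket -> server currently owning that vBucket."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]

servers = ["node-a:11211", "node-b:11211", "node-c:11211"]
vmap = build_vbucket_map(servers)
owner = server_for("user:42", vmap)

# Moving data means editing one map entry; the key-to-vBucket hash is untouched.
moved = vbucket_for("user:42")
vmap[moved] = "node-d:11211"
new_owner = server_for("user:42", vmap)
```

Because keys map to vBuckets rather than directly to servers, adding or removing a node only changes map entries, not the hash of any key.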
4.1.3 HYPERTABLE UNDER THE HOOD
• In HBase, column-family-centric data is stored sorted and ordered by row-key, and each cell
maintains multiple versions of its data. Hypertable supports similar ideas.
• All data for all versions for each row-key is stored in a sorted manner for each column-family.
• Hypertable provides a column-family-centric data-store but its physical storage characteristics are also
affected by the notion of access groups.
• Access groups in Hypertable provide a way to physically store related column data together.
• With Hypertable access groups you have the flexibility to put one or more columns in the same group.
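As a toy model of the idea above, the sketch below routes each cell of a row to a separate store according to the access group of its column, and keeps every store sorted by row-key. The schema, column names, and in-memory lists are hypothetical stand-ins, not Hypertable's storage format.

```python
from collections import defaultdict

# Hypothetical schema: access group name -> columns stored together.
ACCESS_GROUPS = {
    "profile": ["name", "email"],
    "activity": ["last_login"],
}
COLUMN_TO_GROUP = {col: grp for grp, cols in ACCESS_GROUPS.items() for col in cols}

def store_row(stores, row_key, cells):
    """Route each cell to the store of its column's access group."""
    for column, value in cells.items():
        stores[COLUMN_TO_GROUP[column]].append((row_key, column, value))

stores = defaultdict(list)
store_row(stores, "user2", {"name": "Bo", "last_login": "2011-02-01"})
store_row(stores, "user1", {"name": "Al", "email": "al@example.com",
                            "last_login": "2011-01-05"})
for group_cells in stores.values():
    group_cells.sort()  # keep each group's cells ordered by row-key, then column
```

A scan that only needs last_login would touch the "activity" store alone, which is the practical benefit of grouping related columns.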
2) BLOOM FILTER
4.1.4 APACHE CASSANDRA
1) PEER-TO-PEER MODEL
2) BASED ON GOSSIP AND ANTI-ENTROPY
3) FAST WRITES
4) HINTED HANDOFF
4.1.5 BERKELEY DB
• Berkeley DB Java Edition (JE): a key/value store rewritten in Java that can easily be
incorporated into a Java stack.
• Berkeley DB XML: written in C++, this version wraps the key/value store to behave as an
indexed and optimized XML storage system.
STORAGE CONFIGURATION
1) B-tree
2) Hash
3) Queue
4) Recno
B-TREE STORAGE
HASH STORAGE
QUEUE STORAGE
RECNO STORAGE
4.2 MIGRATING FROM RDBMS TO NOSQL
4.3.1 USING RAILS WITH NOSQL
4.3.2 USING DJANGO WITH NOSQL
• Django is a lightweight web framework that allows for rapid prototyping and fast development.
• The SQL standard and the presence of a disintermediating ORM layer make it possible for Django
applications to swap one RDBMS for another.
• Application sources that work seamlessly across NoSQL products and with both SQL and NoSQL
products are desirable.
4.3.3 USING SPRING DATA
Maven makes it elegant and easy to build a project, define its dependencies, and manage those dependencies.
4.4 USING MYSQL AS A NOSQL SOLUTION
Figure 15-2 depicts a typical MySQL setup with a Memcached data store being accessed by a client.
Using Memcached with MySQL is beneficial, but the architecture has its drawbacks:
• Data is held in memory in two places: the storage engine buffer and Memcached.
• Replication of data between the storage engine and Memcached can leave the two in
inconsistent states.
• The data is fetched into Memcached via the SQL layer, so the SQL overhead is still present,
even if it’s minimized.
• Memcached performance is superior only as long as all relevant data fits in memory. Otherwise,
disk I/O overheads can be high and can make the system slow.
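Two of the drawbacks above, the SQL layer on every cache miss and the risk of inconsistent state, show up in a small cache-aside sketch. Here an in-memory SQLite database stands in for MySQL and a plain dict stands in for Memcached. Both substitutions, along with the table and function names, are assumptions made so the example is self-contained.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
cache = {}                        # stand-in for Memcached

def get_name(user_id):
    """Serve from the cache when possible; otherwise go through SQL and cache the result."""
    if user_id in cache:
        return cache[user_id]
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    cache[user_id] = name
    return name

first = get_name(1)     # cache miss: the read passes through the SQL layer
db.execute("UPDATE users SET name = 'bob' WHERE id = 1")
stale = get_name(1)     # cache hit: the old value is returned
cache.pop(1, None)      # explicit invalidation by the application
fresh = get_name(1)     # miss again: the updated row is read and cached
```

Until the application invalidates the entry, the cache and the database disagree, which is the inconsistency the list refers to.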
• An alternative to using MySQL with Memcached is to bypass the SQL layer and get directly to
the storage engine.
• The HandlerSocket plugin for MySQL is an open-source plugin that allows the bypassing of
the SQL layer to access the underlying MySQL storage engine.
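To give a feel for what bypassing the SQL layer looks like, the sketch below builds two requests in the plugin's tab-separated, newline-terminated text protocol: one that opens an index and one that looks up a row by key. The layout is an assumption based on the plugin's published protocol notes and should be checked against them. The database, table, and column names are made up, and the escaping of special characters is omitted.

```python
def open_index_request(index_id, db, table, index, columns):
    """Build an open-index request: P, index id, db, table, index name, column list."""
    fields = ["P", str(index_id), db, table, index, ",".join(columns)]
    return ("\t".join(fields) + "\n").encode("utf-8")

def find_request(index_id, op, values, limit=1, offset=0):
    """Build a find request: index id, comparison op, value count, values, limit, offset."""
    fields = [str(index_id), op, str(len(values)), *values, str(limit), str(offset)]
    return ("\t".join(fields) + "\n").encode("utf-8")

# Hypothetical schema: table "users" in database "test", looked up by primary key.
open_req = open_index_request(0, "test", "users", "PRIMARY", ["id", "name"])
find_req = find_request(0, "=", ["1"])
```

No SQL statement is parsed at any point: the client names an index once and then issues key lookups against it.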