
Unit 4

Working with NoSQL

School of Computer Science and Engineering


CONTENTS

4.1 Surveying Database Internals


4.2 Migrating from RDBMS to NoSQL
4.3 Web Frameworks and NoSQL
4.4 Using MySQL as a NoSQL

SUB-CONTENTS

4.1 Surveying Database Internals


4.1.1 MongoDB Internals
4.1.2 Membase Architecture
4.1.3 Hypertable under the hood
4.1.4 Apache Cassandra
4.1.5 Berkeley DB

4.1.1 MONGODB INTERNALS

• MongoDB follows a client-server architecture, as commonly found in traditional RDBMSs.

• A client-server architecture involves a single server and multiple clients connecting to it.

• In a sharded and replicated scenario, multiple servers, rather than only one, form the topology.

• Whether in standalone mode or in a clustered and sharded topology, data is transported from client to
server and back, and among the nodes.

MONGODB WIRE PROTOCOL
• The wire protocol used for the communication is a simple request-response-based socket protocol.

• The message header is a fixed-size structure of integer fields; the documents carried in the payload are BSON encoded.

• Byte ordering within messages is little-endian, the same as in BSON.

• In a standard request-response model, a client sends a request to a server and the server responds
to the request.

• In terms of the wire protocol, a request is sent with a message header and a request payload.

• A response comes back with a message header and a response payload.

• The message header has the same format in the request and in the response.
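As a concrete illustration, MongoDB's wire protocol documentation defines the standard message header as four little-endian int32 fields: messageLength, requestID, responseTo and opCode. The Python sketch below packs and unpacks that 16-byte header with the standard `struct` module. The helper names `pack_header` and `unpack_header` are hypothetical, and the opCode values used (2004 for OP_QUERY, 1 for OP_REPLY) are the legacy ones.

```python
import struct

# Standard message header: four little-endian signed 32-bit integers.
HEADER_FORMAT = "<iiii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def pack_header(message_length: int, request_id: int,
                response_to: int, op_code: int) -> bytes:
    """Encode a message header (hypothetical helper)."""
    return struct.pack(HEADER_FORMAT, message_length, request_id,
                       response_to, op_code)


def unpack_header(data: bytes) -> dict:
    """Decode the first 16 bytes of a message into named fields."""
    message_length, request_id, response_to, op_code = struct.unpack_from(
        HEADER_FORMAT, data)
    return {
        "messageLength": message_length,  # total size, header included
        "requestID": request_id,          # identifier chosen by the sender
        "responseTo": response_to,        # requestID being answered (0 in requests)
        "opCode": op_code,                # type of message
    }


# Header-only messages, for illustration: a request carries responseTo = 0,
# and the reply echoes the request's ID in responseTo.
request = pack_header(HEADER_SIZE, 42, 0, 2004)   # 2004 = OP_QUERY
reply = pack_header(HEADER_SIZE, 7, 42, 1)        # 1 = OP_REPLY
print(unpack_header(reply)["responseTo"])  # 42
```

The responseTo field is what lets a client match a reply to the request it sent over the same socket.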


• Figure 13-1 depicts the basic request-response communication between a client and a MongoDB
server.



• The MongoDB wire protocol allows a number of operations. The allowed operations are as follows:
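For reference, each operation is identified by a numeric opCode in the message header. The sketch below lists the values given in MongoDB's legacy wire protocol documentation as a Python `IntEnum`. The class name `OpCode` and the `describe` helper are hypothetical, and modern servers carry most traffic in OP_MSG (2013), so treat the older codes as a historical snapshot.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """Legacy MongoDB wire protocol operation codes."""
    OP_REPLY = 1             # server reply to OP_QUERY / OP_GET_MORE
    OP_UPDATE = 2001         # update a document
    OP_INSERT = 2002         # insert new documents
    RESERVED = 2003          # formerly used, now reserved
    OP_QUERY = 2004          # query a collection
    OP_GET_MORE = 2005       # fetch more results from an open cursor
    OP_DELETE = 2006         # delete documents
    OP_KILL_CURSORS = 2007   # close cursors on the server
    OP_MSG = 2013            # extensible message format used by newer servers


def describe(op_code: int) -> str:
    """Map a raw opCode taken from a message header to its name."""
    try:
        return OpCode(op_code).name
    except ValueError:
        return f"UNKNOWN({op_code})"


print(describe(2004))  # OP_QUERY
```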


INSERTING A DOCUMENT
• When creating and inserting a new document, a client sends an OP_INSERT operation via a request that includes:
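In the legacy protocol documentation, an OP_INSERT message is the standard header followed by an int32 flags field, the NUL-terminated full collection name (for example "mydb.movies") and one or more BSON documents. The Python sketch below assembles such a message. It assumes a tiny hand-written BSON encoder that handles only string and 32-bit integer values; a real client would use a complete BSON library. The names `encode_bson` and `build_op_insert` are hypothetical.

```python
import struct

OP_INSERT = 2002  # legacy opCode for inserts


def cstring(text: str) -> bytes:
    """Wire-protocol/BSON cstring: UTF-8 bytes terminated by a NUL byte."""
    return text.encode("utf-8") + b"\x00"


def encode_bson(doc: dict) -> bytes:
    """Minimal BSON encoder: supports only str and int32 values."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = UTF-8 string: int32 byte count (incl. NUL), then the bytes
            body += b"\x02" + cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 = 32-bit integer
            body += b"\x10" + cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # Document = int32 total length + elements + terminating NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def build_op_insert(request_id: int, full_collection_name: str, docs: list) -> bytes:
    """Header + int32 flags + fullCollectionName + one or more BSON documents."""
    body = struct.pack("<i", 0)               # flags = 0 (stop at first error)
    body += cstring(full_collection_name)     # e.g. "mydb.movies"
    body += b"".join(encode_bson(d) for d in docs)
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_INSERT)
    return header + body


msg = build_op_insert(1, "mydb.movies", [{"title": "Jaws", "year": 1975}])
print(len(msg), struct.unpack_from("<i", msg)[0])  # 63 63
```

The messageLength in the header must equal the size of the whole message, which is why the header is packed only after the body has been built.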

QUERYING A COLLECTION
• When querying for documents in a collection, a client sends an OP_QUERY operation via a request.
• It receives a set of relevant documents via a database response that involves an OP_REPLY operation.

• An OP_QUERY message from the client includes:
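In the legacy protocol documentation, an OP_QUERY body consists of int32 flags, the full collection name, int32 numberToSkip, int32 numberToReturn, the query document and an optional returnFieldsSelector document. The Python sketch below builds one, assuming a minimal encoder that handles only int32-valued documents. The names `encode_int_doc` and `build_op_query` are hypothetical.

```python
import struct

OP_QUERY = 2004  # legacy opCode for queries


def encode_int_doc(doc: dict) -> bytes:
    """Minimal BSON encoder for documents whose values are all int32."""
    body = b"".join(b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
                    for key, value in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def build_op_query(request_id, full_collection_name, query,
                   skip=0, limit=0, fields=None):
    body = struct.pack("<i", 0)                             # flags
    body += full_collection_name.encode("utf-8") + b"\x00"  # fullCollectionName
    body += struct.pack("<ii", skip, limit)                 # numberToSkip, numberToReturn
    body += encode_int_doc(query)                           # query document
    if fields is not None:
        body += encode_int_doc(fields)                      # optional returnFieldsSelector
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_QUERY)
    return header + body


# Movies from 1975, returning only the title field, up to 10 documents in the first batch
msg = build_op_query(2, "mydb.movies", {"year": 1975}, limit=10, fields={"title": 1})
```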

• In response to a client OP_QUERY operation request, a MongoDB database server responds with an
OP_REPLY. An OP_REPLY message from the server includes:
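In the legacy protocol documentation, an OP_REPLY follows the header with int32 responseFlags, int64 cursorID, int32 startingFrom, int32 numberReturned and then the returned BSON documents. The Python sketch below parses those fields and slices out each document using its own length prefix. The function name `parse_op_reply` is hypothetical, and the documents are returned as raw BSON bytes rather than decoded.

```python
import struct

REPLY_FIXED = struct.Struct("<iqii")  # responseFlags, cursorID, startingFrom, numberReturned


def parse_op_reply(message: bytes) -> dict:
    length, request_id, response_to, op_code = struct.unpack_from("<iiii", message)
    if op_code != 1:
        raise ValueError(f"not an OP_REPLY: opCode={op_code}")
    flags, cursor_id, starting_from, n_returned = REPLY_FIXED.unpack_from(message, 16)
    offset = 16 + REPLY_FIXED.size
    docs = []
    for _ in range(n_returned):
        (doc_len,) = struct.unpack_from("<i", message, offset)  # BSON length prefix
        docs.append(message[offset:offset + doc_len])           # raw BSON bytes
        offset += doc_len
    return {"responseTo": response_to, "responseFlags": flags,
            "cursorID": cursor_id, "startingFrom": starting_from,
            "numberReturned": n_returned, "documents": docs}


# A synthetic reply to request 42 carrying two empty documents; cursorID 0
# means the server has no more results for this query.
empty_doc = b"\x05\x00\x00\x00\x00"
body = REPLY_FIXED.pack(0, 0, 0, 2) + empty_doc * 2
reply = struct.pack("<iiii", 16 + len(body), 9, 42, 1) + body
parsed = parse_op_reply(reply)
print(parsed["responseTo"], parsed["numberReturned"], parsed["cursorID"])  # 42 2 0
```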


MONGODB DATABASE FILES

• MongoDB stores database and collection data in files that reside at a path specified by the --dbpath
option to the mongod server program.

• The default value for dbpath is /data/db.

• To use the shell, first start mongod.

• Then connect to the server using the mongo command-line shell.


• After you connect, query for a collection’s size as follows:

• To get the storage size for the collection, query as follows:

• To get the total size of the collection, that is, data, unallocated storage, and index storage,
query as follows:
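The three figures are related by simple arithmetic: MongoDB's documentation describes the total size as the collection's storage size plus the size of all its indexes. The Python sketch below reproduces that relationship from a collStats-style dictionary (the `size`, `storageSize` and `totalIndexSize` keys returned by the collStats command). The function name and the sample numbers are hypothetical and are not measurements from the movies example.

```python
def collection_sizes(stats: dict) -> dict:
    """Derive dataSize / storageSize / totalSize from collStats fields."""
    data_size = stats["size"]               # bytes of document data
    storage_size = stats["storageSize"]     # bytes allocated for documents, incl. unused space
    index_size = stats["totalIndexSize"]    # bytes used by all indexes
    return {
        "dataSize": data_size,
        "storageSize": storage_size,
        "totalSize": storage_size + index_size,
    }


# Hypothetical collStats output, for illustration only.
sample = {"size": 4_000, "storageSize": 8_192, "totalIndexSize": 8_176}
print(collection_sizes(sample)["totalSize"])  # 16368
```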



• To get the fully qualified names of all indexes related to the movies collection, along with their
database and collection names, query for all namespaces in the system as follows:

• The relevant entries for the current example are as follows:

• To view the index collection data size, storage size, and the total size, query as follows:
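In older MongoDB releases that expose system.namespaces, an index namespace has the form database.collection.$indexName, for example mydb.movies.$_id_ for the default _id index. The Python sketch below splits such strings into their database, collection and index parts and keeps only the indexes of the movies collection. The function name and the sample namespace strings are hypothetical.

```python
def parse_namespace(ns: str) -> dict:
    """Split a namespace such as 'mydb.movies.$_id_' into its parts."""
    database, _, rest = ns.partition(".")
    collection, sep, index = rest.partition(".$")
    return {
        "database": database,
        "collection": collection,
        "index": index if sep else None,   # None for plain collection namespaces
    }


names = ["mydb.movies", "mydb.movies.$_id_", "mydb.movies.$title_1", "mydb.system.indexes"]
movies_indexes = [p["index"] for p in map(parse_namespace, names)
                  if p["collection"] == "movies" and p["index"]]
print(movies_indexes)  # ['_id_', 'title_1']
```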



• You can also use the collection itself to get the index data size as follows:

• You could run validate on the movies collection as follows:

4.1.2 MEMBASE ARCHITECTURE

• Because Membase supports the Memcached protocol, client applications that use Memcached can
easily include Membase in their application stack.

• Behind the scenes, though, Membase adds capabilities like persistence and replication that Memcached
does not support.

• Each Membase node runs an instance of ns_server, which acts as a node supervisor and manages
the activities on the node.

• Clients interact with the ns_server using the Memcached protocol or a REST interface.


• The REST interface is supported with the help of a component called Menelaus.

• Menelaus includes a robust jQuery layer that maps REST calls down to the server.

• Clients accessing Membase using the Memcached protocol reach the underlying data through a
proxy called Moxi.

• Moxi acts as an intermediary that, with the help of vBuckets, always routes clients to the appropriate
server.

• To understand how vBuckets route information correctly, you need to dig a bit deeper into the consistent
hashing used by vBuckets.


• The essence of vBuckets-based routing is illustrated in Figure 13-2.
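The indirection that vBuckets add can be sketched in Python as a two-level mapping: a key is hashed to one of a fixed number of vBuckets, and a separate vBucket map assigns each vBucket to a server. Because servers own vBuckets rather than individual keys, adding a node only means reassigning some entries in the map. The sketch assumes CRC32 modulo a 16-entry vBucket count and three hypothetical server names; real deployments use far more vBuckets, and client libraries may transform the hash differently.

```python
import zlib

NUM_VBUCKETS = 16  # illustrative; must stay fixed for the life of the cluster


def vbucket_for(key: str) -> int:
    """Level 1: key -> vBucket (fixed hash, independent of cluster size)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(servers: list) -> list:
    """Level 2: vBucket -> server, here a simple round-robin assignment."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def route(key: str, vbucket_map: list) -> str:
    """Conceptually what a proxy such as Moxi does for each request."""
    return vbucket_map[vbucket_for(key)]


vb_map = build_vbucket_map(["node-a", "node-b", "node-c"])
print(route("user:1001", vb_map))

# Adding a node rewrites only the map; every key keeps its vBucket.
new_map = list(vb_map)
for vb in (3, 7, 11, 15):
    new_map[vb] = "node-d"
moved = sum(1 for old, new in zip(vb_map, new_map) if old != new)
print(moved)  # 4
```

Only the data in the four reassigned vBuckets has to be transferred to the new node; all other keys keep routing to the same server.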

4.1.3 HYPERTABLE UNDER THE HOOD

• Hypertable is a high-performance alternative to HBase.

• Hypertable runs on top of a distributed filesystem like HDFS.

• In HBase, column-family-centric data is stored sorted by row key, and each cell of data maintains
multiple versions. Hypertable supports similar ideas.

• In Hypertable, all version information is appended to the row keys.

• The version information is identified via timestamps.


• All data for all versions for each row-key is stored in a sorted manner for each column-family.

• Hypertable provides a column-family-centric data store, but its physical storage characteristics are also
affected by the notion of access groups.

• Access groups in Hypertable provide a way to physically store related column data together.

• With Hypertable access groups you have the flexibility to put one or more columns in the same group.

• Hypertable provides the following:


i) Regular Expression Support
ii) Bloom Filter
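The versioned, sorted layout and the effect of access groups described above can be modelled in a few lines of Python. Each cell key is a (row, column family, qualifier, timestamp) tuple; cells are sorted by row and column with the newest version first, then split into one sorted store per access group. This illustrates the ideas only: it is not Hypertable's on-disk key encoding, the newest-first ordering is an assumption of the sketch, and the schema mapping is hypothetical.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    family: str       # column family
    qualifier: str
    timestamp: int    # version identifier appended to the key


# Hypothetical schema: every column family belongs to one access group.
ACCESS_GROUP_OF = {"name": "profile", "email": "profile", "clicks": "activity"}


def sort_key(key: CellKey):
    # Order by row, then column; negating the timestamp puts the newest
    # version of each cell ahead of its older versions.
    return (key.row, key.family, key.qualifier, -key.timestamp)


def physical_layout(cells: dict) -> dict:
    """Group cells by access group, each group kept in sorted key order."""
    stores = {}
    for key in sorted(cells, key=sort_key):
        stores.setdefault(ACCESS_GROUP_OF[key.family], []).append((key, cells[key]))
    return stores


cells = {
    CellKey("user2", "name", "", 100): "Bo",
    CellKey("user1", "name", "", 100): "Al",
    CellKey("user1", "name", "", 200): "Alan",
    CellKey("user1", "clicks", "", 150): "42",
}
for group, entries in physical_layout(cells).items():
    print(group, [(k.row, k.family, k.timestamp, v) for k, v in entries])
```

A scan that needs only the name column reads the profile store and never touches the activity data, which is the point of grouping related columns physically.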


1) REGULAR EXPRESSION SUPPORT
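Regular-expression support lets a scan return only the rows whose keys match a pattern. As an illustration of why this combines well with sorted storage, the Python sketch below uses the standard `re` module (not Hypertable's API or regex engine): when a literal prefix of the matching keys is known, the scan can start at the first candidate row and stop as soon as the prefix no longer matches. The function name and row keys are hypothetical.

```python
import bisect
import re


def regex_scan(sorted_rows: list, pattern: str, prefix: str = "") -> list:
    """Return rows matching `pattern`, scanning only rows that start with `prefix`."""
    regex = re.compile(pattern)
    start = bisect.bisect_left(sorted_rows, prefix)   # first candidate row
    matches = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break                                     # past the prefix range
        if regex.search(row):
            matches.append(row)
    return matches


rows = sorted(["com.example/a", "com.example/b1", "com.example/b2", "org.test/b1"])
print(regex_scan(rows, r"/b\d$", prefix="com.example"))  # ['com.example/b1', 'com.example/b2']
```

With an empty prefix the function degrades to a full scan, applying the regular expression to every row.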


2) BLOOM FILTER
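A Bloom filter lets a store answer "definitely not present" without reading a file from disk, at the cost of occasional false positives. The minimal Python sketch below uses a bit array of m bits probed by k positions derived by double hashing from a SHA-256 digest. The sizes and the hashing scheme are assumptions of the sketch, not Hypertable's implementation.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits probed by k hash positions (double hashing)."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> list:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter(m=1024, k=3)
for row in ("com.example/a", "com.example/b"):
    bf.add(row)
print(bf.might_contain("com.example/a"))  # True: added keys are never missed
print(bf.might_contain("org.test/zzz"))   # usually False; True would be a false positive
```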

4.1.4 APACHE CASSANDRA

• Apache Cassandra is simultaneously a very popular and infamous NoSQL database.

• Apache Cassandra has the following properties:

i) Peer-to-Peer Model

ii) Based on Gossip and Anti-entropy

iii) Fast Writes

iv) Hinted Handoff


1) PEER-TO-PEER MODEL
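In a peer-to-peer model there is no master: every node owns a range of a token ring, and any node can coordinate a request. The Python sketch below places keys on such a ring and picks replicas by walking clockwise from the key's token, similar in spirit to Cassandra's SimpleStrategy. MD5 tokens resemble Cassandra's older RandomPartitioner; the class name, node names and tokens are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto a 128-bit ring using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, node_tokens: dict):
        # node_tokens: node name -> token it owns on the ring
        self.entries = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas_for_token(self, t: int, rf: int = 3) -> list:
        """First node whose token is >= t (wrapping around), then the next rf-1 nodes."""
        start = bisect.bisect_left(self.tokens, t) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def replicas(self, key: str, rf: int = 3) -> list:
        return self.replicas_for_token(token(key), rf)


ring = Ring({"n1": 2**126, "n2": 2**127, "n3": 3 * 2**126, "n4": 2**128 - 1})
print(ring.replicas("user:42"))
```

Every node can compute the same replica list from the same ring, so a client may send a request to any node and that node acts as coordinator.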

2) BASED ON GOSSIP AND ANTI-ENTROPY
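The two mechanisms can be sketched separately in Python. Gossip spreads membership state by merging what peers exchange, with newer information winning. Anti-entropy compares replicas by building Merkle trees over key ranges and descending only into subtrees whose hashes differ, so only out-of-sync ranges need repair. This is a simplified model: Cassandra's gossip tracks generation and version numbers per endpoint, and its repair builds trees over token ranges. All names and data here are hypothetical.

```python
import hashlib


# --- Gossip: merge membership state, keeping the highest heartbeat version ---
def merge_gossip(local: dict, remote: dict) -> dict:
    """Each map is node -> heartbeat version; newer information wins."""
    merged = dict(local)
    for node, version in remote.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


# --- Anti-entropy: compare replicas with Merkle trees ---
def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(replica: dict, ranges: list) -> list:
    """One leaf per key range: the hash of that range's sorted key/value pairs."""
    return [_h(repr(sorted((k, v) for k, v in replica.items() if lo <= k < hi)).encode())
            for lo, hi in ranges]


def merkle_levels(leaves: list) -> list:
    """All tree levels, bottom-up; the last level holds the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]            # duplicate the last node on odd levels
        levels.append([_h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def differing_leaves(tree_a: list, tree_b: list) -> list:
    """Descend from the root, following only subtrees whose hashes differ."""
    top = len(tree_a) - 1
    frontier = [0] if tree_a[top][0] != tree_b[top][0] else []
    for level in range(top, 0, -1):
        below_a, below_b = tree_a[level - 1], tree_b[level - 1]
        frontier = [child for i in frontier for child in (2 * i, 2 * i + 1)
                    if child < len(below_a) and below_a[child] != below_b[child]]
    return frontier


ranges = [(0, 100), (100, 200), (200, 300)]
replica_a = {5: "x", 150: "y", 250: "z"}
replica_b = {5: "x", 150: "y-stale", 250: "z"}
tree_a = merkle_levels(leaf_hashes(replica_a, ranges))
tree_b = merkle_levels(leaf_hashes(replica_b, ranges))
print([ranges[i] for i in differing_leaves(tree_a, tree_b)])  # [(100, 200)]
```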


3) FAST WRITES
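Cassandra writes are fast largely because a write is a sequential append to a commit log plus an in-memory update, with no read before the write. Full memtables are flushed as immutable, sorted SSTables. The Python model below shows that path. The class name and flush threshold are hypothetical, and compaction, tombstones, timestamps and commit-log segment recycling are simplified away.

```python
class WritePath:
    """Simplified LSM-style write path: commit log + memtable + SSTable flush."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []      # append-only log for durability
        self.memtable = {}        # in-memory, latest value per key
        self.sstables = []        # immutable sorted runs flushed from memory
        self.memtable_limit = memtable_limit

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))   # 1) sequential append, no disk seek
        self.memtable[key] = value             # 2) in-memory update, no read first
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # 3) write the memtable out as one sorted, immutable run
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []   # simplification: flushed data needs no replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest run first
            for k, v in table:
                if k == key:
                    return v
        return None


wp = WritePath(memtable_limit=2)
wp.write("a", 1)
wp.write("b", 2)      # memtable full: flushed to the first SSTable
wp.write("a", 3)      # newer value lives in the memtable
print(wp.read("a"), wp.read("b"))  # 3 2
```

Reads pay for this design by consulting several places, which is why the write path is the cheaper one.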


4) HINTED HANDOFF
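With hinted handoff, when a replica is down during a write, the coordinating node keeps a hint (the target node and the data it missed) and replays it once that node comes back. In Cassandra, a hint does not count toward the write's consistency level except at level ANY. The Python toy model below shows only the store-and-replay idea; the class and method names are hypothetical, and hint expiry windows and durability are omitted.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas: list):
        self.replicas = {name: {} for name in replicas}  # node -> its data
        self.up = set(replicas)
        self.hints = []                                   # (target, key, value)

    def write(self, key, value) -> None:
        for node in self.replicas:
            if node in self.up:
                self.replicas[node][key] = value
            else:
                # Target unreachable: keep a hint instead of failing the write.
                self.hints.append((node, key, value))

    def mark_down(self, node) -> None:
        self.up.discard(node)

    def mark_up(self, node) -> None:
        """When a node recovers, hand off (replay) the hints addressed to it."""
        self.up.add(node)
        remaining = []
        for target, key, value in self.hints:
            if target == node:
                self.replicas[node][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


c = Coordinator(["n1", "n2", "n3"])
c.mark_down("n3")
c.write("k", "v")           # n1 and n2 get the data; n3 gets a hint
c.mark_up("n3")             # the hint is replayed to n3
print(c.replicas["n3"])     # {'k': 'v'}
```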

4.1.5 BERKELEY DB

Berkeley DB comes in three distinct flavors and supports multiple configurations:

• Berkeley DB: key/value store programmed in C. This is the original flavor.

• Berkeley DB Java Edition (JE): key/value store rewritten in Java. Can easily be
incorporated into a Java stack.

• Berkeley DB XML: written in C++, this version wraps the key/value store to behave as an
indexed and optimized XML storage system.


STORAGE CONFIGURATION

Key/value pairs can be stored in four types of data structures:

1) B-tree

2) Hash

3) Queue

4) Recno


B-TREE STORAGE
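B-tree storage keeps keys in sorted order, which makes exact lookups, range scans and prefix scans efficient. The Python sketch below is a stand-in that models only that ordering behaviour with a sorted list and the standard `bisect` module; it does not implement a B-tree's balanced page structure. The class name and data are hypothetical.

```python
import bisect


class SortedStore:
    """Stand-in for B-tree storage semantics: keys kept in sorted order."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, lo, hi) -> list:
        """All pairs with lo <= key < hi, in key order (a cursor-style scan)."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[i:j], self._values[i:j]))


store = SortedStore()
for k, v in [("banana", 2), ("apple", 1), ("cherry", 3)]:
    store.put(k, v)
print(store.range("apple", "c"))  # [('apple', 1), ('banana', 2)]
```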


HASH STORAGE
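Berkeley DB documents its Hash access method as an extended linear hash table. Linear hashing grows the table one bucket at a time instead of rehashing everything at once. The Python sketch below implements plain linear hashing in memory; it does not model Berkeley DB's on-disk format. CRC32, the load-factor threshold and the class name are assumptions. Unlike B-tree storage, keys have no useful order here; the structure favours exact-match lookups.

```python
import zlib


class LinearHashTable:
    def __init__(self, initial_buckets: int = 2, max_load: float = 2.0):
        self.n0 = initial_buckets   # buckets at level 0
        self.level = 0              # current doubling round
        self.split = 0              # next bucket to split in this round
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load

    @staticmethod
    def _hash(key: str) -> int:
        return zlib.crc32(key.encode("utf-8"))

    def _address(self, key: str) -> int:
        h = self._hash(key)
        a = h % (self.n0 * 2 ** self.level)
        if a < self.split:                      # bucket already split this round
            a = h % (self.n0 * 2 ** (self.level + 1))
        return a

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()

    def _split_one(self) -> None:
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])                 # new bucket: split + n0 * 2**level
        modulus = self.n0 * 2 ** (self.level + 1)
        for k, v in old:                        # entries land in split or the new bucket
            self.buckets[self._hash(k) % modulus].append((k, v))
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.level, self.split = self.level + 1, 0

    def get(self, key: str):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return None


table = LinearHashTable()
for i in range(100):
    table.put(f"key{i}", i)
print(len(table.buckets), table.get("key42"))
```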


QUEUE STORAGE
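Queue storage holds fixed-length records addressed by logical record numbers. It is designed for fast appends at the tail and for consuming records from the head. The Python sketch below models those semantics only. The class name is hypothetical, and the space padding mirrors what Berkeley DB documents as its default pad character for short records (the pad byte is configurable there).

```python
class RecordQueue:
    """Fixed-length records addressed by logical record number (1-based)."""

    def __init__(self, record_length: int):
        self.record_length = record_length
        self.records = {}        # record number -> padded bytes
        self.next_recno = 1      # numbers only increase
        self.head = 1            # oldest record not yet consumed

    def append(self, data: bytes) -> int:
        if len(data) > self.record_length:
            raise ValueError("record longer than the fixed record length")
        recno = self.next_recno
        self.records[recno] = data.ljust(self.record_length, b" ")  # pad to fixed size
        self.next_recno += 1
        return recno

    def consume(self):
        """Remove and return (recno, record) from the head, or None if empty."""
        while self.head < self.next_recno:
            recno = self.head
            self.head += 1
            if recno in self.records:
                return recno, self.records.pop(recno)
        return None


q = RecordQueue(record_length=8)
q.append(b"job-1")
q.append(b"job-2")
print(q.consume())  # (1, b'job-1   ')
```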


RECNO STORAGE
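Recno storage also uses logical record numbers as keys, but the records can be variable-length, and a Recno database can be backed by a flat text file with one record per line. Berkeley DB can optionally renumber records (the DB_RENUMBER flag), so deleting a record shifts the numbers of all later records. The Python sketch below models that renumbering behaviour; the class name and sample records are hypothetical.

```python
class RecnoStore:
    """Variable-length records addressed by 1-based logical record number.

    Models renumbering: deleting a record shifts later record numbers down by one.
    """

    def __init__(self, text: str = ""):
        # Like a Recno database backed by a flat text file: one record per line.
        self.records = text.splitlines()

    def _index(self, recno: int) -> int:
        if not 1 <= recno <= len(self.records):
            raise KeyError(recno)
        return recno - 1

    def get(self, recno: int) -> str:
        return self.records[self._index(recno)]

    def put(self, recno: int, record: str) -> None:
        self.records[self._index(recno)] = record

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records)

    def delete(self, recno: int) -> None:
        del self.records[self._index(recno)]

    def dump(self) -> str:
        return "".join(r + "\n" for r in self.records)


store = RecnoStore("alpha\nbeta\ngamma\n")
store.delete(1)                # "beta" becomes record 1
store.append("a much longer record")
print(store.get(1), store.dump().splitlines())
```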

4.2 MIGRATING FROM RDBMS TO NOSQL

SUB-CONTENTS

4.3 Web Frameworks and NoSQL


Building scalable web applications can be a very challenging experience.
Requirements keep changing and data keeps evolving.
In such situations traditional RDBMSs tend to be a little less flexible.
Document databases are a good fit for some of these use cases.

4.3.1 Using Rails with NoSQL


4.3.2 Using Django with NoSQL
4.3.3 Using Spring Data

4.3.1 USING RAILS WITH NOSQL


4.3.2 USING DJANGO WITH NOSQL

• Django is to the Python community what Rails is to Ruby developers.

• Django is a lightweight web framework that allows for rapid prototyping and fast development.

• The Django idiom is also based on an ORM to map models to databases.

• The SQL standard and the presence of a disintermediating ORM layer make it possible for Django
applications to swap one RDBMS for another.

• Application code that works seamlessly across NoSQL products, and across both SQL and NoSQL
products, is desirable.
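Application code that stays the same while the storage changes can be sketched without Django itself. Below, a hypothetical `MovieRepository` protocol has one SQL backend (Python's built-in `sqlite3`) and one document-style backend (a dict of dicts standing in for a NoSQL store). This is not Django's ORM API; it only illustrates the abstraction layer that makes such a swap possible.

```python
import sqlite3
from typing import Optional, Protocol


class MovieRepository(Protocol):
    def save(self, movie_id: int, doc: dict) -> None: ...
    def get(self, movie_id: int) -> Optional[dict]: ...


class SqlMovieRepository:
    """Relational backend: fixed schema, one row per movie."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

    def save(self, movie_id: int, doc: dict) -> None:
        self.conn.execute("INSERT OR REPLACE INTO movies VALUES (?, ?, ?)",
                          (movie_id, doc["title"], doc["year"]))

    def get(self, movie_id: int) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT title, year FROM movies WHERE id = ?", (movie_id,)).fetchone()
        return {"title": row[0], "year": row[1]} if row else None


class DocumentMovieRepository:
    """Document backend: each movie stored as a schemaless dict."""

    def __init__(self):
        self.docs = {}

    def save(self, movie_id: int, doc: dict) -> None:
        self.docs[movie_id] = dict(doc)

    def get(self, movie_id: int) -> Optional[dict]:
        doc = self.docs.get(movie_id)
        return dict(doc) if doc is not None else None


def release_year(repo: MovieRepository, movie_id: int) -> Optional[int]:
    """Application code: written once, works with either backend."""
    movie = repo.get(movie_id)
    return movie["year"] if movie else None


for repo in (SqlMovieRepository(), DocumentMovieRepository()):
    repo.save(1, {"title": "Jaws", "year": 1975})
    print(type(repo).__name__, release_year(repo, 1))
```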



4.3.3 USING SPRING DATA

Maven makes it easy to build a project, define its dependencies, and manage those dependencies.

4.4 USING MYSQL AS A NOSQL SOLUTION



Figure 15-2 depicts a typical MySQL and Memcached setup being accessed by a client.


Using Memcached with MySQL is beneficial, but the architecture has its drawbacks:

• Data is held in memory in two places: the storage engine buffer and Memcached.

• Replicating data between the storage engine and Memcached can lead to inconsistent states
of data.

• Data is fetched into Memcached via the SQL layer, so the SQL overhead is still present,
even if it is minimized.

• Memcached performance is superior only as long as all relevant data fits in memory. Disk I/O
overheads can be high and can make the system slow.
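The first two drawbacks show up in even a minimal cache-aside sketch. In the Python code below, the built-in `sqlite3` module stands in for MySQL and a dict stands in for Memcached. The table, cache keys and function names are hypothetical. Every read copies data into a second in-memory location, and a write that forgets to invalidate the cache leaves the two copies inconsistent.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # stand-in for MySQL
cache = {}                          # stand-in for Memcached

db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")


def get_user(user_id: int):
    """Cache-aside read: try the cache, fall back to SQL, then populate the cache."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    cache[key] = name               # the same data now lives in two places
    return name


def rename_user(user_id: int, name: str, invalidate: bool = True) -> None:
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    if invalidate:
        cache.pop(f"user:{user_id}", None)   # skipping this leaves a stale copy


print(get_user(1))                         # alice (read through SQL, then cached)
rename_user(1, "alicia", invalidate=False)
print(get_user(1))                         # alice: stale copy served from the cache
rename_user(1, "alicia")
print(get_user(1))                         # alicia
```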


• An alternative to using MySQL with Memcached is to bypass the SQL layer and get directly to
the storage engine.

• This is exactly what the HandlerSocket plugin for MySQL does.

• HandlerSocket is an open-source MySQL plugin that bypasses the SQL layer to access
the underlying MySQL storage engine directly.
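Instead of SQL statements, HandlerSocket clients send short text requests over a socket. As the plugin's protocol description presents it, a request is a line of TAB-separated tokens ending in a newline: a "P" request opens an index on a table, and a find request names the opened index, a comparison operator and the key values. The Python sketch below only builds and parses such lines, with socket I/O omitted. The escaping rule (bytes 0x00 to 0x0f sent as 0x01 followed by the byte plus 0x40, with NULL as a single 0x00 byte) is taken from that description, so verify it against the plugin version in use. All function names are hypothetical.

```python
def encode_token(value) -> bytes:
    """Escape one token: None -> NUL byte; bytes 0x00-0x0f -> 0x01 + (byte + 0x40)."""
    if value is None:
        return b"\x00"
    out = bytearray()
    for b in str(value).encode("utf-8"):
        if b <= 0x0F:
            out += bytes([0x01, b + 0x40])
        else:
            out.append(b)
    return bytes(out)


def decode_token(tok: bytes):
    if tok == b"\x00":
        return None
    out, i = bytearray(), 0
    while i < len(tok):
        if tok[i] == 0x01 and i + 1 < len(tok):
            out.append(tok[i + 1] - 0x40)   # undo the escape
            i += 2
        else:
            out.append(tok[i])
            i += 1
    return out.decode("utf-8")


def request(*tokens) -> bytes:
    """Join escaped tokens with TAB and terminate the line with LF."""
    return b"\t".join(encode_token(t) for t in tokens) + b"\n"


def open_index(index_id, db, table, index, columns) -> bytes:
    return request("P", index_id, db, table, index, ",".join(columns))


def find(index_id, op, key_values, limit=1, offset=0) -> bytes:
    return request(index_id, op, len(key_values), *key_values, limit, offset)


def parse_response(line: bytes):
    """Return (error_code, rows); each row holds numcolumns values."""
    parts = line.rstrip(b"\n").split(b"\t")
    code, ncols = int(parts[0]), int(parts[1])
    values = [decode_token(p) for p in parts[2:]]
    rows = [values[i:i + ncols] for i in range(0, len(values), ncols)] if ncols else []
    return code, rows


print(open_index(0, "test", "user", "PRIMARY", ["user_name", "user_email"]))
print(find(0, "=", [1]))
print(parse_response(b"0\t2\talice\talice@example.com\n"))
```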



THANK YOU
