You are on page 1of 41

NoSQL / Spring Data

Polyglot Persistence – An introduction to Spring Data


Pronam Chatterjee
pronamc@vmware.com

© 2011 VMware Inc. All rights reserved


Presentation goal

How Spring Data simplifies the


development of NoSQL
applications

2
Agenda

• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J

3
Relational databases are great

• SQL = Rich, declarative query language


• Database enforces referential integrity
• ACID semantics
• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
• Well understood by operations
• Configuration
• Care and feeding
• Backups
• Tuning
• Failure and recovery
• Performance characteristics
• But….

4
The trouble with relational databases

• Object/relational impedance mismatch


- Complicated to map rich domain model to relational schema
• Relational schema is rigid
- Difficult to handle semi-structured data, e.g. varying attributes
- Schema changes = downtime or $$
• Extremely difficult/impossible to scale writes:
- Vertical scaling is limited/requires $$
- Horizontal scaling is limited or requires $$
• Performance can be suboptimal for some use cases

5
NoSQL databases have emerged…

Each one offers some combination of:


• High performance
• High scalability
• Rich data-model
• Schema less
In return for:
• Limited transactions
• Relaxed consistency
•…

6
… but there are few commonalities

• Everyone and their dog has written one


• Different data models
- Key-value
- Column
- Document
- Graph
• Different APIs – No JDBC, Hibernate, JPA (generally)
• “Same sorry state as the database market in the 1970s before SQL was invented”
http://queue.acm.org/detail.cfm?id=1961297

7
NoSQL databases have emerged…

• NoSQL usage small by


comparison…
• But growing…

8
Agenda
• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J

10
Redis
• Advanced key-value store
- Think memcached on steroids (the good kind)
- Values can be binary strings, Lists, Sets, Ordered Sets, Hash maps, ..
- Operations for each data type, e.g. appending to a list, adding to a
set, retrieving a slice of a list, …
- Provides pub/sub-based messaging K1 V1

• Very fast: K2 V2
- In-memory operations
- ~100K operations/second on entry-level hardware K3 V2

• Persistent
- Periodic snapshots of memory OR append commands to log file
- Limits are size of keys retained in memory.
• Has “transactions”
- Commands can be batched and executed atomically

11
Redis use cases
• Use in conjunction with another database as the SOR
• Drop-in replacement for Memcached
- Session state
- Cache of data retrieved from SOR
- Denormalized datastore for high-performance queries
• Hit counts using INCR command
• Randomly selecting an item – SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, ….
• High score tables – Sorted sets

Notable users: github, guardian.co.uk, ….

14
vFabric Gemfire - Elastic data fabric
• High performance data grid
• Enhanced parallel disk persistence
• Non Disruptive up/down scalability
• Session state
- Cache of data retrieved from SOR
- Denormalized datastore for high-performance queries
• Heterogenous data sharing
• Java
• .net
• C++
• Co-located Transactions

14
Gemfire - Use Cases

• Ultra low latency high throughput application


• As an L2 cache in hibernate
• Distributed Batch process
• Session state
- Tomcat
- tcServer
• Wide Area replication

14
Neo4j

•Graph data model


- Collection of graph nodes
- Typed relationships between nodes
- Nodes and relationships have properties
•High performance traversal API from roots
- Breadth first/depth first
•Query to find root nodes
- Indexes on node/relationship properties
- Pluggable - Lucene is the default
•Graph algorithms: shortest path, …
•Transactional (ACID) including 2PC
•Deployment modes
- Embedded – written in Java
- Server with REST API

15
Neo4j Data Model

16
Neo4j Use Cases

• Use Cases
- Anything social
- Cloud/Network management, i.e. tracking/managing physical/virtual resources
- Any kind of geospatial data
- Master data management
- Bioinformatics
- Fraud detection
- Metadata management
• Who is using it?
- StudiVZ (the largest social network in Europe)
- Fanbox
- The Swedish military
- And big organizations in datacom, intelligence, and finance that wish to remain anonymous

19
MongoDB

• Document-oriented database
- JSON-style documents: Lists, Maps, primitives
- Documents organized into collections (~table)
• Full or partial document updates
- Transactional update in place on one document
- Atomic Modifiers
• Rich query language for dynamic queries
• Index support – secondary and compound
• GridFS for efficiently storing large files
• Map/Reduce

20
Data Model = Binary JSON documents

{
"name" : "Ajanta",
One document
"type" : "Indian",
=
"serviceArea" : [
"94619", one DDD aggregate
"94618"
],
"openingHours" : [
{
• Sequence of bytes on disk = fast I/O
- No joins/seeks
"dayOfWeek" : Monday,
"open" : 1730,
- In-place updates when possible => no index updates
"close" : 2130 • Transaction = update of single document
}
],
"_id" : ObjectId("4bddc2f49d1505567c6220a0")
}

21
MongoDB query by example

• Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday

{
serviceArea:"94619",
openingHours: {
$elemMatch : {
"dayOfWeek" : "Monday",
"open": {$lte: 1800},
"close": {$gte: 1800}
}
}
} DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
DBObject o = cursor.next();

}

23
MongoDB use cases

• Use cases
- Real-time analytics
- Content management systems
- Single document partial update
- Caching
- High volume writes
• Who is using it?
- Shutterfly, Foursquare
- Bit.ly Intuit
- SourceForge, NY Times
- GILT Groupe, Evite,
- SugarCRM

Copyright (c) 2011 Chris Richardson. All rights reserved.

25
Other NoSQL databases

• SimpleDB – “key-value”
• Cassandra – column oriented database
• CouchDB – document-oriented
• Membase – key-value
• Riak – key-value + links
• Hbase – column-oriented…

http://nosql-database.org/ has a list of 122 NoSQL databases

26
Agenda

• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J

27
NoSQL Java APIs

Database Libraries
Redis Jedis, JRedis, JDBC-Redis, RJC

Neo4j Vendor-provided
MongoDB Vendor-provided Java driver
Gemfire Pure Java map API, Spring-Gemfire templates

But
• Usage patterns
• Tedious configuration
• Repetitive code
• Error prone code
•…

28
Spring Data Project Goals

• Bring classic Spring value propositions to a wide range of NoSQL databases:


- Productivity
- Programming model consistency: E.g. <NoSQL>Template classes
- “Portability”

30
Spring Data sub-projects

• Commons: Polyglot persistence


• Key-Value: Redis, Riak
• Document: MongoDB, CouchDB
• Graph: Neo4j
• GORM for NoSQL

http://www.springsource.org/spring-data

31
Many entry points to use

• Auto-generated repository implementations


• Opinionated APIs (Think JdbcTemplate)
• Object Mapping (Java and GORM)
• Cross Store Persistence Programming model
• Productivity support in Roo and Grails

32
Cloud Foundry supports NoSQL

MongoDB and Redis are provided as services


è Deploy your MongoDB and Redis applications in seconds

33
Agenda

• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J

34
Three databases for today’s talk

Document database

Relational database

Graph database

35
Three persistence strategies for today’s talk

• Lower level template approach


• Conventions based persistence (Hades)
• Cross-Store persistence using JPA and a NoSQL datastore

36
Spring Template Patterns

• Resource Management
• Callback methods
• Exception Translation
• Simple Query API

37
Repository Implementation

38
• Also known as HSQLDB or Hypersonic SQL
• Relational Database
• Table oriented data model
• SQL used for for queries
• … you know the rest…

39
Spring Data Repository Support

• Eliminate bolierplate code – only finder methods


• findByLastName – Specifications for type safe queries
• JPA CrietriaBuilder integration QueryDSL

40
• Type safe queries for multiple backends including JPA, SQL and MongoDB in Java
• Generate Query classes using Java APT
• Code completion in IDE
• Domain types and properties can be referenced safely
• Adopts better to refactoring changes in domain types

http://www.querydsl.com

41
QueryDSL

• Repository Support
• Spring Data JPA
• Spring data Mongo
• Spring Data JDBC extensions
• QueryDslJdbcTemplate

42
Spring Data Neo4J

• Using AspectJ support providing a new programming model


• Use annotations to define POJO entities
• Constructor advice automatically handles entity creation
• Entity field state persisted to graph using aspects
• Leverage graph database APIs from POJO model
• Annotation-driven indexing of entities for search

43
Spring Data Graph Neo4J cross-store

• JPA data and “NOSQL” data can share a data model


• Separate the persistence provider by using annotations
– could be the entire Entity
– or, some of the fields of an Entity
• We call this cross-store persistence
– One transaction manager to coordinate the “NOSQL” store with the JPA relational database
– AspectJ support to manage the “NOSQL” entities and fields
• holds on to changed values in “change sets” until the transaction commits for non-
transactional data stores

44
A cross-store scenario ...

You have a traditional web app using JPA to persist data to a relational
database ...

45
JPA Data Model

46

8/3/11 Slide 46
Cross-Store Data Model

47

You might also like