Who am I
Abstract
What and Why: NoSQL Fundamentals
Use Case
Challenges Path Ahead
What is NoSQL
A database that does not adhere to the traditional relational database management system (RDBMS) structure.
Why NoSQL
Scalability and Performance
Cost
Data Modeling
This led organizations to look for alternatives, driven by the need for a cost-effective scale-out option.
Concurrency, Consistency, Integrity
For Summations, Aggregations, Groupings
Some Fundamentals
CAP Theorem
A shared/distributed system can satisfy at most two of the following three properties at the same time.
Consistency
Availability
Tolerance to Network Partitions
CAP : Pictorially
Explanation
Use case:
Scaling Web Apps
Critical facts :
Network outages are common.
Customer shopping carts, email search, and social network queries can tolerate stale data.
How:
Compromise on consistency in order to remain available, rather than disrupt user service during outages.
Explanation
Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state.
Brewer's CAP theorem says you have no choice if you want to scale up.
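The idea of "eventually consistent" can be sketched with two replicas that accept writes independently and converge once they exchange state. This is only an illustration, not how any particular NoSQL store implements it: the class name Replica, its methods, and the last-write-wins rule based on a caller-supplied timestamp are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replica illustrating eventual consistency via a last-write-wins merge (hypothetical names). */
class Replica {
    /** A value tagged with the write timestamp used to resolve conflicts. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> store = new HashMap<>();

    /** Accept a write locally, without coordinating with other replicas. */
    void put(String key, String value, long timestamp) {
        Versioned current = store.get(key);
        // Keep the incoming write only if it is strictly newer than what we hold.
        if (current == null || timestamp > current.timestamp) {
            store.put(key, new Versioned(value, timestamp));
        }
    }

    /** Local read; the result may be stale until the next merge. */
    String get(String key) {
        Versioned v = store.get(key);
        return v == null ? null : v.value;
    }

    /** Pull every entry from the other replica, keeping the newer version of each key. */
    void mergeFrom(Replica other) {
        for (Map.Entry<String, Versioned> e : other.store.entrySet()) {
            put(e.getKey(), e.getValue().value, e.getValue().timestamp);
        }
    }
}
```

Between merges, a read from one replica can return an older value than a read from the other; after both replicas have merged from each other they return the same value. Writes that carry equal timestamps are not resolved by this sketch.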
Explanation contd..
Sharp Contrast : High Speed Financial Application
Highly Transactional
Consistent
Automated
ACID vs BASE
ACID
Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
Consistent: A transaction cannot leave the database in an inconsistent state.
Isolated: Transactions cannot interfere with each other.
Durable: Completed transactions persist, even after a failure.
BASE
Basically available
Soft state
Eventual consistency
Consistent Hashing
A common way to balance load.
Commonly, a hash function (e.g. MD5) maps a value to a 128-bit key in the range 0 to 2^128 - 1 (or even to a 32-bit key, as in the next example).
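A minimal sketch of a consistent-hash ring built on the MD5 mapping described above. Node names and keys are hashed onto the same 128-bit ring, and a key belongs to the first node position at or after its own hash, wrapping around at the end. The class ConsistentHashRing, its method names, and the use of virtual nodes (several ring positions per physical node) are assumptions for illustration, not the layout of any specific product.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Minimal consistent-hash ring (hypothetical names, for illustration only). */
class ConsistentHashRing {
    // Ring position (MD5 digest read as a non-negative 128-bit integer) -> node name.
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Map a string to a position in the range 0 to 2^128 - 1. */
    static BigInteger md5(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(value.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Place the node on the ring at several positions to even out the load. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(md5(node + "#" + i), node);
        }
    }

    /** Remove every ring position that belongs to the node. */
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(md5(node + "#" + i));
        }
    }

    /** The owner is the first node position at or after the key's hash, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(md5(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

The property that matters for load balancing is that removing a node only reassigns the keys that node owned; keys owned by the remaining nodes stay where they were.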
Need to store per-day bookings for all hotels. Queries are centered on cities and regions.
Hotel count : 1 Million
Transaction Support
Atomicity MVCC
Hybrid
Support
Q&A
References
Nancy Lynch and Seth Gilbert, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," ACM SIGACT News, Volume 33, Issue 2 (2002), pp. 51-59.
"Brewer's CAP Theorem", julianbrowne.com, retrieved 02-Mar-2010.
"Brewer's CAP theorem on distributed systems", royans.net.
"CAP Twelve Years Later: How the 'Rules' Have Changed"; on-line resource.
E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; on-line resource.
D. Abadi, "Problems with CAP, and Yahoo's Little Known NoSQL System," DBMS Musings, blog, 23 Apr. 2010; on-line resource.
C. Hale, "You Can't Sacrifice Partition Tolerance," 7 Oct. 2010; on-line resource.
Facebook: Scaling Out; on-line resource.
Gemstone: The Hardest Problems in Data Management; on-line resource.
The Log-Structured Merge-Tree (research paper).
CodeProject: Consistent Hashing; on-line resource.
Backup Slides
{
  author : "alex",
  title : "No Free Lunch",
  text : "This is the text of the post. It could be very long.",
  tags : [ "business", "ramblings" ],
  votes : 5,
  voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
  comments : [
    { who : "jane", when : Date("2011-09-19T04:00:10.112Z"), comment : "I agree." },
    { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"), comment : "You must be joking. etc etc ..." }
  ]
}
Cassandra CF
Cassandra SuperCF
Use Case 1
Ecommerce Site
Problem: Record user preferences, e.g. location, IP, currency selected, source of traffic, and multiple other dynamic values.
Solution: In a CF-based structure, keep it simple: UserId_Key: Pref1_Name:Value1, Pref2_Name:Value2, ..., PrefN_Name:ValueN
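The row layout above can be pictured as a map from a row key to a sorted map of column names and values. The following in-memory stand-in is only meant to show that shape; PreferenceStore and its methods are hypothetical and do not use or imitate any Cassandra client API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a column family: row key -> (column name -> value), columns sorted by name. */
class PreferenceStore {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Add or overwrite one preference column in the user's row. */
    void save(String userId, String prefName, String value) {
        rows.computeIfAbsent(userId, k -> new TreeMap<>()).put(prefName, value);
    }

    /** Return only the requested columns that exist, in the order they were asked for. */
    Map<String, String> get(String userId, String... prefNames) {
        Map<String, String> result = new LinkedHashMap<>();
        TreeMap<String, String> columns = rows.get(userId);
        if (columns == null) {
            return result; // unknown user: empty result rather than an error
        }
        for (String name : prefNames) {
            String value = columns.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }
}
```

Because each row is just a bag of named columns, a new kind of preference needs no schema change: it is simply another column in that user's row.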
Use Case 1
RowKey: 1350136093705_6501082438199894
=> (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
=> (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
=> (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
------------------
RowKey: 1354435656227_7908056941568359
=> (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
------------------
RowKey: 1347648097261_15570089270962881
=> (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)
Use Case 1
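Get
// Fetch the named preference columns for one user row (Hector client).
private Map<String, String> getPrerences(Keyspace keySpace, String userId, String... prefernceNames)
        throws IOException, CharacterCodingException {
    SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
            StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    rsq.setColumnFamily(USER_PREFERENCE);
    rsq.setKey(userId);
    rsq.setColumnNames(prefernceNames);
    QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
    Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
    // The slide is cut off here. Assumed completion: copy each returned column into the map.
    for (HColumn<String, String> column : orows.get().getColumns()) {
        preferenceMap.put(column.getName(), column.getValue());
    }
    return preferenceMap;
}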
Use Case 1
Save
// Write one preference column with a TTL so that it expires automatically.
Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());
HColumn<String, String> userPrefrences = HFactory.createColumn(colkey, colvalue,
        StringSerializer.get(), StringSerializer.get());
userPrefrences.setTtl(ttlUserPrefrences);
m.addInsertion(rowkey, USER_PREFERENCE, userPrefrences);
m.execute();
Use Case 2
Online Travel Site
Problem:
Use Case 2
RowKey: 2d323436353731
=> (super_column=911167901297486,
     (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago., timestamp=1354962852610000))
Use Case 2
// Read every super column of one city row and turn each into a field of a document.
// `document` and `documents` are defined outside this fragment.
SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
        StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);
QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();
if (superColumns != null) {
    for (HSuperColumn<String, String, String> superColumn : superColumns) {
        Map<String, String> messages = new HashMap<String, String>();
        List<HColumn<String, String>> columns = superColumn.getColumns();
        if (columns != null) {
            for (HColumn<String, String> column : columns) {
                messages.put(column.getName(), column.getValue());
            }
        }
        /* The equivalent doc */
        document.addField(superColumn.getName(), messages);
        documents.add(document);
    }
}
Pig Script : MR
<document>
  <pigscript start="-16" end="-43200" start1="-1441" end1="-10080" start2="0" end2="-15" start3="0" end3="-1440">
    <comment>Delete All Messages</comment>
    <query><![CDATA[rows0 = LOAD 'cassandra://LH/HotelMessage' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );]]></query>
    <query><![CDATA[cols0 = FOREACH rows0 GENERATE key as key,flatten($1) as (name:chararray, value:chararray);]]></query>
    <query><![CDATA[userhotel0 = FOREACH cols0 GENERATE key as key,com.mmt.solr.hotels.cassandra.ByteBufferToString($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
    <query><![CDATA[uriCounts0 = FOREACH userhotel0 GENERATE key as citycode,com.mmt.solr.hotels.cassandra.ToBag(TOTUPLE(name,null));]]></query>
    <comment>Last Viewed start 15 minutes to 30 days ago</comment>
    <query><![CDATA[rows = LOAD 'cassandra://LH/LastViewedHotels?slice_start=#start&slice_end=#end&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>
    <query><![CDATA[cols = FOREACH rows GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>
    <query><![CDATA[userhotel = FOREACH cols GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
    <query><![CDATA[userhotelByCity = FOREACH userhotel GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>
    <query><![CDATA[groupByhotels = GROUP userhotelByCity BY hotelid;]]></query>
    <query><![CDATA[uriCounts = FOREACH groupByhotels { D = LIMIT userhotelByCity 1; GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag( TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.'))); };]]></query>
    <comment>Last Booked 1 to 8 days ago</comment>
    <query><![CDATA[rows1 = LOAD 'cassandra://LH/BookedHotels?slice_start=#startA&slice_end=#endA&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>
    <query><![CDATA[cols1 = FOREACH rows1 GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>
    <query><![CDATA[userhotel1 = FOREACH cols1 GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
    <query><![CDATA[userhotelByCity1 = FOREACH userhotel1 GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>
    <query><![CDATA[groupByhotels1 = GROUP userhotelByCity1 BY hotelid;]]></query>
    <query><![CDATA[uriCounts1 = FOREACH groupByhotels1 { D = LIMIT userhotelByCity1 1;