 
Data buckets
Written by Gilad Manor. Posted on Wednesday, March 3rd, 2009 at JavaWorld’s Daily Brew
For me, software development is just a nice way of saying ‘bit moving’. A good friend of mine used to describe himself as a bit reorganizer. We rearrange invisible magnets, he would say, setting their tiny arrows of residual currents to point this way or that. We are a bunch of “bitniks” and we are all about data.
Application design and development has been my main source of income for the last decade or so, and it struck me as odd that there were so few terms to describe so many kinds of data. It occurred to me that the Eskimos had their fourteen words for snow, and they say that the Bedouins have nine words to describe sand. I felt so alone. I felt a need to discover my own flavors of data. It took me a while, but then, in a single perfect moment of clarity, I realized what lay before me.

The orchestration of the moment was this: in the middle of a design meeting, with yelling and shouting all around, we were discussing optimization and performance, and spirits were high. My thoughts went back to when I had learned about application design. When developing any business application, the first step is to determine the set of business flows that describe the scope and functionality of that application within the organizations it’s meant to serve.

Listing these business flows by rank and cardinality is no bother at all. The simplest evaluation I could think of is by frequency of use and the sheer number of users that would eventually use the flow. I thought that categorizing the body of data by the same yardstick could provide me with the flavors of data I was looking for.
 
http://giladmanor.blogspot.com/ giladmanor@yahoo.com

Java is a wonderful language, my favorite actually, versatile and strong. In the context of this discussion (!), Java has one drawback: the Java Virtual Machine is located far, so far away from the data acted upon. Unlike the hieroglyphic COBOL, Java needs special machinery to access its data. In this case, the number of solutions testifies to the complexity of the problem.

It’s safe to say that there are absolutely no free meals. Every solution ever invented to accommodate the data access issue bears its own cost and complications. Careful mapping of the data orientation by category and flavor might reduce the friction in complex systems that depend on the availability of massive bulks of data.

And here I am getting to my point: mapping the data reduces the friction in complex systems, and mapping the data needs more flavors.
The topology of data within an application
I have managed to put together five distinctions of data flavors, but first I will describe my case-study application and define the yardstick I am using to categorize data.

My example is an application that sells insurance policies. The simplified outline of such an application would have a customer base and a product list. It would also include a process for selling insurance policies, implemented by stapling the products to customers. To make it interesting, I will refer to the use of external services and fixed configuration.

The yardstick for data categorization is determined by measuring the cardinality of each of the application’s work flows. It is easy to see that, in the hierarchy of the business flows, the main work flow is the selling of an insurance policy (stitching the customer to the product), followed by the work flow for managing the customer base. Far behind would be the work flows for creating, versioning and maintaining the product list.
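The yardstick described above can be sketched in a few lines of Java. This is only an illustration: the class names, the figures, and the weight formula (frequency of use multiplied by the number of users) are my assumptions for the sake of the example, not measurements from a real system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rank business flows by frequency of use and user count.
final class FlowRanking {

    // A business flow with two rough measurements (all figures are made up).
    static final class BusinessFlow {
        final String name;
        final long usesPerDay;
        final long users;

        BusinessFlow(String name, long usesPerDay, long users) {
            this.name = name;
            this.usesPerDay = usesPerDay;
            this.users = users;
        }

        // Assumed weight: frequency of use multiplied by the number of users.
        long weight() {
            return usesPerDay * users;
        }
    }

    // Returns a new list ordered from the heaviest flow to the lightest.
    static List<BusinessFlow> rank(List<BusinessFlow> flows) {
        List<BusinessFlow> ranked = new ArrayList<>(flows);
        ranked.sort((a, b) -> Long.compare(b.weight(), a.weight()));
        return ranked;
    }

    public static void main(String[] args) {
        List<BusinessFlow> flows = new ArrayList<>();
        flows.add(new BusinessFlow("maintain product list", 5, 3));
        flows.add(new BusinessFlow("sell policy", 2000, 150));
        flows.add(new BusinessFlow("manage customer base", 400, 40));

        // Print the flows from the main work flow down to the rarest one.
        for (BusinessFlow flow : rank(flows)) {
            System.out.println(flow.name + " -> " + flow.weight());
        }
    }
}
```

With figures like these, the policy selling flow comes out on top, the customer base management flow second, and the product list maintenance far behind, which matches the hierarchy of the example application.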
Applicative data bucket
The applicative data bucket is the body of data that is manipulated by the main business flow and has the highest rate of change. I’m speaking strictly of altered (modified) data only.

In my example application, this would be the data that is handled in the policy selling work flow. The data consists of the stitching tables between the products and the customer. The stitching tables may also describe a single shopping cart or a single contract with the customer.
 
In many cases the data that is added or modified stays within the boundaries of a single session, so there may be no reason for cache optimization.

In cases where concurrent changes to the same data are permitted, the synchronization between sessions should be handled with great care and an understanding of the business implications. It’s important to remember that deadlock issues are ten times easier to handle from the business standpoint than technologically.

There are several solutions for second level caching. Just to name a couple, there is the Ehcache project, which I’m using in the product I’m working on, and I have also heard about the Terracotta project.

Choosing to implement the second level cache independently is also an option; it is an easy implementation as long as the cache stays on the same virtual machine. But scaling a cache solution to a clustered environment is a different ball game, and in this particular case my policy is to make use of the efforts of others and not waste my own resources on an existing product.
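To give a feeling for how small a home-grown cache can be while it stays on one virtual machine, here is a minimal sketch. It is not the Ehcache or Terracotta API; the class name `LocalCache` and the loader function are hypothetical, and the sketch ignores expiry, size limits and clustering altogether.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical single-JVM cache; this is not the Ehcache or Terracotta API.
final class LocalCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the real data access code

    LocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only when the key is absent.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops a key so that the next read goes back to the loader.
    void evict(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LocalCache<String, String> cache = new LocalCache<>(id -> {
            loads.incrementAndGet(); // pretend this is a slow database read
            return "policy-" + id;
        });

        cache.get("42");
        cache.get("42"); // second read is served from memory
        System.out.println("loads after two reads: " + loads.get());

        cache.evict("42");
        cache.get("42"); // read after eviction calls the loader again
        System.out.println("loads after evict and read: " + loads.get());
    }
}
```

Everything here lives in a single map inside a single JVM, which is exactly why it is easy. Once a second virtual machine holds its own copy of the map, eviction has to be communicated across the cluster, and that is the part I would rather leave to an existing product.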
Reference data bucket, first degree
The first order of reference data is data that is used as “read only” by the main business flow. Yet this referenced data is a moving target, since close-by flows (flows that are ranked closely to the main flow) change it constantly.

In the example application, the customer base management work flows rank second. The customer base is modified intensively when additional customers are introduced to the database or when existing customers change status and details. Having two possible concurrent sessions (the main business session and the customer database update session) both accessing the customer data requires special attention and awareness of the business implications of the concurrent modification.

In the calculation of insurance premium rates, the payment is determined in accordance with personal parameters and the record of each individual customer, thus changes need to be communicated instantly. Therefore, synchronization of the concurrent sessions is a must. A second level cache, or any other innovative solution that allows live cache updates between the competing sessions, is advised. However, since the messages are sent one way, some application friction may be reduced.
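The one-way nature of these updates can be sketched as follows. This is a single-JVM illustration under my own assumptions: the class `CustomerReferenceCache`, the fields of `Customer` and the premium rule are all made up for the example, and a real system would still need a transport to carry the updates between sessions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of one-way live cache updates between two flows.
final class CustomerReferenceCache {

    // Immutable snapshot, so a reader never sees a half-applied change.
    static final class Customer {
        final long id;
        final int age;
        final int claimsOnRecord;

        Customer(long id, int age, int claimsOnRecord) {
            this.id = id;
            this.age = age;
            this.claimsOnRecord = claimsOnRecord;
        }
    }

    private final ConcurrentMap<Long, Customer> byId = new ConcurrentHashMap<>();

    // Called only by the customer management flow (the writing side).
    void publish(Customer updated) {
        byId.put(updated.id, updated);
    }

    // Called by the policy selling flow, which treats the data as read only.
    Customer find(long id) {
        return byId.get(id);
    }

    // Made-up premium rule, only to show that the reader sees fresh data.
    static int premiumFor(Customer customer) {
        return 100 + 2 * customer.age + 50 * customer.claimsOnRecord;
    }

    public static void main(String[] args) {
        CustomerReferenceCache cache = new CustomerReferenceCache();

        cache.publish(new Customer(7L, 30, 0));
        System.out.println("premium before update: " + premiumFor(cache.find(7L)));

        // The management flow records two claims; the selling flow sees them at once.
        cache.publish(new Customer(7L, 30, 2));
        System.out.println("premium after update: " + premiumFor(cache.find(7L)));
    }
}
```

Because the selling flow never writes back, there is only one direction of traffic to reason about: the management flow replaces whole snapshots, and the selling flow simply picks up whichever snapshot is current.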
Reference data bucket, second degree
The second order of reference data is data that is referenced by the main business work flow as read only, while changes made by other business flows occur at a very low frequency.
