Choosing a Business Intelligence Appliance
The term business intelligence appliance is defined in different ways by different people, but a broadly accepted definition is server hardware and database software bundled specifically to meet data warehousing needs.
“BI appliances come in many variations, so it takes some groundwork and research to make sure that you choose the right appliance for your needs.”
As the name suggests, BI appliances are turning out to have some similarities with their kitchen counterparts: they come in many variations, so it takes some groundwork and research to make sure that you choose the right appliance for your needs. I was involved in two such bake-offs over the last year, and through that exercise we worked out a solid process that helped us assess our options and make our choice. Here are some of the key highlights of our approach.

First, some context: the client was a large financial institution, and this initiative was at a departmental level. Our key objective was to consolidate four data marts into a single infrastructure, because we were getting many requests to combine information across them and to eliminate the data latency between them. Our challenges were primarily on the data load side, where a single run in some cases took more than 30 hours; there was performance pain on the query side as well, but it wasn't very significant. As a secondary benefit, we were also attempting to consolidate database licenses and servers across these four environments, including the development and test boxes for each. Like many large enterprises, we had server support outsourced to one of the large infrastructure support players and were being charged a hefty per-server fee every month. Because the combined data size was expected to be around 2TB, we were initially reluctant to even begin the process, having heard that appliance solutions only start to prove valuable at 5TB or more.
Saama White Paper
However, we really needed something to help the load situation and decided to move ahead with the initiative. We started with the usual product-evaluation approach and broke it into four key steps:

1. Long list – based on Internet research.
2. Short list – based on discussions with analyst firms and minimal interaction with the vendors.
3. Proof-of-concept bake-off for the short-list contenders.
4. Final assessment and decision.

Step 1: Long List

For the long list of candidates we took most of the players from Gartner's Magic Quadrant. We found that we could broadly classify them into three categories:

1. Hardware and software solutions
2. Software-only solutions
3. Hardware-only solutions
“We put together a clear and transparent set of guidelines for this process to ensure that we had an even playing field.”
Step 2: Short List

We trimmed down the initial list based on our size and performance needs. During this process we used input from Gartner and Forrester and had minimal interaction with the actual vendor sales reps. We did consider customer references as part of the decision process. That left us with two main contenders for our POC bake-off. Interestingly, it turned out to be a mix of a "proprietary hardware plus software" player and a "commodity hardware plus software" player. Although one vendor appeared to be a startup, they were given high marks by the analysts and, more importantly, they seemed to be working very closely with Sun and even had common board members, which made the viability question a little less risky.
Step 3: POC Bake-Off

We put together a clear and transparent set of guidelines for this process to ensure that we had an even playing field. Some of the things we laid out were:

• We insisted that the POC be done on site.
• One of our team members would shadow the vendor engineer during the entire process to understand and report back on what it took.
• We time-boxed this to one week of on-site activity.

We knew that the on-site requirement would mean getting a number of internal approvals to allow vendor hardware to be set up in our data center, so we initiated that process during step one. By the time we
had the final contenders, the legal and security clearances were already in place, so nothing held up the schedule. For the POC task, we identified one load process that was taking approximately 33 hours as the prime candidate. This process consisted of a set of Informatica jobs that picked up data from flat files and moved it to stage, then to the final schema, and finally to a set of aggregate tables. Since we had existing investments in reporting and analytical applications, we had to ensure that the existing schema remained untouched, to avoid changes to those BI applications. The load process included all three types of operations, i.e., inserts, deletes and updates. We would measure the load performance of this entire task, and then point one of our BI environments at the vendor appliance and benchmark some of the long-running reports and queries.
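The measurement approach above, timing the full load run phase by phase, can be sketched roughly as follows. This is a minimal illustration and not the actual harness we used; the phase names and stand-in functions are hypothetical placeholders for the real Informatica jobs.

```python
import time

def benchmark_load(phases):
    """Time each phase of a load run; return per-phase and total durations in seconds."""
    timings = {}
    for name, fn in phases:
        start = time.perf_counter()
        fn()  # run the phase (in reality, kick off the ETL job and wait for completion)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(v for k, v in timings.items() if k != "total")
    return timings

# Hypothetical stand-ins for the three stages of the real load process.
phases = [
    ("flat_files_to_stage", lambda: time.sleep(0.01)),
    ("stage_to_final_schema", lambda: time.sleep(0.01)),
    ("final_to_aggregates", lambda: time.sleep(0.01)),
]

for name, secs in benchmark_load(phases).items():
    print(f"{name}: {secs:.3f}s")
```

Timing the end-to-end run rather than individual SQL statements is what made the vendor comparison fair: each appliance executed the same complete task against the same untouched schema.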
“The performance runs of the whole process in both cases yielded mind-blowing results. The 33-hour process took less than 40 minutes in both cases.”
Both vendors shipped their boxes to our data center, and for logistical reasons they came in and worked on their tasks on staggered weeks: vendor one in week one and vendor two in week three. We would have preferred the same week, but that would have required additional setup on our side. In both runs we hit technical snags moving the raw data over to the appliance, so we used a large USB device to move it. The runs themselves were fairly smooth, and the engineers were very good; they knew what they were doing and carried out the tasks with minimal issues.

The performance runs of the whole process yielded mind-blowing results in both cases: the 33-hour process took less than 40 minutes on each appliance. We also tested both under mixed loads, i.e., loading data and running reports simultaneously, and did not see much degradation. These systems have been architected to allow loads without impacting end usage. Of course, this can also lead to read-consistency issues if the overall process is not designed properly, but that's a separate discussion.

Both appliances had proven themselves, with very little performance difference (less than four minutes) between them. So the decision process now switched from performance to price/performance, price here being total cost of ownership over a three-year period, including the projected data growth. We did this exercise in late 2008, and the economy was such that it was already a buyer's market; both vendors were willing to bend over backward to close the deal. So even price/performance was becoming a difficult metric to base the decision on.
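As a rough illustration of that price/performance comparison, a three-year TCO model that accounts for projected data growth might look like the sketch below. All figures, the 30% growth rate, and the composite cost-times-runtime metric are hypothetical; the real evaluation also folded in outsourced server fees and license-consolidation savings.

```python
def three_year_tco(license_cost, hardware_cost, annual_support,
                   expansion_cost_per_tb, start_tb, annual_growth):
    """Rough three-year total cost of ownership, including capacity
    purchased to keep up with projected data growth (hypothetical model)."""
    tco = license_cost + hardware_cost + 3 * annual_support
    tb = start_tb
    for _ in range(3):
        tb *= 1 + annual_growth          # projected data growth per year
    tco += (tb - start_tb) * expansion_cost_per_tb  # extra capacity over 3 years
    return tco

def price_performance(tco, load_minutes):
    """Composite metric: cost times runtime -- lower is better."""
    return tco * load_minutes

# Hypothetical vendor figures (USD), starting at ~2TB with 30% annual growth.
vendor_a = price_performance(three_year_tco(500_000, 300_000, 100_000, 50_000, 2, 0.30), 36)
vendor_b = price_performance(three_year_tco(400_000, 350_000, 90_000, 60_000, 2, 0.30), 40)
winner = "A" if vendor_a < vendor_b else "B"
print(f"Better price/performance: vendor {winner}")
```

When the raw performance numbers are within minutes of each other, as ours were, a model like this makes it obvious how sensitive the decision becomes to negotiated pricing, which is exactly why the metric got harder to use in a buyer's market.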
We finally decided to go with the commodity hardware plus software solution rather than the proprietary hardware plus software solution, our main reasoning being that a commodity-hardware-based system would stand to gain from the billions of dollars of R&D being carried out by hardware giants like Intel and AMD. This meant going with a relatively unknown player, but the strong backing mitigated the risk.
“I can almost assure you that you will be surprised by the performance gains from these systems on both the data load side as well as the query side.”
The BI appliance industry is evolving rapidly, and new players are pushing hard to make their solutions affordable even to sub-billion-dollar enterprises. I recommend that even enterprises dealing with just a terabyte of data (with expected data volume growth) explore the possibility of introducing a BI appliance into their ecosystem. I can almost assure you that you will be surprised by the performance gains from these systems on both the data load side and the query side. Finally, I believe this market will continue to evolve rapidly, and the commodity hardware plus software players will begin to dominate the marketplace.

About the Author

Haranath Gnana is senior principal at Saama Technologies Inc., a pure-play business intelligence solution provider. He has more than 15 years' experience in information technology, specializing in business intelligence.

About Saama Technologies

Saama Technologies, Inc. is a pure-play business intelligence solution provider that has revolutionized the way organizations make decisions through business intelligence. Since 1997, the company has combined its extreme BI technology expertise, unique intellectual property portfolio and strong relationships with the industry's leading technology providers to deliver pure business intelligence to the world's largest information-focused organizations. Saama's customers are Fortune 500 organizations within a wide range of industries, including life sciences, technology, financial services, and the public sector. Saama recently acquired data-integration software and NIEM-conformance pioneer Sypherlink, which operates as an independent, wholly-owned subsidiary. For more information, visit www.saama.com.