Improving the performance of ad-hoc analysis of large datasets
 
About this document
This document describes the approach and results of an evaluation into the capabilities of Infobright Community Edition (ICE) versus a traditional MySQL InnoDB database when performing summary and group analysis on large data sets [1]. This document is not a full evaluation of ICE, nor is it an endorsement of the product.
Situation
Most organisations will have at least one data warehouse, or data marts containing business data specific to a department. These databases typically feed management information (MIS) and/or business intelligence (BI) solutions and, in larger organisations, are usually relational data stores optimised to perform particular tasks [2].
Often business users want to perform additional analysis on the data in the warehouse or mart in order to gain insights into customer or employee behaviour. Examples of this might be “Who are my top 10 customers buying widgets in the following regions over the past six months?”; “Which employees over director grade and in the IT department spend the most on employee benefits?”; “Which customers using the Safari browser who click on the Swedish landing page go on to spend over 100 kronor?”.
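As an illustration of what the first of these questions looks like as a query, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The `sales` table, its columns and the sample rows are hypothetical and are not taken from the evaluation data.

```python
import sqlite3


def top_customers(conn, product, regions, since, limit=10):
    """Return the top spenders on one product in the given regions since a date."""
    # One "?" placeholder per region so the values are bound safely.
    placeholders = ", ".join("?" for _ in regions)
    sql = f"""
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE product = ?
          AND region IN ({placeholders})
          AND sale_date >= ?
        GROUP BY customer
        ORDER BY total DESC, customer
        LIMIT ?
    """
    return conn.execute(sql, [product, *regions, since, limit]).fetchall()


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(customer TEXT, product TEXT, region TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme", "widget", "North", "2010-03-01", 120.0),
        ("Acme", "widget", "South", "2010-04-15", 80.0),
        ("Bolt", "widget", "North", "2010-02-10", 150.0),
        ("Bolt", "gadget", "North", "2010-02-11", 999.0),   # wrong product
        ("Cog", "widget", "East", "2010-03-20", 500.0),     # wrong region
        ("Acme", "widget", "North", "2009-01-01", 1000.0),  # too old
    ],
)
result = top_customers(conn, "widget", ["North", "South"], "2010-01-01")
```

The query filters on product, region and date and then groups by customer. A mart that was not built around those dimensions has no index or aggregate table to help with it, which is the situation described in the next section.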
The problem
This desire to perform ad-hoc analysis or data mining can lead to difficulties for the teams that own and provide access to the data.
This is because data marts are usually optimised for a particular set of use cases and hence are aggregated and indexed on the dimensions that match those use cases. So a Sales data mart may be built to query on the dimensions of product code, region and sales manager, but may not be geared up to answer queries about the marketing campaign code of the product. The data warehouse itself (if a traditional warehouse) will not make any optimisations along dimensions.
For this reason, users are often discouraged or prevented from performing this type of analysis on data warehouses. If they are allowed access there are two opposing factors:
Long response times to ad-hoc queries lead to a poor user experience
Database optimisations (indexes and aggregate tables) greatly increase the amount of storage required [3]
Reason for this evaluation
ICE is of potential interest to us as it provides a platform we can enhance to provide reporting and alerting interfaces that perform deep and sophisticated analytics, allowing users access to information they previously could not have.
[1] 1 million plus rows of data
[2] Smaller organisations often have their data warehouse made up of one or more spreadsheets
[3] This has a knock-on effect of increasing the time required and complexity of populating the database
 
Several of our current clients would benefit from being able to mine their data marts in an efficient and productive (from a user experience perspective) manner.
About Infobright
Infobright is a database designed for analytical queries. It is built on MySQL but uses a different storage engine, Brighthouse, rather than one of the standard storage engines (e.g. MyISAM, InnoDB).
Infobright does not use indexes or aggregate tables but instead relies on the fact that it is a column-oriented (columnar) database, which is why it is more suited to aggregate analytics.
This is for the most part invisible to the user (depending on which edition is used) and Infobright can be accessed through the same clients used for a regular MySQL instance.
Infobright comes in two flavours. The Community Edition (ICE) is Open Source Software and the Enterprise Edition (IEE) is a commercial product. The chief differences between the two offerings are support for data loading and DML (i.e. INSERT, UPDATE, DELETE).
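To make the column-oriented point above more concrete, the following is a minimal sketch of the general idea in Python. It is a conceptual illustration of row versus column layout only, not a description of how the Brighthouse engine actually stores data, and the field names and values are hypothetical.

```python
# Row-oriented layout: each record is stored together, so an aggregate
# over one field still has to visit every whole record.
rows = [
    {"customer": "Acme", "region": "North", "amount": 120.0},
    {"customer": "Acme", "region": "South", "amount": 80.0},
    {"customer": "Bolt", "region": "North", "amount": 150.0},
]

# Column-oriented layout: each field is stored as its own sequence,
# so an aggregate can read just the one column it needs.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_row_store(records):
    """Sum 'amount' by walking every record."""
    return sum(record["amount"] for record in records)


def total_column_store(cols):
    """Sum 'amount' by reading only the 'amount' column."""
    return sum(cols["amount"])


row_total = total_row_store(rows)
column_total = total_column_store(columns)
```

Both functions return the same total, but the columnar version never touches the `customer` or `region` values. On a wide table with millions of rows, that difference in data read is the usual argument for columnar storage in aggregate queries.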
Evaluation
We performed a limited evaluation to determine whether ICE would provide benefits in a real-life situation.
We used data from a warehouse belonging to one of our clients and worked with them to understand the analysis that they would like to be able to perform but up to now have not been able to.
The data and problem domain have been made anonymous and generic within this report to protect client confidentiality.
The key principles for the evaluation were:
Use real data volumes
Ask real questions of the data
Aim
The aim of the evaluation was to understand how an Infobright Community Edition (ICE) database compared to a standard MySQL database (using the InnoDB storage engine) over the following dimensions:
User response times to sample queries
Storage space required by the database
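This excerpt does not say how response times were captured. As one possible approach, the sketch below times a query from the client side in Python, using the built-in sqlite3 module as a stand-in for a real MySQL or ICE connection. The `time_query` helper, the `facts` table and its data are all hypothetical.

```python
import sqlite3
import time


def time_query(conn, sql, params=(), repeats=3):
    """Run a query several times; return (best elapsed seconds, last result rows)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()  # include fetch time
        best = min(best, time.perf_counter() - start)
    return best, rows


# Hypothetical stand-in data: 1000 rows split evenly across two regions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("North" if i % 2 else "South", float(i)) for i in range(1000)],
)

best, rows = time_query(
    conn, "SELECT region, COUNT(*) FROM facts GROUP BY region ORDER BY region"
)
```

Taking the best of several runs reduces noise from other activity on the machine. Reporting the first (cold) run separately may be closer to what a user experiences with a genuinely ad-hoc query.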
Specifications
Tests were performed on a developer’s desktop machine:
Pentium Dual-Core 2.16 GHz, 3 GB RAM, Windows XP Professional
MySQL Community Edition 5.1 (using InnoDB)
