About this document
This document describes the approach and results of an evaluation into the capabilities of Infobright Community Edition (ICE) versus a traditional MySQL InnoDB database when performing summary and group analysis on large data sets. It is not a full evaluation of ICE, nor is it an endorsement of the product.
Situation
Most organisations will have at least one data warehouse, or data marts containing business data specific to a department. These databases typically feed management information systems (MIS) and/or business intelligence (BI) solutions and, in larger organisations, are usually relational data stores optimised to perform particular tasks.
Often business users want to perform additional analysis on the data in the warehouse or mart in order to gain insights into customer or employee behaviour. Examples of this might be “Who are my top 10 customers buying widgets in the following regions over the past six months?”; “Which employees above director grade and in the IT department spend the most on employee benefits?”; “Which customers using the Safari browser who click on the Swedish landing page go on to spend over 100 krone?”.
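To make the first of these questions concrete, the following is a minimal sketch of how such an ad-hoc summary might be expressed as a grouped query. Everything in it is an assumption for illustration: the `sales` table, its columns, the sample rows and the `top_customers` helper are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for the warehouse engine. It is not ICE or InnoDB and says nothing about their performance.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration. SQLite is a
# stand-in here; it is not one of the engines evaluated in this document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, region TEXT, "
    "sale_date TEXT, amount REAL)"
)
rows = [
    ("Acme", "widget", "North", "2010-03-01", 120.0),
    ("Acme", "widget", "South", "2010-04-15", 80.0),
    ("Bolt", "widget", "North", "2010-02-10", 300.0),
    ("Bolt", "gadget", "North", "2010-02-11", 999.0),  # not a widget
    ("Cog", "widget", "West", "2010-05-20", 500.0),    # region not selected
    ("Dyn", "widget", "South", "2009-06-01", 700.0),   # outside date window
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)


def top_customers(conn, product, regions, start, end, limit=10):
    """Return (customer, total spend) pairs, highest spend first."""
    # One "?" placeholder per region keeps the query parameterised.
    placeholders = ", ".join("?" for _ in regions)
    sql = (
        "SELECT customer, SUM(amount) AS total FROM sales "
        "WHERE product = ? AND region IN (" + placeholders + ") "
        "AND sale_date >= ? AND sale_date < ? "
        "GROUP BY customer ORDER BY total DESC LIMIT ?"
    )
    return conn.execute(sql, [product, *regions, start, end, limit]).fetchall()


result = top_customers(
    conn, "widget", ["North", "South"], "2010-01-01", "2010-07-01"
)
print(result)
```

The query filters on product, region and date, then groups by customer. None of those columns is guaranteed to match the dimensions a given data mart was built around, which is what makes questions of this kind ad hoc.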
The problem
This desire to perform ad-hoc analysis or data mining can lead to difficulties for the teams that own and provide access to the data.
This is because data marts are usually optimised for a particular set of use cases and hence are aggregated and indexed on the dimensions that match those use cases. So a Sales data mart may be built to query on the dimensions of product code, region and sales manager, but may not be geared up to answer queries about the marketing campaign code of the product. The data warehouse itself (if a traditional warehouse) will not make any optimisations along dimensions.
For this reason, users are often discouraged or prevented from performing this type of analysis on data warehouses. If they are allowed access there are two opposing factors:
• Long response times to ad-hoc queries lead to a poor user experience
• Database optimisations (indexes and aggregate tables) greatly increase the amount of storage required
Reason for this evaluation
ICE is of potential interest to us as it provides a platform we can enhance to provide reporting and alerting interfaces that perform deep and sophisticated analytics, allowing users access to information they previously could not have.
1. 1 million plus rows of data
2. Smaller organisations often have their data warehouse made up of one or more spreadsheets
3. This has a knock-on effect of increasing the time required and complexity of populating the database