You are on page 1of 7

Case Study on WEKA a tool for Business Intelligence-A

Business intelligence (BI) mainly refers to computer-based techniques used in extracting, identifying, and analyzing business data, such as sales revenue by products and/or departments, or by associated costs and incomes.
BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining and predictive analytics. Business intelligence aims to support better business decision-making. Thus a BI system can be called a decision support system (DSS). Though the term business intelligence is sometimes used as a synonym for competitive intelligence, because they both support decision making, BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes while competitive intelligence gathers, analyzes and disseminates information with a topical focus on company competitors. Business intelligence understood broadly can include the subset of competitive intelligence.

In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal." In 1989, Howard Dresner (later a Gartner Group analyst) proposed "business intelligence" as an umbrella term to describe "concepts and methods to improve business decision making by using factbased support systems."[2] It was not until the late 1990s that this usage was widespread.

Business intelligence and data warehousing

Often BI applications use data gathered from a data warehouse or a data mart. However, not all data warehouses are used for business intelligence, nor do all business intelligence applications require a data warehouse. In order to distinguish between concepts of business intelligence and warehouses, Forrester Research often defines business intelligence in one of two ways: data

1. "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making." 2. When using this definition, business intelligence also includes technologies such as data integration, data quality, data warehousing, master data management, text and content analytics, and many others that the market sometimes lumps into the Information Management segment. Therefore, Forrester refers to data preparation and data usage as two separate, but closely linked segments of the business intelligence architectural stack.


Business intelligence and business analytics

Thomas Davenport has argued that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI based on statistics, prediction, and optimization.

Prioritization of business intelligence projects

It is often difficult to provide a positive business case for business intelligence initiatives and often the projects will need to be prioritized through strategic initiatives. Here are some hints to increase the benefits for a BI project. As described by Kimball you must determine the tangible benefits such as eliminated cost of producing legacy reports. Enforce access to data for the entire organization. In this way even a small benefit, such as a few minutes saved, will make a difference when it is multiplied by the number of employees in the entire organization. As described by Ross, Weil & Roberson for Enterprise Architecture, consider letting the BI project be driven by other business initiatives with excellent business cases. To support this approach, the organization must have Enterprise Architects, which will be able to detect suitable business projects.

A 2009 Gartner paper predicted[24] these developments in the business intelligence market: Because of lack of information, processes, and tools, through 2012, more than 35 percent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets. By 2012, business units will control at least 40 percent of the total budget for business intelligence. By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups. A 2009 Information Management special report predicted the top BI trends: "green computing, social networking, data visualization, mobile BI, predictive analytics, composite applications, cloud computing andmultitouch."[25] Other business intelligence trends include the following:[26] Third party SOA-BI products increasingly address ETL issues of volume and throughput. Cloud computing and Software-as-a-Service (SaaS) are ubiquitous. Companies embrace in-memory processing, 64-bit processing, and pre-packaged analytic BI applications. Operational applications have callable BI components, with improvements in response time, scaling, and concurrency.


WEKA(Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. WEKA is free software available under the GNU General Public License.

The WEKA Knowledge Explorer is a graphical user interface that allows users to do data mining.

Data Preprocess using WEKA Even though WEKA allows users to upload .csv data files, WEKA mainly uses .arff (AttributeRelation File Format) file format. Thus weka provides the users the functionality to converts .csv to .arff file format. There are two ways to convert, 1: When you upload the .csv file WEKA explorer will prompt users to convert 2. Users can use WEKA CLI. 1. Step 1: Using weka CLI to convert .csv -> .arff 2. java weka.core.converters.CSVLoader bank-data.csv > bank-data.arff

ARFF : It is a text file that describes a list of instances sharing a set of attributes. link. 3 main parts: @relation, @attribute, @data. Sample from bank-data.arff data file that I will be using in this demo. 3|Page


@relation bank-data @attribute id {ID12101,ID12102,ID12103,ID12104,ID12105,...ID12700} @attribute age numeric @attribute sex {FEMALE,MALE} @attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN} @attribute income numeric @attribute married {NO,YES} @attribute children numeric @attribute car {NO,YES} @attribute save_act {NO,YES} @attribute current_act {NO,YES} @attribute mortgage {NO,YES} @attribute pep {YES,NO} @data ID12101,48,FEMALE,INNER_CITY,17546,NO,1,NO,NO,NO,NO,YES ID12102,40,MALE,TOWN,30085.1,YES,3,YES,NO,YES,YES,NO ID12103,51,FEMALE,INNER_CITY,16575.4,YES,0,YES,YES,YES,NO,NO ID12104,23,FEMALE,TOWN,20375.4,YES,3,NO,NO,YES,NO,NO ID12105,57,FEMALE,RURAL,50576.3,YES,0,NO,YES,NO,NO,NO 3. Step 2: Loading bank-data.arff to WEKA.



4. Step 3: Filtering Attributes Need to remove unique attributes before we can start on data mining.

Ex: Removing attribute "ID"



5. Step 4: Discretization

This step involves in categorizing the data. According to WEKA documentation this step is crucial as techniques such as "association rule mining" can only be applied on categorized data.

Discretization must be perform on numeric or continuous attributes. Ex: Discretization performed on "age" and "income" attributes. Research on Stock Market Data.


Every selected Attributes Vs Every Selected Attributes Shown is the set of scatter plots.

Scatter Plots: A scatter plot reveals relationships or association between two variables How scatter plot helps: It helps answer the questions regarding the involved variables X & Y such as

Are X and Y Related? Are X and Y Positively correlated? Are X and Y Negatively correlated? Does Ys variation depends Xs variation? Are there outliers?