driven] business environment."
What is Data-Driven Decision-Making?
Data-driven decision-making is a process that requires collaboration and a variety of skills across all levels of the enterprise. Predictive modeling is only a small piece of the process. A large piece of the process revolves around the data: data acquisition, data quality, data manipulation and data distribution. In fact, good decisions cannot be made without reliable, high-quality data.
The Data Story
From my perspective, the most difficult part of the process is the mathematical formulation of a model that describes the problem you are trying to solve. Often, the analytic methods used will depend on what data is available.
Figure 1: Steps to Data-Driven Decision-Making
Working together, IT and the analytic teams need to identify where the data resides within the organization and what format it is in (relational data tables, spreadsheets, enterprise resource planning [ERP] systems). Are there multiple instances of the data that don't match? Is the data complete? Is additional data from external sources (demographic or socioeconomic data) needed?
Doing exploratory data analysis on a subset of the data and examining the metadata is a common practice for understanding the data. Summary statistics and visualization can be key methods for identifying anomalies in the data that need to be addressed prior to a more in-depth modeling exercise. Data may need to be converted or transformed for use in predictive modeling. Measurement data may need to be standardized. Individual transactions may need to be summarized into new variables representing rates, counts or indicators.
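As a minimal sketch of the preparation steps just described, the Python below rolls individual transactions up into one record per customer (a count, a rate and an indicator) and standardizes a measurement to z-scores. The records, field names and helper functions are all hypothetical and chosen only for illustration; a real project would read from the organization's own tables and would likely use a data-analysis library.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical transaction records: (customer_id, amount, was_returned)
transactions = [
    ("C1", 20.0, False),
    ("C1", 40.0, True),
    ("C2", 10.0, False),
    ("C2", 30.0, False),
    ("C2", 80.0, False),
]


def summarize_by_customer(rows):
    """Roll individual transactions up into one record per customer."""
    grouped = defaultdict(list)
    for customer_id, amount, was_returned in rows:
        grouped[customer_id].append((amount, was_returned))

    summary = {}
    for customer_id, items in grouped.items():
        n = len(items)
        returns = sum(1 for _, returned in items if returned)
        summary[customer_id] = {
            "txn_count": n,                               # count
            "avg_amount": mean(a for a, _ in items),      # summary measurement
            "return_rate": returns / n,                   # rate
            "has_return": returns > 0,                    # indicator
        }
    return summary


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


customers = summarize_by_customer(transactions)
z_scores = standardize([row["avg_amount"] for row in customers.values()])

for (customer_id, row), z in zip(customers.items(), z_scores):
    print(customer_id, row, round(z, 2))
```

The same pattern, one row per customer with derived variables, is the "customer-focused" layout that many modeling tools expect as input.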
Data may need to be reformatted from product or transactional data into customer-focused data. Assumptions about the underlying distribution of the data need to be tested for statistical validity.
In the predictive modeling phase, trade-offs need to be considered between the speed of modeling, the accuracy of the model and how easily it is understood. Business users need to trust the results of the analysis, regardless of their knowledge of analytical methods. Many software packages provide only a few simple methods with limited options, while others provide a wide variety. In general, more flexible modeling strategies lead to better predictions, which impact bottom-line revenue.
No single method works best in all cases. One widely accepted strategy is to try all the most common modeling methods (decision trees, regression and neural networks) and compare them to determine the best model. A common criterion for evaluation is a comparison of the expected profits or losses to actual profits or losses obtained from model results. This criterion enables you to make cross-model comparisons and assessments independent of all other factors.
Delivering the output from the best model to the business user is a key consideration for IT staff. Output from the models can be sophisticated or simple. Output may be fed programmatically into real-time systems, such as database engines, message queues or Web services, triggering real-time alerts or product recommendation offers to call center
Alternatively, a set of reports (documents, spreadsheets or presentations) could be generated either statically or dynamically on demand in a Web portal or a dashboard. Ultimately, the information needs to be accessible where and when it is needed, in a context relevant to the decision-maker.
Data-driven decision-making can be used throughout the enterprise to model customer, supplier and operational processes. The models are corporate assets that may have significant financial impact, particularly in the areas of marketing, risk assessment and operations. They must be continually assessed and validated for their accuracy over time.
IT staff will be tasked with managing the data and models throughout the lifecycle (development, test/stage, deploy, track, retire), including version control and change management for audit reporting purposes.
Storing model packages with their metadata allows automated model scheduling, including exception reports and model tracking reports. A common metadata repository provides the ability to perform impact analysis - to analyze and evaluate changes in data definitions or model specifications across the organization before an actual change breaks existing applications.
Of course, decision-making is an ongoing cycle. Information gleaned from one iteration of the cycle should be fed back into the process to make it better the next time.
Data mining and statistics are powerful tools that enable organizations to make more structured, repeatable decisions.
The decision-making process begins with data access, data exploration and transformation, followed by predictive modeling. The process concludes with the delivery of information to the decision-makers throughout the enterprise, enabling them to take action. From a business perspective, it doesn't really matter what you call it: statistics, data mining or predictive analytics. Competitive advantage comes from making better decisions faster and more confidently.
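As a rough illustration of the model-comparison strategy described earlier (trying several common methods and judging them on expected versus actual profit), the Python sketch below scores three candidate models on a small holdout set. Everything in it is an assumption made for illustration: the scores are invented rather than produced by real models, the payoff and cost figures are arbitrary, and the rule of contacting a customer only when the expected payoff is positive is one simple way to turn a score into a decision, not necessarily the one a given software package uses.

```python
# Hypothetical holdout outcomes: 1 = customer responded, 0 = did not.
actual = [1, 0, 1, 0, 0, 1]

# Hypothetical predicted response probabilities from three candidate models.
scores = {
    "decision_tree":  [0.9, 0.6, 0.4, 0.2, 0.1, 0.7],
    "regression":     [0.8, 0.3, 0.6, 0.4, 0.2, 0.7],
    "neural_network": [0.7, 0.7, 0.2, 0.6, 0.1, 0.4],
}

REVENUE_PER_RESPONSE = 10.0  # assumed payoff when a contacted customer responds
COST_PER_CONTACT = 2.5       # assumed cost of making each contact


def profit_summary(probs, outcomes):
    """Return (expected profit, actual profit) for one model's decisions."""
    expected = 0.0
    realized = 0.0
    for p, y in zip(probs, outcomes):
        expected_payoff = p * REVENUE_PER_RESPONSE - COST_PER_CONTACT
        if expected_payoff > 0:  # contact only when the expected payoff is positive
            expected += expected_payoff
            realized += y * REVENUE_PER_RESPONSE - COST_PER_CONTACT
    return expected, realized


results = {name: profit_summary(p, actual) for name, p in scores.items()}

# Rank the candidates on the profit actually obtained on the holdout data.
best_model = max(results, key=lambda name: results[name][1])

for name, (expected, realized) in results.items():
    print(f"{name}: expected {expected:.2f}, actual {realized:.2f}")
print("best by actual profit:", best_model)
```

Because every model is reduced to the same monetary scale, the comparison does not depend on which modeling method produced the scores.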
Kathy Lange is a senior business
for SAS Analytical Consulting. She may be reached at firstname.lastname@example.org.