Professional Documents
Culture Documents
T: 267.616.1444 T: 647.271.1932
F: 1.905.761.1006 F: 905.761.1006
9 Flagstaff Place 150 Borrows Street
Philadelphia, PA Thornhill, ON
19115 USA L4J 2W8 Canada
info@bisolutions.us www.bisolutions.us
WHITE PAPER:
DATA MINING: THE MEANS TO COMPETITIVE ADVANTAGE
APRIL 2008
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [2]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
Your competitors are not familiar with your company's unstructured problems. In fact, they’re probably
not even aware that such problems exist. Your real competitive edge lies in using knowledge-discovering
technology to solve your business’s unstructured problems, as your competitors will be unable to easily
replicate your solutions.
In summary, projects that find the solutions to unstructured business problems can be termed real
data mining projects. The successful resolution of a real data mining project provides organizations
with high payoff.
described with a high degree of completeness described with a high degree of completeness
solved with a high degree of certainty resolved with a high degree of certainty
easily and uniquely translated into quantitative easily and uniquely translated into quantitative
counterpart counterpart
Experts usually agree on the best method and Experts usually disagree on the best method
best solution and best solution
Goal Find the best solution Find reasonable/acceptable solution
Complexity From very simple to complex, usually lies within From complex to very complex, as a rule lies at
one discipline the interface of multiple disciplines
Causality Easy to ascertain Difficult to ascertain
Consequences of Known Unknown
potential actions
Example Project: Sample size identification Project: Customer Feedback Study
Key business question What is customers sample size (only one Is there any relationship between customer
promotion channel vs. synergy of two perception of company reputation/brand equity
promotional channels) to detect at least two and company products portfolio sales?
purchase difference in sales of Product A?
Translation into If the project will be given to 5 different If the project will be given to 5 different data
quantitative counterpart statisticians, all of them will come up with miners, each of them will come up with unique
identical results: same methodology, same methodology, require specific data mining
sample size calculation procedure, and same algorithms and software, and the results will not
outcome – a table with calculated sample be identical (even terms in which findings will be
size as a function of power, regardless of formulated, can be completely different)
software used
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [3]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
Both statistical and data mining paradigms provide a wide range of regression methods/models. Modern
statistical regression models include but are not limited to: robust regression, quantile regression, loess/
kernel regression, additive regression and spline regression. Data mining regression models include tree-
based regression (CART, CHAID, and others), TreeNet, Random Forrest, Multilayer Perceptron, MARS
and others.
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [4]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
What is the difference between statistical and data mining regression? In short, the difference lies
in the varying degrees of tolerance that statistical and data mining models have to viral data anomalies
(Table 3). Data mining approaches can handle high-dimensional heterogeneous data with a high degree
of sparseness and multicollinearity, and with a significant percentage of outliers/leverage points and missing
values, and are able to discover uncharacterizable non-linearities among differently scaled variables in
high-dimensional space. Statistical approaches cannot (Brusilovskiy 2007).
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [5]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [6]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [7]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
CONCLUSION
Digging for a sustainable competitive advantage requires companies to focus on real data mining
projects and to build a team with the right synergy to discover meaningful results. This is not an
overnight process. It would be a wise management decision to start using data mining technology as
soon as possible, because gaining sustainable company knowledge is a recursive process and data
mining technology is a necessary component to remain competitive. Understanding the nature and
characteristics of a real data mining study will better prepare management to select and handle the right
data mining projects that will offer their company a competitive edge.
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [8]
Business Intelligence Solutions – White Paper
Data Mining: The Means to Competitive Advantage
www.bisolutions.us | Data. Knowledge. Action.
REFERENCES
1. Kumar, T. An Introduction to Data Mining in Institutional Research
http://www.airweb.org/webrecordings/Kumar%20Data%20Mining%20Workshop.pdf
2. Hand, D. J. Data Mining: Statistics and More? The American Statistician, May 1998, Vol. 52 No. 2
http://www.amstat.org/publications/tas/hand.pdf
3. Friedman, J.H. 1997. Data Mining and Statistics. What’s connection? Proceedings of the 29th
Symposium on the Interface: Computing Science and Statistics, May 1997, Houston, Texas
4. Smyth, P. (2000), An Introduction to Data Mining, Elumetric.com Inc
5. King, E. A. (2005), How to Buy Data Mining: A Framework for Avoiding Costly Project Pitfalls in
Predictive Analytics, DM Review Magazine, October 2005 issue
http://www.dmreview.com/article_sub.cfm?articleId=1038094
6. Brusilovskiy, P. (2007), Data Mining in Pharmaceutical Marketing and Sales Analysis. 2007 ICSA
Applied Statistics Symposium
statgen.ncsu.edu/icsa2007/talks/Session%203D%20Data%20Mining%20in%20Pharmaceutical%2
0Marketing%20.ppt
7. Brusilovskiy, P. and by Hernandez, R. (2001), Data Mining for a Sustainable Competitive Advantage.
DM Review Magazine, June 2001 issue
http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=3508
Copyright © Business Intelligence Solutions | Dmitry Brusilovsky & Eugene Brusilovskiy [9]