
Data Mining Techniques for Optimizing Inventories for

Electronic Commerce
Anjali Dhond
Massachusetts Institute of Technology
Room E53-311, 40 Wadsworth Street
617-253-8906
adhond@mit.edu

Amar Gupta
Massachusetts Institute of Technology
Room E53-311, 40 Wadsworth Street
617-253-8906
agupta@mit.edu

Sanjeev Vadhavkar
Massachusetts Institute of Technology
Room 1-270, 77 Massachusetts Avenue
617-253-6232
vada@mit.edu

ABSTRACT
As part of their strategy for incorporating electronic commerce capabilities, many organizations are involved in the development of information systems that will establish effective linkages with their suppliers, customers, and other channel partners involved in transportation, distribution, warehousing and maintenance activities. These linkages have given birth to comprehensive data warehouses that integrate operational data with supplier, customer, channel partner and market information. Data mining techniques can now provide the technological leap needed to structure and prioritize information from these data warehouses to address specific end-user problems. Emerging data mining techniques permit the semi-automatic discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data. Very significant business benefits have been attained through the integration of data mining techniques with current information systems aiding electronic commerce. This paper explains key data mining principles that can play a pivotal role in an electronic commerce environment. The paper also highlights two case studies in which neural network-based data mining techniques were used for inventory optimization. The results from the data mining prototype in a large medical distribution company provided the rationale for the strategy to reduce the total level of inventory by 50% (from a billion dollars to half a billion dollars) in the particular organization, while maintaining the same level of probability that a particular customer's demand will be satisfied. The second case study highlights the use of neural network-based data mining techniques for forecasting hot metal temperatures in a steel mill blast furnace.

Keywords
Inventory Optimization, Temporal Data Mining, Data Massaging.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
KDD 2000, Boston, MA USA
© ACM 2000 1-58113-233-6/00/08 ...$5.00

1. INTRODUCTION
The past two decades have witnessed a dramatic increase in information being stored in electronic format. This surge will be further compounded by an ever-growing number of organizations embracing the paradigm of electronic commerce. The amount of information in the world is estimated to double every 20 months, and the size and number of databases are increasing at a still faster pace. The increase in the use of electronic data gathering devices, such as point-of-sale devices and remote sensing devices, is one factor behind this explosive growth.

In electronic commerce environments, the rapidly escalating volume of data puts timely and accurate data analysis beyond the reach of the best human domain experts, even hordes of them working day and night. Instead, emerging data mining techniques offer far superior abilities to discover hidden knowledge, interesting patterns and new business rules buried within huge repositories of electronic databases. Currently regarded as the key element of the more elaborate process of Knowledge Discovery in Databases (KDD), the data mining paradigm integrates theoretical perspectives from the realms of statistics, machine learning and artificial intelligence. From the standpoint of technology implementation, it relies on advances in data modeling, data warehousing and information retrieval. However, the more important challenges lie in organizing business practices around the knowledge discovery activity. As organizations gear towards a web-enabled economy and increasingly rely on online information sources for a variety of decision support applications, one will witness a growing reliance on data mining techniques in the electronic commerce space.

Data mining involves the semi-automatic discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data. In other words, data mining attempts to extract knowledge from data. Data mining differs from traditional statistics in several ways: formal statistical inference is assumption-driven, in the sense that a hypothesis is formed and validated against the data; data mining, in contrast, is discovery-driven, in the sense that patterns and hypotheses are automatically extracted from large data sets. Further, the goal in data mining is to extract qualitative models, which can easily be translated into business patterns, logical rules or visual representations. Therefore, the results of the data mining process may be patterns, insights, rules, or predictive models that are frequently beyond the capabilities of the best human domain experts.

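As a concrete (and purely illustrative) sketch of what "discovery-driven" extraction means, the fragment below counts co-occurring items across a handful of transactions and reports the pairs exceeding a support threshold, with no hypothesis supplied in advance. The item names and the threshold are invented for illustration; they do not come from the paper.

```python
# Discovery-driven pattern extraction in miniature: enumerate item pairs
# in each transaction and keep those with enough support. No hypothesis
# is formed beforehand; the frequent pairs emerge from the data itself.
from itertools import combinations
from collections import Counter

transactions = [
    {"aspirin", "bandages", "vitamins"},
    {"aspirin", "bandages"},
    {"vitamins", "sunscreen"},
    {"aspirin", "bandages", "sunscreen"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 3  # a pair must appear in at least 3 of the 4 baskets
frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent)  # {('aspirin', 'bandages'): 3}
```

Real association-rule miners (e.g. Apriori-style algorithms) refine this brute-force enumeration, but the discovery-driven character is the same.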
In the electronic commerce space, data mining techniques have the potential of providing companies with competitive advantages in optimizing their use of information. Potential applications include the following [16][17][18][19]:
♦ To manage customer relationships by predicting customer buying habits, calibrating customer loyalty and retention, and analyzing customer segments, target marketing and promotion effectiveness, customer profitability, customer lifetime value, and customer acquisition effectiveness.
♦ To enable financial management through analytical fraud detection, claims reduction, detection of high-cost-to-serve orders or customers, risk scoring, credit scoring, audit targeting and enforcement targeting.
♦ To position products by product affinity analysis that shows opportunities for cross selling, up selling and strategic product bundling.
♦ To develop efficient and optimized inventory management systems based on Web customer demand predictions.
♦ To implement more efficient supply chains with suppliers and contractors.

2. CASE STUDIES
2.1 Medicorp – Pharmaceutical Distribution Company
Large organizations, especially geographically dispersed ones, are usually obliged to carry large inventories of products ready for delivery on customer demand. Inventory optimization pertains to the problem of how much of each product should be kept in the inventory at each store and each warehouse. If too little inventory is carried relative to demand, unsatisfied customers could turn to competitors. On the other hand, a financial cost is incurred for carrying excessive inventory. In addition, some products have short expiration periods and shelf lives and must therefore be replaced periodically. Inventories cost a great deal of money to maintain. The best way to manage an inventory is through the development of better techniques for predicting customer demand and managing stock inventories accordingly. In this way, the size and the constitution of the inventory can be optimized with respect to changing demands.

With hundreds of chain stores and revenues of several billion dollars per annum, "Medicorp" is a large retail distribution company. Medicorp revenues exceeded $15 billion from over 4100 stores in 25 states in the United States. Medicorp dispenses approximately 12% of all retail prescriptions in the United States. In keeping with its market-leading position, Medicorp is forced to have a large standing inventory of products ready to deliver on customer demand. The problem is how much of each drug should be kept in the inventory at each store and warehouse. Because of unfulfilled prescriptions, unsatisfied customers may switch company loyalties, relying on other pharmacy chains for their needs. On the other hand, Medicorp incurs a financial cost if it carries excessive inventories. In addition, pharmaceutical drugs have short expiration dates and must be renewed periodically.

Historically, Medicorp has maintained an inventory of approximately a billion dollars on a continuing basis, using traditional regression models to determine inventory levels for each drug item. The corporate policy of Medicorp is governed by two competing principles: minimize total inventory and achieve the highest level of customer satisfaction. The former principle is not quantified in numerical terms. On the latter issue, Medicorp strives to achieve a 95% fulfillment level. That is, if a random customer walks into a random store on a random day for a random drug, the probability of the availability of the particular item must be 95%. The figure of 95% is based on the type of goods that Medicorp carries and the service levels offered by competitors of Medicorp for the same items. Medicorp has a corporate-wide data warehouse system that maintains data on what was sold, at what price, and to whom at each store.

After reviewing various options, and using conventional inventory optimization techniques, Medicorp adopted a "three weeks of supply" approach. This approach involved the regression study of historical data to compute a seasonally adjusted estimate of the forecasted demand for the next three-week period. This estimated demand is the inventory level that Medicorp keeps, or strives to keep, on a continuing basis. Each store within the Medicorp chain orders replenishments on a weekly basis and receives the ordered items 2-3 days later from a regional warehouse. Historically, this model has yielded the 95% target for customer satisfaction.

To find the best solution to the inventory problem, we analyzed data maintained within the transactional data warehouse at Medicorp. The Medicorp data warehouse is of the order of several gigabytes in size. In the modeling phase, we extracted a portion of the recent data fields, which was deemed to provide adequate raw data for a preliminary analysis:
♦ Date field – Indicates the date of the drug transaction
♦ NDC number – Uniquely identifies a drug (equivalent to a drug name)
♦ Customer number – Uniquely identifies a customer (useful in tracking repeat customers)
♦ Quantity number – Identifies the amount of the drug purchased
♦ Sex field – Identifies the sex of the customer
♦ Days of Supply – Identifies how long the particular drug purchased will last
♦ Cost Unit Price – Establishes the per-unit cost to Medicorp of the particular drug
♦ Sold Unit Price – Identifies the per-unit cost to the customer of the particular drug

Before adopting neural network-based data mining techniques, preliminary data analysis was used to search for seasonal trends, correlations between field variables, the significance of variables, and so on. Our preliminary data provided evidence for the following patterns:
♦ Most sales of drug items showed minimal correlation to seasonal changes.
♦ Women are more careful about consuming medication than men: women customers were more likely than men to complete the prescription fully.
♦ Drug sales are heaviest on Thursdays and Fridays, indicating that inventory replenishment would be best ordered on Monday.
♦ Drug sales (in terms of quantity of drug sold) show differing degrees of variability:
♦ Maintenance-type drugs (for chronic ailments) show low degrees of sales variability.
♦ Acute-type drugs (for temporary ailments) show high degrees of sales variability.

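The kind of preliminary day-of-week analysis described above can be sketched with the standard library alone. The transaction records below are invented for illustration (they are not Medicorp data), but the fields mirror the date, NDC number and quantity fields extracted from the warehouse:

```python
# Aggregate drug sales by weekday to look for ordering patterns.
# The records are hypothetical; field layout follows (date, NDC, quantity).
from datetime import date
from collections import defaultdict

sales = [
    (date(1999, 3, 1), "00071-0155", 30),   # a Monday
    (date(1999, 3, 4), "00071-0155", 90),   # a Thursday
    (date(1999, 3, 5), "00071-0155", 120),  # a Friday
    (date(1999, 3, 5), "00093-0058", 60),   # a Friday
]

by_weekday = defaultdict(int)
for day, ndc, qty in sales:
    by_weekday[day.strftime("%A")] += qty

# Rank weekdays by total quantity sold; a Thursday/Friday peak would
# suggest placing replenishment orders on Monday, as the study found.
ranking = sorted(by_weekday.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # [('Friday', 180), ('Thursday', 90), ('Monday', 30)]
```

The same grouping idea extends to the other reported patterns (per-month totals for seasonality, per-NDC variance for the maintenance-versus-acute distinction).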
There is no general theory that specifies the type of neural network, the number of layers, the number of nodes (at various layers), or the learning algorithm for a given problem. As such, data mining analysts must experiment with a large number of neural networks before converging upon the appropriate one for the problem at hand. In order to evaluate the relative performance of each neural network, we used statistical techniques to measure the error values in predictions. Most major neural network architectures and major learning algorithms were tested using sample data patterns from Medicorp. Multi-Layer Perceptron (MLP) models and Time Delay Neural Network (TDNN) models yielded promising results and were studied in greater detail.

Modeling short time-interval predictions is difficult, as it requires a greater number of forecast points, shows greater sales demand variability, and exhibits lesser dependence on previous sales history. Using MLP architectures and sales data for one class of products, we initially attempted to forecast sales demand on a daily basis. The results were unsatisfactory: the networks produced predictions with very low correlation (generally below 20%) and very high absolute error values (generally above 80%). Hence, modeling for larger time intervals was attempted next.

As expected, forecasting for a week proved more accurate than for a day, and forecasting for one month proved more accurate than for a week. Indeed, when predicting aggregate annual sales demand, we obtained average error values of only 2%. Keeping a weekly prediction interval provided the best compromise between the accuracy of the prediction and the usefulness of the predicted information for Medicorp. The weekly forecasts are useful for designing inventory management systems for individual Medicorp stores, while the yearly forecasts are useful for determining the performance of a particular item in a market and the overall financial performance of the organization.

The neural network was trained with historic sales data using two methods: the standard method and the rolling method. The difference between these two methods is best explained with an example. Assume that weekly sales data (in units sold) were 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, etc. In the standard method, we would present the data "10, 20, 30" and ask the network to predict the fourth value: "40". Then, we would present the network with "40, 50, 60" and ask it to predict the next value: "70". We would continue this process until all training data were exhausted. On the other hand, using the rolling method, we would present historic data as "10, 20, 30" and ask the network to predict the fourth value: "40"; then, we would present the network with "20, 30, 40" and ask it to predict the fifth value: "50". We would continue using the rolling method until all the training data were exhausted.

The rolling method has an advantage over the standard method in that it produces a greater quantity of training examples from the same data sample, but at the expense of training data quality. The rolling method can "confuse" the neural network because of the close similarity between training samples. Using the previous example, for instance, the rolling method would produce "10, 20, 30"; "20, 30, 40"; "30, 40, 50". Each of these training samples differs from the next by a single number only. This minuscule difference may reduce the neural network's ability to learn the underlying pattern in the data.

At Medicorp, some items sell infrequently. In fact, some of the specialized drugs may sell only twice or thrice a year at a particular store. This lack of sales data is a major problem in training neural networks. To solve it, we used other methods for transformation, reuse and aggregation of data. The one we found most effective involved augmenting future data sets with some known fraction of past data sets. If X[i]' represents the ith changed data set, X[i] represents the ith initial data set, X[i-1] represents the (i-1)th initial data set and µ is some numerical factor, then the new time series can be computed as X[i]' = X[i] + µ * X[i-1], with X[0]' = X[0]. The modified time series thus has data elements that retain a fraction of the information of past elements. By modifying the actual time series with this scheme, the memory of non-zero sales items is retained for a longer period of time, making it easier to train the neural networks with the modified time series.

As mentioned before, the policies at Medicorp are governed by two competing principles: minimize drug inventories and enhance customer satisfaction via high availability of items in stock. As such, we calibrated the different inventory models using two parameters: "undershoot" and "days of supply". The number of "undershoots" denotes the number of times a customer would be turned away if a particular inventory model were used over the "test" period. The "days-of-supply" statistic is the number of days the particular item in the inventory is expected to last. Using the latter parameter reduces complexity and allows for equitable comparisons across different categories of items. For example, items in the inventory are measured in different ways: by weight, by volume or by number. If one talked in terms of raw amounts, one would need to take into account different units of measure. However, the "days-of-supply" parameter allows all items to be specified in terms of one unit: days. The level of popularity of the item gets factored into the "days-of-supply" parameter. While maintaining a 95% probability of customer satisfaction, the MLP model reduces "days-of-supply" for items in the inventory by 66%. On average, the neural network "undershoots" only three times (keeping the 95% customer satisfaction policy of Medicorp).

Our models suggested that, compared to the "three weeks of supply" thumb rule, the level of inventory needs to be reduced for popular items and increased for less popular or unpopular items. This inference appears counter-intuitive at first glance. However, since fast-moving items are already carried in large amounts, and since they can be replenished at weekly intervals, one can reduce their inventory levels without adversely impacting the likelihood of availability when needed. This is the factor that permits a significant reduction in the size of the total inventory, and it has been highlighted by a number of observers in the popular press.

To summarize the effort, we developed the neural network-based data mining model for reducing the inventory at Medicorp from over a billion dollars worth of drugs to about one-half billion dollars (a reduction of 50%) while maintaining the original customer satisfaction level (95% availability).

2.2 Steelcorp – Iron and Steel Company
The blast furnace is the heart of any steel mill. Inside the blast furnace, the oxygen from the iron oxides is removed to yield nearly pure liquid iron. This liquid iron, or pig iron, is the raw material used in steel plants. As with any product, the quality of this pig iron can vary. The most important determinants of the quality are (1) the amount and composition of any impurities, and (2) the temperature of the hot metal when it is tapped from the blast furnace [8]. The quality of the pig iron produced is important in determining how costly it will be to produce steel

from the pig iron, as well as constraining what final types of steel the pig iron can be made into. Therefore, it is crucial that the hot metal temperature be maintained within an optimal range of values [9]. A blast furnace is very difficult to model due to the complex flow conditions, with mass and heat transfer inside. For many years, blast furnace operators have been aware that there are no universally accepted methods for accurately controlling blast furnace operation and predicting the outcome. The Hot Metal Temperature (HMT) and Silicon Content are important indicators of the internal state of a blast furnace as well as of the quality of the pig iron being produced. The production of pig iron involves complicated heat and mass transfers and introduces complex relationships between the various chemicals used.

This case study presents preliminary results from the use of Artificial Neural Networks (ANNs) as a means of modeling these complex inter-variable relationships. The research is based on three months of operational data collected from the blast furnace of "Steelcorp". Steelcorp is one of Asia's largest manufacturers of iron and steel and has multiple blast furnaces operating in tandem at multiple locations. Most of the blast furnaces are state-of-the-art and automatically collect and store data at periodic intervals on a number of input and output parameters for future analysis.

There have been many attempts by researchers to use AI techniques to predict different state variables of the blast furnace based on measured conditions within the furnace. However, modeling the relationships between various variables in the blast furnace has been quite difficult using standard statistical techniques [5]. The main reason is that non-linearities exist between the different parameters used in pig iron (hot metal) production. Production of hot metal in a blast furnace is the result of complex chemical reactions that scientists have not been able to model explicitly. Therefore, many have turned to neural networks to predict various blast furnace parameters. For example, Bulsari and Saxen [5] used feed-forward neural networks to classify the state of a blast furnace based on the measurement of blast furnace temperatures. Bulsari et al [9] used multi-layered feed-forward artificial neural networks to predict the silicon content of hot metal from a blast furnace. Several different artificial neural network models were tried by Singh et al [7] to predict the silicon content of hot metal using coke rate, hot blast temperature, slag rate, top pressure, slag basicity and the logarithm of blast kinetic energy.

The raw data from the blast furnace was not consistent enough for direct use during modeling. Reasons for this ranged from problems inherent to the data, such as missing or very anomalous values, to more subtle flaws such as not taking into account the effect of time lags in the production process. Several steps were involved in preprocessing the raw data into a dataset suitable for training an artificial neural network.

Extremely abnormal data values were adjusted to make the data more consistent. Values that were more than two standard deviations from the mean were modified so that they would be exactly two standard deviations away from the mean. In some cases a minimum value for a variable was specified; if two standard deviations below the mean was smaller than the minimum, the data was adjusted to the minimum value. This process removed outliers from the dataset. A major problem with the original data was inaccurate values of HMT for many of the data points. HMT can be measured only approximately once every hour, while the other data points were taken every five minutes. Linear interpolation between measurements of HMT was used to approximate values for the missing data points.

The raw data from the blast furnace contained a total of 9100 data points taken every five minutes. This five-minute-level data may see some inputs changing rapidly from one value to another, but since the temperature changes slowly over a longer period of time, these short-term changes do not have a noticeable effect on the output. Domain knowledge from Steelcorp indicated that an effective unit for considering the data would be blocks of one hour. Therefore, groups of twelve data points were averaged to create one data point, which represented a one-hour block. While hourly averaging of the data improved the predictive ability of the network, it had the side effect of greatly reducing the number of data points available for training the networks: the hourly averaging reduced the number of data points to approximately 760. A moving window technique was used to counter this problem. The moving window takes the first twelve data points and averages them, but in the next step it shifts over by a five-minute interval and averages the new data point with the previous eleven data points. The window continues to slide one data point at a time until the end of the set is reached. This technique allowed the use of almost the same number of data points as in the original dataset.

The initial data contained 35 input parameters. Analysis of the data revealed that some of the input variables were redundant and others were not useful in predicting HMT or silicon content. In order to discover which variables were the most important, a sensitivity analysis was performed on all 35 input variables by calculating the correlation coefficient between each input variable and the corresponding output variable (HMT). The reasoning is that the higher the correlation between a particular input and HMT, the more 'important' that particular input variable must be in determining HMT, and therefore the stronger the case for including it in the dataset. Using these correlation relationships and information from the blast furnace experts at Steelcorp, the number of input variables was narrowed down from 35 to 11. These 11 variables were: total coke, carbon oxide, hydrogen, steam, group 1 heat flux, group 2 heat flux, actual coke injection, % oxygen enrichment, ore/coke ratio, hot blast temperature (degrees C), charge time for 10 semi charges, and the previous measured hot metal temperature (HMT).

Two distinct types of data sets were created in order to model future silicon content. The first type of data set consisted of 38 of the 39 input/output columns of the five-minute interval HMT data (the only column omitted was the time column). These variables were used as the inputs to predict the lone output variable, Si%, a column extracted from the hot metal chemistry data.

Since Si% was measured at a less frequent rate than the HMT input variables, the addition of the silicon column as the output column resulted in large, contiguous regions of the output variable with the same, constant value. Therefore, linear interpolation and hourly averaging were performed. In addition, the usual practices of implementing the best lags for each input column, filling missing values with previous values, and normalizing each column were also implemented. The second type of silicon data set was processed in the same manner as above, but includes additional variables as inputs. Specifically, the additional inputs were taken from the Coke and Sinter

data sets, and include the following: Coke Ash, Coke V.M., C.S.R., C.R.I., RDI, CaO, SiO2, MgO, Al2O3, FeO and Fe.

To present some of the key results from the modeling exercise, prediction results from modeling HMT 2 and 4 hours into the future are shown below (Figures 1 and 2). Figure 1 shows the graph of predicted Hot Metal Temperature, two hours ahead of time, against the observed value. The network with the lowest mean squared error (MSE) had 19 hidden nodes. A noticeable lag is present, which indicates that the most important variable (as far as the model is concerned) for predicting future HMT is the previous known HMT.

Figure 1: HMT 2 Hours. Solid - Predicted, Dashed - Actual

Figure 2: HMT 4 Hours. Solid - Predicted, Dashed - Actual

Similar analysis was performed for modeling silicon content. Preliminary results from the modeling are shown in Figures 3 and 4. From Figures 1-4 and a comparison of the absolute error values from the analyses, our results indicate that the addition of coke and sinter as inputs did not provide any clear advantage in predicting silicon content. One interesting point to notice is that, unlike in the HMT case, there is no deterioration in the quality of the silicon predictions as the prediction horizon increases. In addition, the predictions do not seem to "lag" the actual values, a problem that we had with HMT estimation. This means that the networks are not focusing on just the previous silicon value when predicting into the future.

Figure 3: Silicon Content 2 Hour Network with coke and sinter as inputs. Solid - Predicted, Dashed - Actual

Figure 4: Silicon Content 4 Hour Network with coke and sinter as inputs. Solid - Predicted, Dashed - Actual

The research group is currently looking at other artificial intelligence techniques, such as genetic algorithms and pattern matching, to control the conditions in the blast furnace. A prediction can indicate the future condition of the blast furnace based on the current conditions. This is extremely useful when the blast furnace operator can alter the current conditions in order to keep the future conditions within a desirable range. To perform this task, the relationships between the variables being controlled and the variables affecting them must be known. But some characteristics of the problem make this difficult: 1) each variable being controlled is affected by a large number of variables, 2) the relationships between the variables being controlled and the variables affecting them are non-linear, and 3) these non-linear relationships change over time. The first step towards controlling the conditions in a blast furnace involves finding out which input variables are most influential in

producing an output. Our analysis uses HMT as the output variable.

Since we have been able to train networks that predict HMT with a relatively high degree of accuracy, we now wish to determine what importance the neural network itself assigns to each input variable when predicting HMT. The neural network will provide us with some insight into which variables we should pay special attention to when trying to predict and control HMT. We calculate the derivative of the output (HMT) with respect to each input variable using the formulas demonstrated by Takenaga et al [9] for the case when one hidden layer is present in the neural network. We have extended their work to the case of neural networks with two hidden layers. Since the weights of the network are fixed, each derivative will be a function of the weights and the input values at the given time. The derivative of the output with respect to a given input variable depends on the value of that input variable as well as the values of all the other input variables at that time. This means that the derivative of HMT with respect to an input variable will vary over time. This approach is expected to give us insights into which variables may be more influential than others in predicting HMT.

3. CONCLUSIONS
With the advent of electronic commerce, the rapid growth of business databases has overwhelmed the traditional, interactive approaches to data analysis and created a need for a new generation of tools for intelligent and automated discovery in data. The paper presented preliminary data from research efforts currently underway at the MIT Sloan School of Management in strategic data mining towards that end. A prototype based on these tools was successful in reducing the total level of inventory by 50% at Medicorp, while maintaining the same level of probability that a particular customer's demand will be satisfied.

The paper also highlighted many interesting challenges within the context of providing neural network-based data mining tools for inventory control. In neural network-based data mining, the most difficult problems were encountered at the data preparation stage. The problem of too few and irregularly timed data points was addressed in multiple ways. Linearization was used to overcome the erratic frequency with which the input variables were measured. The concept of "moving

4. CHECKLIST FOR POTENTIAL BENEFICIARIES
Based on our experience with many data mining projects, we have the following suggestions to offer for using data mining techniques for electronic commerce related applications:

Have a clearly articulated business problem and then determine whether data mining is the proper solution technology. It might be tempting to use data mining to solve every business problem related to databases - but some problems are unsuited to data mining. A question such as "What were my sales from Web customers in Massachusetts last month?" can be best answered with a database query or an on-line analytical processing tool. Data mining is about answering questions such as: "What are the characteristics of my most profitable Web customers from Massachusetts?" "How do I optimize my inventory for the next month?" In the electronic commerce space, data mining can be used effectively to increase market share by knowing who your most valuable customers are, defining their features and then using that profile to either retain them or target new, similar customers.

Have business division(s) be intimately involved in the endeavor from the beginning. Data mining is gradually evolving from a technology-driven concept to a business-solution-driven concept. Earlier, information technology consumers were eager to employ data mining technologies without much regard to incumbent business processes and organizational disciplines. Now business divisions, rather than technology divisions, are spearheading data mining efforts in major corporations.

Understand and deliver the fundamentals. At the heart of any data mining effort, there must be a business process. No amount of technology firepower can take its place. The fundamentals of the business must be incorporated seamlessly in the data mining effort. For example, it may be important to keep in mind that Web customers are different from non-Web customers; therefore, any data mining results derived from analyzing an entire customer base may not be applicable to a web-customer base. In fact, data mining tools can be used to model the differences between the two types of customer bases, thereby creating a more effective experience for the customer.

Have your technology folks be involved too. Software vendors are responding to the technology-to-business migration
windows” was used to boost the number of data points reduced by by growing emphasis on one-button data mining products.
the use of linearization. Both methods are good at resolving the Vendors can repackage data mining tools, enhancing their
problems they are intended for, but distort the way the input graphical user interface and automating some of their more
parameters are represented in the modeling stage. This is a side esoteric aspects. However, it still falls on the analyst to acquire,
effect which one has to cope with, in situations involving missing clean, and feed data to the software; make dynamic selection of
and/or infrequent data. appropriate algorithms; validate and assimilate the results of the
In the modeling stage, different neural network algorithms data mining runs; and generate business rules from the patterns in
were experimented with, along with different input-hidden-output the data. Most of the operational complexity, time consumption
node configurations, different randomizing algorithms and and potential benefits of data mining lie in performing these steps
different learning rates. The variations on the number of nodes and performing them well.
and the different algorithms did not produce very different results,
indicating that these factors are not important. Even though in
general, Time-Delay family of neural networks are more powerful
networks in time series analysis because of their ability to capture
time-dependencies in the data set, in this case, they did not
outperform simple Feed-Forward neural networks. For both the
case studies, time-dependencies of the data set were explicitly
defined and compensated in the modeling stage.
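The two-hidden-layer sensitivity computation described above can be sketched in code. The paper does not publish its implementation, so the following is only a minimal illustration of the same chain-rule idea, assuming tanh hidden units and a linear output node; the function name and layer shapes are hypothetical:

```python
import numpy as np

def hmt_input_sensitivity(x, W1, b1, W2, b2, W3, b3):
    """Derivative of a two-hidden-layer network's scalar output with
    respect to each input, with all weights held fixed.
    Assumes tanh hidden units and a linear output node (an assumption;
    the paper does not state its activation functions)."""
    z1 = W1 @ x + b1                     # first hidden layer pre-activations
    z2 = W2 @ np.tanh(z1) + b2           # second hidden layer pre-activations
    y = (W3 @ np.tanh(z2) + b3).item()   # scalar output, e.g. predicted HMT

    # Chain rule: dy/dx = W3 . diag(tanh'(z2)) . W2 . diag(tanh'(z1)) . W1
    d2 = 1.0 - np.tanh(z2) ** 2          # tanh'(z2)
    d1 = 1.0 - np.tanh(z1) ** 2          # tanh'(z1)
    dy_dx = ((W3 * d2) @ W2 * d1) @ W1   # shape (1, n_inputs)
    return y, dy_dx.ravel()
```

Because `dy_dx` depends on the current input vector `x`, the sensitivity of the output to each input varies over time, exactly as noted in the text.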

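The two data-preparation remedies discussed in the conclusions, linearization of irregularly timed measurements followed by "moving windows" over the resampled series, can be sketched together. This is a hedged illustration rather than the paper's code; the helper name, the equally spaced target grid, and the window sizes are assumptions:

```python
import numpy as np

def make_training_windows(times, values, step, window, horizon=1):
    """Resample an irregularly timed series onto a regular grid by
    linear interpolation ("linearization"), then slide a window over
    the grid so each training example pairs `window` consecutive
    values with the value `horizon` steps ahead."""
    grid = np.arange(times[0], times[-1] + 1e-9, step)
    regular = np.interp(grid, times, values)     # linearization step
    X, y = [], []
    for start in range(len(regular) - window - horizon + 1):
        X.append(regular[start:start + window])  # one moving window
        y.append(regular[start + window + horizon - 1])
    return np.array(X), np.array(y)
```

The overlap between successive windows is what boosts the number of training examples, at the cost of the distortion noted above: interpolated points are treated as if they had been measured.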
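The Medicorp result holds the service level fixed while cutting stock. The paper does not give its inventory formula, so as a stand-in, the textbook base-stock calculation under normally distributed forecast demand shows how a target probability of satisfying demand maps to a stock level; the function name and parameters are illustrative:

```python
from statistics import NormalDist

def base_stock_level(mean_demand, std_demand, service_level):
    """Order-up-to level that covers forecast-period demand with
    probability `service_level`, assuming normally distributed demand
    (a standard textbook model, not the paper's method)."""
    z = NormalDist().inv_cdf(service_level)  # safety factor for the target
    return mean_demand + z * std_demand      # cycle stock + safety stock
```

Under this model, a better forecast (smaller `std_demand`) lowers the stock level without touching `service_level`, which is the tradeoff the 50% reduction exploits.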
5. ACKNOWLEDGMENTS
The authors would like to thank various members of the Data Mining research group at the Sloan School of Management for their help in building training data for the ANNs and in testing various ANN models. Proactive support from the top management of Medicorp and Steelcorp throughout the research is greatly appreciated.

6. REFERENCES
[1] Knoblock, C., ed. "Neural networks in real-world applications," in IEEE Expert, August 1996, pp. 4-10.
[2] Bhat, N. and McAvoy, T.J. "Use of Neural Nets for Dynamic Modeling and Control of Chemical Process Systems," in Computers in Chemical Engineering, Vol. 14, No. 4/5, pp. 573-583.
[3] Rumelhart, D. and McClelland, J. Parallel Distributed Processing: Exploration in the Microstructure of Cognition, Vol. 1. MIT Press, 1986.
[4] Pal, S.K. and Mitra, S. "Multilayer Perceptron, Fuzzy Sets and Classification," in IEEE Transactions on Neural Networks, Vol. 3, No. 5, September 1992.
[5] Bulsari, A. and Saxen, H. "Classification of blast furnace probe temperatures using neural networks," in Steel Research, Vol. 66, 1995.
[6] Biswas, A.K. Principles of Blast Furnace Ironmaking. SBA Publications, 1984.
[7] Singh, H., Sridhar, N. and Deo, B. "Artificial neural nets for prediction of silicon content of blast furnace hot metal," in Steel Research, Vol. 67, No. 12, 1996.
[8] Osamu, L., Ushijima, Y. and Sawada, T. "Application of AI techniques to blast furnace operations," in Iron and Steel Engineer, October 1992.
[9] Bulsari, A., Saxen, H. and Saxen, B. "Time-series prediction of silicon in pig iron using neural networks," in International Conference on Engineering Applications of Neural Networks (EANN '92).
[10] Takenaga, H. et al. "Input Layer Optimization of Neural Networks by Sensitivity Analysis and Its Application to Recognition of Names," in Electrical Engineering in Japan, Vol. 111, No. 4, 1991.
[11] Elvers, B., ed. Ullman's Encyclopedia of Industrial Chemistry. John Wiley and Sons, New York, 1996.
[12] Smith, M. Neural Networks for Statistical Modeling. Van Nostrand Reinhold, New York, 1993.
[13] Chauvin, Y. "Generalization Performance of Overtrained Back-Propagation Networks," in Neural Networks, Lecture Notes in Computer Science, pp. 46-55. Springer-Verlag, New York, 1990.
[14] Bhattacharjee, D., Dash, S.K. and Das, A.K. "Application of Artificial Intelligence in Tata Steel," in Tata Search, 1999.
[15] Weigend, A.S. and Gershenfeld, N.A. "Results of the time series prediction competition at the Santa Fe Institute," in IEEE International Conference on Neural Networks, pp. 1786-1793. IEEE Press, Piscataway, NJ, 1993.
[16] Reyes, C., Ganguly, A., Lemus, G. and Gupta, A. "A hybrid model based on dynamic programming, neural networks, and surrogate value for inventory optimization applications," in Journal of the Operational Research Society, Vol. 49, 1998, pp. 1-10.
[17] Bansal, K., Gupta, A. and Vadhavkar, S. "Neural Networks Based Forecasting Techniques for Inventory Control Applications," in Data Mining and Knowledge Discovery, Vol. 2, 1998.
[18] Gupta, A., Vadhavkar, S. and Au, S. "Data Mining for Electronic Commerce," in Electronic Commerce Advisor, Vol. 4, No. 2, September/October 1999, pp. 24-30.
[19] Bansal, K., Vadhavkar, S. and Gupta, A. "Neural Networks Based Data Mining Applications for Medical Inventory Problems," in International Journal of Agile Manufacturing, Vol. 1, No. 2, 1998, pp. 187-200. Urvashi Press, India.

