ScienceDirect
Available online at www.sciencedirect.com
Procedia Computer Science 121 (2017) 282–290
www.elsevier.com/locate/procedia
Abstract
Stock unavailability in the supply of medicines to pharmacies can be caused by several factors including manufacturing
problems, lack of raw materials, end of product selling, disease and epidemics outbreaks. Furthermore, the sale of medicines by
some pharmacies to foreign markets has increased in recent years, and is considered one of the main causes of medicine supply
failures in Portugal. This paper depicts the case study of a pharmaceutical distribution company in Portugal and aims to address
two main research issues. The first one consisted in detecting customers (pharmacies) and products (medicines) which may be
considered outliers and perform stock proration when these outliers are detected, in order to avoid abnormal sales and out-of-
stocks in pharmacies. The second one targeted the sales prediction for the pharmaceutical distribution company, in order to better
control and manage the levels of stock of medicines, so as to avoid excessive inventory costs while guaranteeing customer
demand satisfaction, and thus decreasing the possibility of loss of customers due to stock outages. In outliers detection
(customers and products) we used the Box-plot method as well as the SPSS statistical software. For sales prediction, the time
series data mining method smoothed Pegels was used, while the implementation was done in SQL and the analyzed data was
stored in an Oracle database.
© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the CENTERIS - International Conference on ENTERprise
Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on
Health and Social Care Information Systems and Technologies.
Keywords: organizational decision support, outliers, data mining, time series, sales prediction
* Corresponding author. E-mail address: acarlosrib@gmail.com
Augusto Ribeiro et al. / Procedia Computer Science 121 (2017) 282–290
1. Introduction
One of the responsibilities of the wholesale distributors of medicines in Portugal is to comply with the legal requirement
to hold a minimum stock of medicines in order to guarantee supplies in the national market and, thus, avoid possible
breakdown situations in pharmacies. According to Infarmed - National Authority for Medicines and Health Products
I.P. [1], medicine stock breakdowns, which occur when there is no available quantity of certain medicines to satisfy the
requests of customers (pharmacies), can originate from several factors, such as manufacturing problems, lack of raw
material, end of product commercialization, disease outbreaks and epidemics.
In addition to these factors, the sale of medicines by some pharmacies to foreign markets has increased in recent
years and is considered to be one of the main causes of medicine supply failures in Portugal, according to news
reports in several media outlets [2]. The reasons pointed out for this practice of medicine exports are the decrease in
medicine prices in Portugal and the difficult financial situation of pharmacies, which have made the sale of
medicines to foreign markets more and more attractive and profitable.
For the pharmaceutical distribution companies, it is therefore of prime importance to detect customer
(pharmacy) and product (medicine) outliers (values that distinctly stand out from, or are inconsistent with, the others) and
prorate (divide proportionally) the stock when those outliers are detected, in order to prevent the abnormal sale and
to avoid stock breakdowns in pharmacies. In addition to this need, there is a gap between the periodicity of deliveries
of medicines to pharmacies, which may have several deliveries per day, and the procurement of medicine stock by
distributors, which can take about two days.
On the other hand, it is essential for pharmaceutical distributors to have a good forecast of the needs of medicines,
due to the short-term shelf-life of many medicines and the need to control stock levels, in order to avoid excessive
inventory costs as well as the loss of customers due to stock outages.
An adequate sales prediction is generally associated with achieving a good balance between inventory costs and
proper satisfaction of customer demand [3]. In the specific case of the pharmaceutical distribution industry, the problem is
of even greater importance due to the short life cycle of most products and the product quality requirement, which in
turn is strongly linked to public health issues [4].
This paper extends the work described in [4,5] by addressing in combination two main research issues in order to
improve organizational decision making within the pharmaceutical distribution business, using the case of a
pharmaceutical distribution company in Portugal. The first one consisted in detecting customers (pharmacies) and
products (medicines) which may be considered outliers and performing stock proration when these outliers are detected,
in order to avoid abnormal sales and out-of-stocks in pharmacies. The second one targeted the sales prediction for
the pharmaceutical distribution company, in order to better control and manage the stock levels of medicines, so as
to avoid excessive inventory costs while guaranteeing customer demand satisfaction, and thus decreasing the
possibility of loss of customers due to stock outages.
The rest of the paper is structured as follows. In Sections 2 and 3 we provide the background on outliers' detection
and data mining, so as to set the theoretical underpinning of the approach described. In Section 4 we describe the
application of the approach for the case of the pharmaceutical distribution company in Portugal. Section 5 concludes
with considerations on the achievements produced so far and directions for future work.
occurrence of an error in a sales record that must be fixed when detected. In the case of “real” outliers, after
investigating their origin, one of the following three action options must be performed: replacement of the outlier
value by the forecast; replacement of the outlier value by the mean value of immediately adjacent observations;
outlier marked for the future (in the case of a promotional campaign).
If forecasts are calculated on the basis of data series that include outliers, they may be compromised by the
impact of these values, and correcting these values will, in general, improve the results obtained in the forecast
calculations [8]. To avoid this situation, the data must be analyzed and, if the presence of an outlier is detected,
it should be replaced by a more appropriate and typical value.
The analysis of outlying observations is a long-standing procedure that dates from the first attempts to analyze a set
of data [9]. Among the several outlier definitions provided in the literature, an early definition of the concept is: “An
outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which
it occurs” [10]. Barnett and Lewis [11] later modified this definition to “An observation (or subset
of observations) which appears to be inconsistent with the remainder of that set of data”. A recent definition of
outlier was put forward in [12]: “An outlier is an observation that deviates so much from the other observations to
arouse suspicions that it was generated by a different mechanism”. Finally, there is still the possibility of an outlier
being a normal observation, “surprising veridical data” [13], so, before deciding what should be done with outlying
observations, the causes that led to their appearance should be identified.
In most cases, the reasons for the existence of outliers determine how these observations should be handled. According
to Kriegel, Kröger, and Zimek [14], the detection of outliers is used within the following contexts: to verify
measurement errors/performance/input values, to evaluate the inherent variability of population elements, in fraud
detection, to understand consumer spending behavior, in medical studies, and in pharmaceutical research and marketing.
2.2 Outliers identification methods
According to the existing literature [15], a large number of outlier detection tests have been proposed. Examples
include tests based on the “distance from the mean” criterion, and the Dixon test, which is based on a value that is too large
(or small) compared to its nearest neighbor. It is also important to point out that, for the detection of
outliers, the median (med) of the values should be used rather than the mean ($\bar{x}$) [16].
2.2.1 The Box-plot Method
The method for outlier detection based on the box-plot rule was introduced by Tukey [17]. Subsequently, this rule
was studied by Hoaglin, Iglewicz, and Tukey [18], and was converted into an adequate rule for identifying an outlier by
Hoaglin and Iglewicz [19]. The box-plot graph has since become one of the most popular graphical statistical
procedures. Tukey also included a simple rule for identifying observations as atypical values. This rule identifies
as outliers the observations that fall outside the range:
$$\left[\, Q_1 - g\,(Q_3 - Q_1),\; Q_3 + g\,(Q_3 - Q_1) \,\right] \qquad (1)$$
where: $Q_1$ – 1st Quartile; $Q_3$ – 3rd Quartile; $g$ – value used to differentiate between “moderate” and “severe”
outliers. The most common choices for $g$ are 1.5 to signal “moderate” values and 3.0 to signal “severe” values.
Values outside the range from $Q_1 - 1.5 \times (Q_3 - Q_1)$ to $Q_3 + 1.5 \times (Q_3 - Q_1)$ are considered moderate outliers (O),
whereas values outside the range from $Q_1 - 3 \times (Q_3 - Q_1)$ to $Q_3 + 3 \times (Q_3 - Q_1)$ are considered severe outliers (*) (Fig. 1).
It should also be noted that for both moderate and severe outliers it is necessary to investigate their origins (i.e., why
they exist).
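As an illustration of the box-plot rule, the following minimal Python sketch computes the fences for g = 1.5 and g = 3.0 and labels each value. The function names and the sample quantities are hypothetical, and the quartile convention (the "inclusive" method of the standard library) is an assumption: statistical packages such as SPSS may compute quartiles slightly differently, which can shift the fences.

```python
from statistics import quantiles


def tukey_fences(values, g):
    """Return (lower, upper) fences Q1 - g*IQR and Q3 + g*IQR."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - g * iqr, q3 + g * iqr


def classify_outliers(values):
    """Label each value as 'regular', 'moderate' or 'severe'."""
    mod_low, mod_high = tukey_fences(values, 1.5)
    sev_low, sev_high = tukey_fences(values, 3.0)
    labels = []
    for v in values:
        if v < sev_low or v > sev_high:
            labels.append((v, "severe"))
        elif v < mod_low or v > mod_high:
            labels.append((v, "moderate"))
        else:
            labels.append((v, "regular"))
    return labels


# Hypothetical ordered quantities, for illustration only.
quantities = [10, 12, 11, 13, 12, 14, 11, 13, 12, 18, 60]
flagged = [(v, label) for v, label in classify_outliers(quantities) if label != "regular"]
```

Both kinds of flagged values still need to be investigated before any correction is applied, as noted above.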
3. Data Mining
Data mining can be viewed as the process of exploring large volumes of data to identify consistent patterns, such
as association rules or time sequences, so as to detect systematic relationships between variables [20]. It uses algorithms
to discover rules, identify key factors and trends, discover hidden patterns and relationships in large databases; this
information, once interpreted, is used to support organizational decision making.
The information resulting from the data mining process can, thus, be used to improve procedures, making the
organization proactive and, therefore, more competitive. This improvement results from the identification of patterns
and behaviors, allowing the organization to take corrective initiatives in relation to current actions and foresee future
more competitive settings within the market.
The activities associated with data mining can be divided into two main groups: description and prediction.
Descriptive data mining identifies rules that characterize the analyzed data. On the other hand, in predictive data
mining, certain attributes of the database or dataset are used to predict the unknown or future value of a target
variable of interest. This distinction is associated with the objective of the data mining activity, which may be to
increase knowledge about the data (description) or to support the decision-making process (prediction) through models
capable of predicting the value of a variable [21]. An example of data mining in the descriptive category is the
determination of the purchasing profiles of an organization's customers, based on the analysis of the customer
transaction database, so as to create targeted marketing campaigns. An example of data mining in the prediction
category is sales prediction for a product, based on the analysis of the sales history of that product, in order to ensure
a better stock management.
Regarding prediction, the best model is the one that presents the highest accuracy, that is, a hit rate
higher than the hit rate achieved by other models, even though those might be easier to obtain and to
understand. On the other hand, the best model in description may not be the one that provides more accurate results in
terms of model confidence, but rather the one that allows a broader knowledge of the data analyzed. Each of the
identified data mining categories (description and prediction) includes a set of methods that should be used taking
into account the nature of the problem to be solved.
Time series may be seen as a particular case of time dependent data. A time series $X(t)$ can be described as a
sequence of values produced by a system and obtained at regular time intervals, being represented by the expression:
$X(t) = \ldots, x_{t-2}, x_{t-1}, x_t, x_{t+1}, x_{t+2}, \ldots$, in which the succession of values $x_{t+i}$ corresponds to a set of sampling
values of a specific variable, always measured in the same conditions but at different time points, and in which the
time instants that define each sampling point are sorted in ascending order.
There are data sets in which the target attribute (variable whose value is intended to be predicted) is time
dependent, i.e., it is associated with a consecutive sequence of periods, and the interest is to know that dependence.
Time series models aim to identify regular patterns in historical observations in order to make predictions for the
future. These models are used in several areas, including business management (forecasting of the demand for
products, electricity consumption), finance (predicting changes in financial markets), macroeconomics (predicting
economic growth, inflation rates) and public management (traffic forecasts on bridges or roads) [22].
(2)
where: $t$ = Current time period; $\alpha$ = Smoothing process constant ($0 < \alpha < 1$); $\beta$ = Smoothing trend constant
($0 < \beta < 1$); $C_t$ = Smoothing value in period $t$; $T_t$ = Trend value in period $t$; Ø = Damping constant (0 < Ø < 1);
$\hat{X}_t(m)$ = Value of the forecast for period $t+m$
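To make the role of the three constants concrete, the sketch below implements a damped-trend exponential smoothing forecast in Python. It is only an illustration under stated assumptions: it uses the commonly cited additive damped-trend recurrences, which may differ in detail from the variant applied in this study, and the initialization (first value as level, first difference as trend) and all names are hypothetical. In the code, `phi` stands for the damping constant Ø.

```python
def damped_trend_forecast(series, alpha, beta, phi, horizon):
    """Forecast `horizon` periods ahead with damped-trend smoothing.

    Assumed recurrences (additive damped trend):
        C_t = alpha * x_t + (1 - alpha) * (C_{t-1} + phi * T_{t-1})
        T_t = beta * (C_t - C_{t-1}) + (1 - beta) * phi * T_{t-1}
        forecast(m) = C_t + (phi + phi**2 + ... + phi**m) * T_t
    """
    if len(series) < 2:
        raise ValueError("at least two observations are needed")
    # Assumption: simple initialization from the first two observations.
    level = float(series[0])
    trend = float(series[1] - series[0])
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts = []
    damping_sum = 0.0
    for m in range(1, horizon + 1):
        damping_sum += phi ** m  # the trend contribution shrinks as m grows
        forecasts.append(level + damping_sum * trend)
    return forecasts


# Hypothetical usage: three periods ahead for a short increasing series.
example = damped_trend_forecast([1, 2, 3, 4, 5], alpha=0.5, beta=0.5, phi=0.9, horizon=3)
```

With a damping constant below 1, each additional period adds a smaller share of the trend, so long-horizon forecasts flatten out instead of growing linearly.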
The SMAPE (Symmetric Mean Absolute Percentage Error) metric can be used both to measure the accuracy of the
model and to determine the best parameter values (α, β, Ø) to be considered (Equation 3).
(3)
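For reference, a minimal Python sketch of SMAPE follows. Several variants of this metric exist; the one below divides each absolute error by the mean of the absolute actual and forecast values and reports a percentage, which is an assumption and may not match the exact form used in this study. Treating a pair of zeros as zero error is also an illustrative choice.

```python
def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent (0 to 200)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2
        # Assumption: when both values are zero the term contributes no error.
        if denominator != 0:
            total += abs(f - a) / denominator
    return 100.0 * total / len(actual)
```

A lower value indicates forecasts closer to the observed quantities, which is why the parameter combination with the lowest SMAPE is preferred.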
The pharmaceutical distribution company where this study was undertaken is part of a worldwide group,
is headquartered in Portugal and is one of the largest marketing and pharmaceutical distribution companies in the
country. Through its several warehouses, which support the coverage of the whole national
territory, the company supplies pharmacies daily with products ranging from medicines to other pharmaceutical
products.
The pharmaceutical distribution company deals with thousands of invoices per month (approximately 180,000
invoices), leading to the need to create mechanisms for a faster detection of outliers as well as for a more accurate
evaluation of severe outliers. Regarding severe outliers (as discussed in Section 2.2.1), a rule was created that allows
the distinction between “severe” and “very severe” outliers. This was needed to make the process of outlier
classification faster and more accurate, since a great number of severe outliers were detected both for customers and
for ordered quantities of products.
To that end, in the first place, severe outliers (customers and products) are identified through the Tukey rule
(Box-plot method) using the SPSS software. A full description of the outlier results can be found in [5]. Subsequently, for
each outlier, a value called outlier_value (starting from the initial value 0) is calculated in steps 1 to 4 as follows:
1. If the quantity ordered > quantities ordered by all customers of the same classification (3 times the IQR
(InterQuartile Range)); then outlier_value = outlier_value + 1 - (1 / number of customers with equal rating)
2. If the quantity ordered > quantities ordered by all customers from the same warehouse (3 times the IQR);
then outlier_value = outlier_value + 1 - (1 / number of customers of the same warehouse)
3. If the quantity ordered > monthly warehouse consumption; then outlier_value = outlier_value + 1 - (1 /
number of customers of the same warehouse)
4. If the quantity ordered > monthly company consumption (all the warehouses); then outlier_value =
outlier_value + 1
This rule, obtained by practical experimentation, assigns (numerical) weights to each of the severe outliers,
allowing them to be ranked in descending order; the rule was implemented as an algorithm in the company’s ERP
system, allowing the sending of an email alert to the company’s procurement department whenever outliers are
detected.
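The four steps above can be sketched in Python as follows. This is an interpretation, not the company's ERP implementation: the comparison "3 times the IQR" is read here as exceeding the severe upper fence Q3 + 3 × IQR of the reference group, the number of customers is taken as the number of quantities supplied for each group, and all function names, parameters and sample figures are hypothetical.

```python
from statistics import quantiles


def severe_upper_fence(quantities):
    """Upper fence Q3 + 3*IQR (assumed reading of '3 times the IQR')."""
    q1, _, q3 = quantiles(quantities, n=4, method="inclusive")
    return q3 + 3 * (q3 - q1)


def outlier_value(qty, same_class_qtys, same_warehouse_qtys,
                  warehouse_monthly, company_monthly):
    """Weight a severe outlier following steps 1 to 4; higher means more severe."""
    score = 0.0
    # Step 1: compare with customers of the same classification.
    if qty > severe_upper_fence(same_class_qtys):
        score += 1 - 1 / len(same_class_qtys)
    # Step 2: compare with customers served by the same warehouse.
    if qty > severe_upper_fence(same_warehouse_qtys):
        score += 1 - 1 / len(same_warehouse_qtys)
    # Step 3: compare with the monthly consumption of the warehouse.
    if qty > warehouse_monthly:
        score += 1 - 1 / len(same_warehouse_qtys)
    # Step 4: compare with the monthly consumption of the whole company.
    if qty > company_monthly:
        score += 1
    return score


# Hypothetical data: one quantity per customer in each reference group.
same_class = [10, 12, 11, 13, 12]
same_warehouse = [10, 12, 11, 13, 12, 20, 9, 14, 11, 12]
scores = {q: outlier_value(q, same_class, same_warehouse, 500, 2000)
          for q in (12, 100, 3000)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting by the resulting score gives the descending ranking described above, with the largest scores corresponding to the "very severe" cases.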
Orders of a product (medicine) to the suppliers of the pharmaceutical distribution company are set based on the
analysis of the history of the product quantities requested by the customers (pharmacies) of the company. On the
other hand, there are periods in the year in which the demand for certain medicines is higher (for example, anti-flu
and antipyretics in Winter).
The aim is, therefore, to identify regular patterns of historical observations in the quantities of a medicine ordered
by the pharmacies, to enable the pharmaceutical distribution company to make sales forecasts of these
medicines for a future period. The sales prediction for a product will indicate the quantity of that product to be
ordered by the pharmaceutical distribution company from its suppliers.
In the data set to be analyzed (quantities of a product requested by customers/pharmacies), the target attribute
(product quantity to be ordered per month) is time dependent, that is, it is associated with a consecutive sequence of
periods, and the interest here is to know this dependence.
Thus, the targeted problem is an application of the predictive data mining category (the aim is to
forecast the future value of an attribute of interest) and of the time-series data mining method described in Section 3.2:
the target attribute (quantity of a medicine to be ordered per month) is time-dependent, being associated with a
consecutive sequence of periods, and its value is to be predicted for a given future period (in this case, the
current month and the two subsequent months).
4.2.2. Calculation of the associated error and analysis and interpretation of the results obtained
As shown in Figure 2, the generated values of α, β and Ø for the damped Pegels method and the associated SMAPE
error are stored in the table; subsequently, the combination of constants with the lowest associated error for each
product/warehouse is selected (α=0.1, β=0.1, Ø=0.9).
Figure 2. SMAPE error sorted in ascending order and obtained by combining the 3 constants of the method
After the combination with the lowest error is selected, it is applied to the calculation of the sales forecast,
with the result being saved in the field "Pegels Predicted Quantity" of the table (Figure 3).
Figure 3. Results of the application of the sales prediction method to Product ID 427795
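The selection procedure described above (generate combinations of the three constants, compute the SMAPE of each, keep the lowest) can be sketched in Python as below. The study implemented this step in SQL over an Oracle database; the code here is only an illustration under assumptions: a grid of 0.1 to 0.9 in steps of 0.1, evaluation on the last three months held out from the series, the additive damped-trend recurrences, and a made-up monthly series. All names are hypothetical, and `phi` stands for Ø.

```python
from itertools import product


def damped_forecast(series, alpha, beta, phi, horizon):
    """Damped-trend forecast (assumed additive form, simple initialization)."""
    level, trend = float(series[0]), float(series[1] - series[0])
    for x in series[1:]:
        prev = level
        level = alpha * x + (1 - alpha) * (prev + phi * trend)
        trend = beta * (level - prev) + (1 - beta) * phi * trend
    out, damping_sum = [], 0.0
    for m in range(1, horizon + 1):
        damping_sum += phi ** m
        out.append(level + damping_sum * trend)
    return out


def smape(actual, forecast):
    """SMAPE in percent (assumed variant with a mean-of-absolutes denominator)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2
        if denominator != 0:
            total += abs(f - a) / denominator
    return 100.0 * total / len(actual)


def rank_parameters(series, horizon=3):
    """Return (smape, alpha, beta, phi) tuples sorted by ascending error."""
    train, holdout = series[:-horizon], series[-horizon:]
    grid = [round(0.1 * k, 1) for k in range(1, 10)]  # assumed grid 0.1 .. 0.9
    results = []
    for alpha, beta, phi in product(grid, repeat=3):
        forecast = damped_forecast(train, alpha, beta, phi, horizon)
        results.append((smape(holdout, forecast), alpha, beta, phi))
    results.sort()
    return results


# Hypothetical monthly quantities for one product/warehouse pair.
monthly = [120, 135, 128, 140, 150, 145, 160, 158, 170, 165, 172, 180, 178, 185, 190]
ranked = rank_parameters(monthly)
best_error, best_alpha, best_beta, best_phi = ranked[0]
```

The first entry of the sorted list plays the role of the selected combination; the sorted list itself corresponds to the kind of table shown in Figure 2.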
When this work was undertaken, the procedure used by the pharmaceutical distribution company to determine the
quantity to be ordered of a given product for the following month was based on the calculation of the arithmetic
average of the quantities sold of that product in the last 12 months (Figure 4).
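For comparison purposes, the company's procedure amounts to a 12-month arithmetic mean. A minimal Python sketch follows; the function name and the sample data are hypothetical.

```python
from statistics import mean


def average_baseline(monthly_sales, window=12):
    """Quantity to order next month: mean of the last `window` monthly sales."""
    if len(monthly_sales) < window:
        raise ValueError("not enough history for the requested window")
    return mean(monthly_sales[-window:])


# Hypothetical sales history; only the last 12 values are used.
history = [100] * 6 + list(range(1, 13))
baseline_quantity = average_baseline(history)
```

Because every month in the window has the same weight, this baseline reacts slowly to recent changes in demand, which is the behavior the smoothing-based forecast is meant to improve on.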
For the case of product code 427795 and for the same sales warehouse, the table in Figure 4 shows the values
obtained for the predicted sales quantities determined by both the damped Pegels method and the
average-sales calculation used by the company.
Figure 4. Comparing results of predicted sales using the Pegels method and the ad hoc method used by the company
The analysis of the table shows that the predicted sales values obtained with the Pegels method are in general
closer to the real values than those obtained with the ad hoc process in use by the company. In addition, it is
important to note that the values obtained by the Pegels method satisfy the demand in most cases, which does not
happen with the calculation made by the other process.
From the study developed and its application to the pharmaceutical distribution company depicted
in this work, it is concluded that by using the Box-plot method and the rule created (in Section 4.1) for the detection
of outliers (customers and products), outliers may be identified in a fast and precise way, so that preventive
initiatives can be taken. That is, the application of the described process to the company allows, whenever outliers are detected,
the sending of an email alert to the company’s procurement department advising that the stock of that product
should be prorated in order to prevent a stock-out. Thus, a simple, fast and economical process of outlier
detection to manage the stock of medicines in the company was obtained. Nevertheless, the work undertaken
regarding the detection of outliers (customers and products) could be extended to cover all the products marketed by
the company. Another issue to address would be the detection of outliers on a daily basis, in order to detect them
with adequate lead time, before the stock runs out.
Regarding the sales prediction issue, the performance of the time-series damped Pegels method was found to be
satisfactory for sales forecasts of products at the individual level, yielding results that are closer to the real ones and more
reliable than those obtained with the process previously used by the company. In fact, by using the
values obtained with the Pegels method for sales forecasting, the company could better determine the quantities to be
ordered of the targeted products and, in most cases, guarantee stock levels that effectively satisfy the real
demand, without holding excessive stock levels that would have a strong impact on costs.
Nevertheless, it is considered that there is still room for improvement in the forecasting technique. On the one
hand, it would be interesting to verify whether, using a broader forecast horizon than the one used in this work, the results
obtained would still have an acceptable level of accuracy. The inclusion of a longer forecast horizon (e.g. 12 months) in
sales forecasting may be of interest for the pharmaceutical distribution company, giving it more room for
maneuver in negotiating prices with suppliers.
On the other hand, additional data such as advertising actions undertaken by suppliers and promotions made by
competitors (other pharmaceutical distribution companies) may have an impact on the actual sales of the products.
Thus, it could be worth considering these aspects as additional variables in determining the sales forecast.
References