Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 121 (2017) 282–290
www.elsevier.com/locate/procedia

CENTERIS - International Conference on ENTERprise Information Systems / ProjMAN - International Conference on
Project MANagement / HCist - International Conference on Health and Social Care Information Systems and
Technologies, CENTERIS / ProjMAN / HCist 2017, 8-10 November 2017, Barcelona, Spain

Improving organizational decision support: Detection of outliers and
sales prediction for a pharmaceutical distribution company

Augusto Ribeiro a,*, Isabel Seruca b,c, Natércia Durão d

a OCP Portugal, Trav. Nº Sra. Caridade 28, 4470-256 Maia, Portugal
b Univ Portucalense, Research on Economics, Management and Information Technologies - REMIT, Rua Dr. António
Bernardino Almeida, 541-619, P 4200-072, Oporto, Portugal
c ISTTOS, Centro Algoritmi, University of Minho, Portugal
d Univ Portucalense, Portucalense Institute for Legal Research – IJP, Research on Economics, Management and
Information Technologies – REMIT, Rua Dr. António Bernardino Almeida, 541-619, P 4200-072, Oporto, Portugal

Abstract

Stock unavailability in the supply of medicines to pharmacies can be caused by several factors including manufacturing
problems, lack of raw materials, end of product selling, disease and epidemics outbreaks. Furthermore, the sale of
medicines by some pharmacies to foreign markets has increased in recent years, and is considered one of the main
causes of medicine supply failures in Portugal. This paper depicts the case study of a pharmaceutical distribution
company in Portugal and aims to address two main research issues. The first one consisted in detecting customers
(pharmacies) and products (medicines) which may be considered outliers and perform stock proration when these
outliers are detected, in order to avoid abnormal sales and out-of-stocks in pharmacies. The second one targeted the
sales prediction for the pharmaceutical distribution company, in order to better control and manage the levels of
stock of medicines, so as to avoid excessive inventory costs while guaranteeing customer demand satisfaction, and thus
decreasing the possibility of loss of customers due to stock outages. In outliers detection (customers and products)
we used the Box-plot method as well as the SPSS statistical software. For sales prediction, the time series data
mining method smoothed Pegels was used, while the implementation was done in SQL and the analyzed data was stored in
an Oracle database.
© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the CENTERIS - International Conference on ENTERprise
Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on
Health and Social Care Information Systems and Technologies.

* Corresponding author.
E-mail address: acarlosrib@gmail.com

1877-0509 © 2017 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2017.11.039

Keywords: organizational decision support, outliers, data mining, time series, sales prediction

1. Introduction

One of the responsibilities of the wholesale distributors of medicines in Portugal is to comply with the legal
requirement of holding a minimum stock of medicines, in order to guarantee supplies in the national market and thus
avoid possible breakdown situations in pharmacies. According to Infarmed - National Authority for Medicines and
Health Products I.P. [1], medicine stock breakdowns, which occur when there is no available quantity of certain
medicines to satisfy the requests of customers (pharmacies), can originate from several factors, such as
manufacturing problems, lack of raw material, end of product commercialization, disease outbreaks and epidemics.
In addition to these factors, the sale of medicines by some pharmacies to foreign markets has increased in recent
years and is considered to be one of the main causes of medicine supply failures in Portugal, according to news
reported in several media [2]. The reasons pointed out for this practice of medicine exports are the decrease in
medicine prices in Portugal and the difficult financial situation of pharmacies, which have made the sale of
medicines to foreign markets increasingly attractive and profitable.
For pharmaceutical distribution companies, it is therefore of primary importance to detect customer (pharmacy) and
product (medicine) outliers (values that distinctly stand out from, or are inconsistent with, the others) and to
prorate (divide proportionally) the stock when those outliers are detected, in order to prevent abnormal sales and
to avoid stock breakdowns in pharmacies. In addition, there is a gap between the periodicity of deliveries of
medicines to pharmacies, which may receive several deliveries per day, and the procurement of medicine stock by
distributors, which can take about two days.
On the other hand, it is essential for pharmaceutical distributors to have a good forecast of the needs of medicines,
due to the short-term shelf-life of many medicines and the need to control stock levels, in order to avoid excessive
inventory costs as well as the loss of customers due to stock outages.
An adequate sales prediction is generally associated with achieving a good balance between inventory costs and
proper customer demand satisfaction [3]. In the specific case of the pharmaceutical distribution industry, the
problem is of even greater importance due to the short life cycle of most products and the product quality
requirement, which in turn is strongly linked to public health issues [4].
This paper extends the work described in [5,6] by addressing in combination two main research issues in order to
improve organizational decision making within the pharmaceutical distribution business, using the case of a
pharmaceutical distribution company in Portugal. The first one consisted in detecting customers (pharmacies) and
products (medicines) which may be considered outliers and performing stock proration when these outliers are
detected, in order to avoid abnormal sales and out-of-stocks in pharmacies. The second one targeted the sales
prediction for the pharmaceutical distribution company, in order to better control and manage the stock levels of
medicines, so as to avoid excessive inventory costs while guaranteeing customer demand satisfaction, thus decreasing
the possibility of loss of customers due to stock outages.
The rest of the paper is structured as follows. In Sections 2 and 3 we provide the background on outliers' detection
and data mining, so as to set the theoretical underpinning of the approach described. In Section 4 we describe the
application of the approach for the case of the pharmaceutical distribution company in Portugal. Section 5 concludes
with considerations on the achievements produced so far and directions for future work.

2. Data analysis and outliers detection

2.1. Outlier definition

Historical data sets may be influenced by unusual and non-repetitive events [7]: the outliers. Two types of outliers
can be identified: gross errors and “true” outliers. The former are associated with processing errors, e.g., an
error in a sales record, and must be fixed when detected. In the case of “true” outliers, after investigating their
origin, one of the following three actions must be taken: replacing the outlier value by the forecast; replacing
the outlier value by the mean of the immediately adjacent observations; or flagging the outlier for future reference
(as in the case of a promotional campaign).
If forecasts are calculated on the basis of data series that include outliers, they may be compromised by the
impact of these values; correcting them will, in general, improve the results obtained in the forecast
calculations [8]. To avoid this situation, the data must be analyzed and, if an outlier is detected, it should be
replaced by a more appropriate and typical value.
The analysis of outlying observations is an old procedure that dates from the first attempts to analyze a set of
data [9]. Among the several outlier definitions provided in the literature, an early definition of the concept is:
“An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in
which it occurs” [10]. This definition was later modified by Barnett and Lewis [11], who define an outlier as “An
observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. A
more recent definition was put forward in [12]: “An outlier is an observation that deviates so much from the other
observations to arouse suspicions that it was generated by a different mechanism”. Finally, there is still the
possibility of an outlier being a normal observation, “surprising veridical data” [13]; so, before deciding what
should be done with outlying observations, the causes that led to their appearance should be identified.
In most cases, the reasons for the existence of outliers determine how these observations should be handled.
According to Kriegel, Kröger, and Zimek [14], the detection of outliers is used in the following contexts: to verify
measurement errors, performance or input values; to evaluate the inherent variability of population elements; in
fraud detection; to understand consumer spending behavior; in medical studies; and in pharmaceutical research and
marketing.

2.2. Outliers identification methods

According to the existing literature [15], a large number of outlier detection tests have been proposed. Examples
include tests based on the “distance from the mean” criterion, and the Dixon test, which is based on a value being
too large (or too small) compared to its nearest neighbor. It is also important to point out that, for the detection
of outliers, the median ($\mathrm{med}$) of the values should be used rather than the mean ($\bar{x}$) [16].

2.2.1. The Box-plot Method

The method for outlier detection based on the box-plot rule was introduced by Tukey [17]. Subsequently, this rule
was studied by Hoaglin, Iglewicz, and Tukey [18], and was converted into an adequate rule for identifying an outlier
by Hoaglin and Iglewicz [19]. The box-plot graph has since become one of the most popular graphical statistical
procedures. Tukey also included a simple rule for identifying observations as atypical values. This rule identifies
as outliers the observations that fall outside the range:

$$\left[\, Q_1 - g\,(Q_3 - Q_1),\;\; Q_3 + g\,(Q_3 - Q_1) \,\right] \qquad (1)$$

where: $Q_1$ – 1st Quartile; $Q_3$ – 3rd Quartile; $g$ – value used to differentiate between “moderate” and “severe”
outliers. The most common choices for $g$ are 1.5 to signal “moderate” values and 3.0 to signal “severe” values.

Figure 1. Box-plot example with outliers identification


Values outside the range from $Q_1 - 1.5 \times (Q_3 - Q_1)$ to $Q_3 + 1.5 \times (Q_3 - Q_1)$ are considered
moderate outliers (O), whereas values outside the range from $Q_1 - 3 \times (Q_3 - Q_1)$ to
$Q_3 + 3 \times (Q_3 - Q_1)$ are considered severe outliers (*) (Fig. 1). It should also be noted that for both
moderate and severe outliers it is necessary to investigate their origins (i.e., why they exist).
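
As an illustration of the box-plot rule, a minimal Python sketch follows. It is not the SPSS procedure used in this work: the function names and the sample quantities are hypothetical, and the quartiles are computed with the "inclusive" method of the Python standard library, whereas statistical packages may follow slightly different quartile conventions.

```python
from statistics import quantiles


def tukey_fences(values, g=1.5):
    """Return the limits (Q1 - g*IQR, Q3 + g*IQR) of the box-plot rule."""
    # Quartile convention is an assumption; other packages may differ slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - g * iqr, q3 + g * iqr


def classify_outliers(values):
    """Label each value as 'regular', 'moderate' or 'severe'."""
    low_mod, high_mod = tukey_fences(values, g=1.5)
    low_sev, high_sev = tukey_fences(values, g=3.0)
    labels = []
    for v in values:
        if v < low_sev or v > high_sev:
            labels.append((v, "severe"))
        elif v < low_mod or v > high_mod:
            labels.append((v, "moderate"))
        else:
            labels.append((v, "regular"))
    return labels


# Hypothetical quantities ordered by nine pharmacies for one medicine.
ordered_quantities = [10, 11, 11, 12, 12, 13, 14, 20, 40]
for quantity, label in classify_outliers(ordered_quantities):
    if label != "regular":
        print(quantity, label)
```

With these hypothetical quantities, $Q_1 = 11$ and $Q_3 = 14$, so the moderate limits are 6.5 and 18.5 and the severe limits are 2 and 23; the quantity 20 is flagged as a moderate outlier and 40 as a severe one.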

3. Data Mining

Data mining can be viewed as the process of exploring large volumes of data to identify consistent patterns, such
as association rules or time sequences, so as to detect systematic relationships between variables [20]. It uses
algorithms to discover rules, identify key factors and trends, and uncover hidden patterns and relationships in
large databases; this information, once interpreted, is used to support organizational decision making.
The information resulting from the data mining process can thus be used to improve procedures, making the
organization proactive and, therefore, more competitive. This improvement results from the identification of
patterns and behaviors, which allows the organization to take corrective initiatives regarding current actions and
to anticipate more competitive future settings within the market.

3.1. Data Mining Categories

The activities associated with data mining can be divided into two main groups: description and prediction.
Descriptive data mining identifies rules that characterize the analyzed data. In predictive data mining, on the
other hand, certain attributes of the database or dataset are used to predict the unknown or future value of a
target variable of interest. This distinction reflects the objective of the data mining activity: either to increase
knowledge about the data (description) or to support the decision-making process (prediction) through models capable
of predicting the value of a variable [21]. An example of descriptive data mining is the determination of the
purchasing profiles of an organization's customers, based on the analysis of the customer transaction database, so
as to create targeted marketing campaigns. An example of predictive data mining is the sales prediction for a
product, based on the analysis of the sales history of that product, in order to ensure better stock management.
Regarding prediction, the best model is the one that presents the highest accuracy, i.e., a hit rate higher than
that achieved by other models, even though those might be easier to obtain and to understand. In description, on the
other hand, the best model may not be the one that provides the most accurate results in terms of model confidence,
but rather the one that allows a broader knowledge of the analyzed data. Each of the identified data mining
categories (description and prediction) includes a set of methods that should be chosen taking into account the
nature of the problem to be solved.

3.2. Predictive Data Mining: Time series models

Time series may be seen as a particular case of time-dependent data. A time series $X(t)$ can be described as a
sequence of values produced by a system and obtained at regular time intervals, being represented by the expression
$X(t) = \ldots, x_{t-2}, x_{t-1}, x_t, x_{t+1}, x_{t+2}, \ldots$, in which the succession of values $x_{t+i}$
corresponds to a set of sampling values of a specific variable, always measured in the same conditions but at
different time points, and in which the time instants that define each sampling point are sorted in ascending order.
There are data sets in which the target attribute (variable whose value is intended to be predicted) is time
dependent, i.e., it is associated with a consecutive sequence of periods, and the interest is to know that dependence.
Time series models aim to identify regular patterns in historical observations in order to make predictions for the
future. These models are used in several areas, including business management (forecasting of the demand for
products, electricity consumption), finance (predicting changes in financial markets), macroeconomics (predicting
economic growth, inflation rates) and public management (traffic forecasts on bridges or roads) [22].

3.2.1. Exponential Smoothing


Exponential smoothing is a method widely used in time series forecasting. The method applies weights that decrease
exponentially with the age of the observations; that is, recent observations have higher weights in the forecast
than older ones. In exponential smoothing, there are one or more smoothing parameters to be determined (or
estimated), and these choices determine the weights assigned to the observations.
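
To make the weighting explicit, consider simple exponential smoothing with a single parameter $\alpha$. This is an illustrative special case without trend, not the damped Pegels method used later, and $C_0$ denotes an assumed initial smoothing value. Unrolling the recursion gives:

```latex
C_t = \alpha\, x_t + (1-\alpha)\, C_{t-1}
    = \alpha \sum_{k=0}^{t-1} (1-\alpha)^{k}\, x_{t-k} + (1-\alpha)^{t}\, C_0
```

With $\alpha = 0.3$, for instance, the three most recent observations receive the weights 0.3, 0.21 and 0.147, so each observation counts 30% less than the one that follows it.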
The Holt-Winters (HW) label is often assigned to a set of procedures that form the core of the family of
exponential smoothing prediction methods. The basic structures were provided by C. C. Holt in 1957 and P. Winters in
1960 [23]. The robustness and accuracy of the predictions made by exponential smoothing have led to their wide use in
applications where a large number of series need an automated procedure, such as inventory control. Although Holt’s
method tends to be the most popular approach to modeling the trend of a series, empirical evidence has shown that its
linear forecasting function tends to overestimate, overshooting the actual data beyond the short term [24].
Consequently, Gardner and McKenzie [24] propose the use of a damping parameter Ø in Holt’s method to better control
the extrapolation of trends.
Pegels [25] further suggests that his multiplicative trend method may be more useful than Holt’s method, which
considers an additive trend, since a multiplicative trend tends to be more likely in real-life applications.
According to Taylor [26], there may be an advantage in including an extra parameter in the Pegels formulation to
dampen the extrapolated trend, in a similar way to the damping parameter used in Holt’s method. Thus, for the Pegels
method with multiplicative trend, Taylor [26] suggests the inclusion of a damping parameter (Equation 2).

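$$
\begin{aligned}
C_t &= \alpha\, x_t + (1 - \alpha)\, C_{t-1}\, T_{t-1}^{\phi} \\
T_t &= \beta\, \frac{C_t}{C_{t-1}} + (1 - \beta)\, T_{t-1}^{\phi} \\
\hat{x}_t(m) &= C_t\, T_t^{\sum_{i=1}^{m} \phi^{i}}
\end{aligned}
\qquad (2)
$$

where: $t$ = Current time period; $x_t$ = Observed value in period $t$; $\alpha$ = Smoothing process constant
($0 < \alpha < 1$); $\beta$ = Smoothing trend constant ($0 < \beta < 1$); $C_t$ = Smoothing value in period $t$;
$T_t$ = Trend value in period $t$; $\phi$ = Damping constant ($0 < \phi < 1$), written Ø elsewhere in the text;
$\hat{x}_t(m)$ = Value of the forecast for period $t+m$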
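
The damped multiplicative-trend recursions can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SQL implementation used in this work: it assumes the usual form of the damped Pegels method (level updated with the damped previous trend, trend updated from the ratio of consecutive levels, forecast obtained by raising the trend to the sum of the damping powers), the function name is hypothetical, and the initialisation (level set to the first observation, trend set to the ratio of the first two observations) is one simple choice among several. The series must be strictly positive.

```python
def damped_pegels_forecast(series, alpha, beta, phi, horizon):
    """Forecast `horizon` periods ahead with a damped multiplicative trend."""
    if len(series) < 2:
        raise ValueError("at least two observations are required")
    if any(x <= 0 for x in series):
        raise ValueError("the series must be strictly positive")

    # Initialisation (an assumption of this sketch).
    level = series[0]
    trend = series[1] / series[0]

    for x in series[1:]:
        prev_level = level
        # Level: blend of the new observation and the damped one-step projection.
        level = alpha * x + (1 - alpha) * prev_level * trend ** phi
        # Trend: blend of the observed growth ratio and the damped previous trend.
        trend = beta * (level / prev_level) + (1 - beta) * trend ** phi

    forecasts = []
    exponent = 0.0
    for m in range(1, horizon + 1):
        exponent += phi ** m  # phi + phi^2 + ... + phi^m
        forecasts.append(level * trend ** exponent)
    return forecasts


# Hypothetical monthly quantities of one product; three-month horizon.
history = [120, 126, 131, 140, 146, 155, 160, 171]
print(damped_pegels_forecast(history, alpha=0.1, beta=0.1, phi=0.9, horizon=3))
```

Because the exponent grows by ever smaller amounts when the damping constant is below 1, the growth applied to each further period shrinks, which is the intended effect of damping.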
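The SMAPE (Symmetric Mean Absolute Percentage Error) metric can be used both to measure the accuracy of the model
and to determine the best parameter values (α, β, Ø) to be considered (Equation 3).

$$\mathrm{SMAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{\left| x_t - \hat{x}_t \right|}{\left( \left| x_t \right| + \left| \hat{x}_t \right| \right) / 2} \qquad (3)$$

where: $n$ = Number of periods evaluated; $x_t$ = Observed value in period $t$; $\hat{x}_t$ = Forecast value for
period $t$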
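
A short Python sketch of the SMAPE calculation follows. It assumes the common definition in which the absolute error of each period is divided by the average of the absolute observed and forecast values and the result is expressed as a percentage; the function name and the sample figures are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent (0 to 200)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("both sequences must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2
        if denominator == 0:
            continue  # observed and forecast both zero: no error for this period
        total += abs(a - f) / denominator
    return 100.0 * total / len(actual)


# Hypothetical observed and predicted monthly quantities.
print(smape([150, 160, 170], [145, 165, 168]))
```

Under this definition the metric is bounded between 0% (perfect forecast) and 200% (for example, a forecast of zero for a non-zero observation).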

4. The case: A pharmaceutical distribution company

The pharmaceutical distribution company where this study was undertaken is part of a worldwide group, is
headquartered in Portugal and is one of the largest pharmaceutical marketing and distribution companies in the
country. Through its several warehouses, which together cover the whole national territory, the company supplies
pharmacies daily with products ranging from medicines to other pharmaceutical products.

4.1. Alerting the company for customers and products outliers

The pharmaceutical distribution company deals with approximately 180,000 invoices per month, which creates the need
for mechanisms that detect outliers faster and evaluate severe outliers more accurately. Regarding severe outliers
(as discussed in Section 2.2.1), a rule was created that allows the distinction between “severe” and “very severe”
outliers. This was needed to make the process of outlier classification faster and more accurate, since a great
number of severe outliers were detected both for customers and for ordered quantities of products.
To that end, severe outliers (customers and products) are first identified through the Tukey rule (Box-plot method)
using the SPSS software. A full description of the outlier results can be found in [5]. Subsequently, for each of
these outliers, a value called outlier_value (with initial value 0) is calculated in steps 1 to 4 as follows:
1. If the quantity ordered > the severe-outlier limit (3 times the IQR, InterQuartile Range) of the quantities
ordered by all customers of the same classification; Then outlier_value = outlier_value + 1 - (1 / number of
customers of the same classification)
2. If the quantity ordered > the severe-outlier limit (3 times the IQR) of the quantities ordered by all customers
of the same warehouse; Then outlier_value = outlier_value + 1 - (1 / number of customers of the same warehouse)
3. If the quantity ordered > monthly warehouse consumption; Then outlier_value = outlier_value + 1 - (1 / number of
customers of the same warehouse)
4. If the quantity ordered > monthly company consumption (all the warehouses); Then outlier_value =
outlier_value + 1
This rule, obtained by practical experimentation, assigns (numerical) weights to each of the severe outliers,
allowing them to be ranked in descending order; the rule was implemented as an algorithm in the company’s ERP system,
which sends an email alert to the company’s procurement department whenever outliers are detected.
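
The four steps above can be sketched in Python as follows. This is a hedged illustration, not the algorithm as implemented in the company’s ERP system: the class and field names are hypothetical, and the two "3 times the IQR" conditions are interpreted here as comparisons against precomputed severe-outlier limits for the customer classification and for the warehouse.

```python
from dataclasses import dataclass


@dataclass
class OrderContext:
    """Inputs of the rule for one ordered quantity (all names are hypothetical)."""
    quantity: float
    class_limit: float                    # severe limit among customers of the same classification
    class_customers: int                  # number of customers of the same classification
    warehouse_limit: float                # severe limit among customers of the same warehouse
    warehouse_customers: int              # number of customers of the same warehouse
    warehouse_monthly_consumption: float
    company_monthly_consumption: float    # all warehouses


def outlier_value(ctx):
    """Accumulate the weight of a severe outlier following steps 1 to 4."""
    value = 0.0
    if ctx.quantity > ctx.class_limit:                    # step 1
        value += 1 - 1 / ctx.class_customers
    if ctx.quantity > ctx.warehouse_limit:                # step 2
        value += 1 - 1 / ctx.warehouse_customers
    if ctx.quantity > ctx.warehouse_monthly_consumption:  # step 3
        value += 1 - 1 / ctx.warehouse_customers
    if ctx.quantity > ctx.company_monthly_consumption:    # step 4
        value += 1
    return value


# Hypothetical order: exceeds both limits and the warehouse's monthly consumption.
order = OrderContext(quantity=500, class_limit=100, class_customers=50,
                     warehouse_limit=120, warehouse_customers=200,
                     warehouse_monthly_consumption=400,
                     company_monthly_consumption=2000)
print(outlier_value(order))
```

Sorting the detected outliers by this value in descending order gives the ranking described above, with the orders that trigger more conditions appearing first.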

4.2. Sales prediction

Orders of a product (medicine) to the suppliers of the pharmaceutical distribution company are set based on the
analysis of the history of the product quantities requested by the customers (pharmacies) of the company. On the
other hand, there are periods in the year in which the demand for certain medicines is higher (for example, anti-flu
and antipyretics in Winter).
The aim is, therefore, to identify regular patterns in the historical observations of the quantities of a medicine
ordered by the pharmacies, so as to enable the pharmaceutical distribution company to make sales forecasts of that
medicine for a future period. The sales prediction for a product indicates the quantity of that product to be
ordered by the pharmaceutical distribution company from its suppliers.
In the data set to be analyzed (quantities of a product requested by customers/pharmacies), the target attribute
(product quantity to be ordered per month) is time dependent, that is, it is associated with a consecutive sequence
of periods, and the interest here is to know this dependence.
Thus, the targeted problem is a case of application of the predictive data mining category (the aim is to forecast
the future value of an attribute of interest) and of the time-series data mining method described in Section 3.2,
since the value of the target attribute is to be predicted for a given future period (in this case, the current
month and the two subsequent months).

4.2.1. Use of the sales prediction method


The damped Pegels method [26] and the SMAPE metric described in Section 3.2.1 were used in the forecast
calculation. The damped Pegels method was selected since, in comparison to other exponential smoothing methods, it
is the one that presents the best results for the 1428 monthly series of the M3-Competition [26].
Sales prediction was made for 357 medicines marketed by the pharmaceutical distribution company. The
selection of these products (of the approximately 20,000 products marketed by the company) was made taking into
account the relevance of their sales in terms of business to the company.
The sales forecast was made for the current month when this project was undertaken (January 2016) and for the
two subsequent months (February and March 2016). The prediction was made based on the analysis of historical
data concerning the previous 24 months. The implementation of the data mining damped Pegels method and the
calculation of the associated error were done in SQL, since the company database (products, sales) is an ORACLE
version 11.2 database.

4.2.2. Calculation of the associated error and analysis and interpretation of the results obtained

As shown in Figure 2, the generated values of α, β and Ø for the damped Pegels method and the associated SMAPE
error are stored in a table; subsequently, the combination of constants with the lowest associated error for each
product/warehouse is selected (α=0.1, β=0.1, Ø=0.9 in the example shown).
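
The selection of the constants can be illustrated with the following Python sketch. It is an assumption-laden illustration rather than the SQL procedure used in this work: the grid of candidate values (0.1 to 0.9 in steps of 0.1), the use of one-step-ahead errors over the historical series, the initialisation and all function names are choices made for this example only.

```python
from itertools import product


def one_step_forecasts(series, alpha, beta, phi):
    """One-step-ahead damped Pegels forecasts, aligned with series[2:]."""
    level, trend = series[0], series[1] / series[0]  # assumed initialisation
    predictions = []
    for i, x in enumerate(series[1:], start=1):
        if i >= 2:
            # Forecast for period i made with information up to period i - 1.
            predictions.append(level * trend ** phi)
        prev_level = level
        level = alpha * x + (1 - alpha) * prev_level * trend ** phi
        trend = beta * (level / prev_level) + (1 - beta) * trend ** phi
    return predictions


def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent."""
    terms = [abs(a - f) / ((abs(a) + abs(f)) / 2)
             for a, f in zip(actual, forecast) if abs(a) + abs(f) > 0]
    return 100.0 * sum(terms) / len(actual)


def best_parameters(series, grid=tuple(k / 10 for k in range(1, 10))):
    """Return (error, alpha, beta, phi) for the combination with the lowest SMAPE."""
    best = None
    for alpha, beta, phi in product(grid, repeat=3):
        error = smape(series[2:], one_step_forecasts(series, alpha, beta, phi))
        if best is None or error < best[0]:
            best = (error, alpha, beta, phi)
    return best


# Hypothetical monthly quantities of one product in one warehouse.
history = [120, 126, 131, 140, 146, 155, 160, 171, 175, 186, 190, 204]
print(best_parameters(history))
```

Running the search once per product/warehouse pair and storing every combination with its error would reproduce the kind of table described above, from which the row with the lowest error is taken.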

Figure 2. SMAPE error sorted in ascending order and obtained by combining the 3 constants of the method

After the combination with the lowest error is selected, it is applied to the calculation of the sales forecast,
with the result being saved in the field "Pegels Predicted Quantity" of the table (Figure 3).

Figure 3. Results of the application of the sales prediction method to Product ID 427795

When this work was undertaken, the procedure used by the pharmaceutical distribution company to determine the
quantity to be ordered of a given product for the following month was based on the calculation of the arithmetic
average of the quantities sold of that product in the last 12 months (Figure 4).
For the case of product ID 427795 and for the same sales warehouse, the table in Figure 4 shows the predicted sales
quantities determined by both the damped Pegels method and the average-sales calculation used by the company.

Figure 4. Comparing results of predicted sales using the Pegels method and the ad hoc method used by the company

The analysis of the table shows that the predicted sales values obtained with the Pegels method are in general
closer to the real values than those obtained with the ad hoc process in use by the company. In addition, it is
important to note that the values obtained by the Pegels method satisfy the demand in most cases, which does not
happen with the calculation made by the other process.

5. Conclusions and Future Work

From the study developed and its application to the pharmaceutical distribution company depicted in this work, it
is concluded that by using the Box-plot method and the rule created in Section 4.1 for the detection of outliers
(customers and products), outliers may be identified in a fast and precise way, so that preventive initiatives can
be taken. That is, whenever outliers are detected, the described process sends an email alert to the company’s
procurement department advising that the stock of that product should be prorated in order to prevent a stock
breakdown. Thus, a simple, fast and economical process of outlier detection to manage the stock of medicines in the
company was obtained. Nevertheless, the work undertaken regarding the detection of outliers (customers and products)
could be extended to cover all the products marketed by the company. Another issue to address would be the detection
of outliers on a daily basis, so that they are detected sufficiently in advance, before the stock runs out.
Regarding the sales prediction issue, the performance of the time-series damped Pegels method was found to be
satisfactory for the sales forecast of products at the individual level, yielding results that are closer to the
real values and more reliable than those obtained with the process previously used by the company. In fact, by using
the values obtained with the Pegels method for sales forecasting, the company could better determine the quantities
to be ordered of the targeted products and, in most cases, guarantee stock levels that effectively satisfy the real
demand, without holding excessive levels of stock, which have a strong impact on costs.
Nevertheless, it is considered that there is still room for improvement in the forecasting technique. On the one
hand, it would be interesting to verify if, using a broader forecast horizon than the one used in this work, the results
obtained could have an acceptable level of accuracy. The inclusion of a longer forecast horizon (e.g. 12 months) in
sales forecasting may be of interest for the pharmaceutical distribution company to achieve a greater margin of
maneuver in negotiating prices with suppliers.
On the other hand, additional data such as advertising actions undertaken by suppliers and promotions made by
competitors (other pharmaceutical distribution companies) may have an impact on the actual sales of the products.
Thus, it could be worth considering these aspects as additional variables in determining the sales forecast.

References

1. Infarmed. Rupturas de Stock de Medicamentos, 2012. [Online]. Available:
https://www.infarmed.pt/portal/page/portal/INFARMED/PUBLICACOES/TEMATICOS/SAIBA_MAIS_SOBRE/SAIBA_MAIS_ARQUIVO/43_Rupturas_Stock.pdf.
2. Jornal Público. Farmácias que exportem medicamentos que fazem falta em Portugal vão pagar coima quatro vezes superior. Público; 2013.
3. A. Gupta, C. D. Maranas, and C. M. McDonald. Mid-term supply chain planning under demand uncertainty: customer demand satisfaction
and inventory management. Comput. Chem. Eng.; 2000. p. 2613–2621.
4. P. Doganis, A. Alexandridis, P. Patrinos, and H. Sarimveis. Time series sales forecasting for short shelf-life food products based on artificial
neural networks and evolutionary computing. J. Food Eng.;2006. p. 196–204.
5. Ribeiro A, Durão N and Seruca I. Detection of outliers for a pharmaceutical distribution company in Portugal. In: Á. Rocha, L. P. Reis, M. P.
Cota, O. S. Suárez & R. Gonçalves (eds.), Atas da 11ª Conferência Ibérica de Sistemas e Tecnologias de Informação (CISTI'2016); 2016.
p. 527–531.
6. Ribeiro A, Seruca I and Durão N. Sales prediction for a pharmaceutical distribution company: a data mining based approach. In: Á. Rocha, L. P.
Reis, M. P. Cota, O. S. Suárez & R. Gonçalves (eds.), Atas da 11ª Conferência Ibérica de Sistemas e Tecnologias de Informação
(CISTI'2016); 2016. p. 532–538.
7. C. Chen and L.-M. Liu. Forecasting time series with outliers. J. Forecast.; 1993. p. 13–35.
8. G. Duncan, W. Gorr, and J. Szczypula. Forecasting analogous time series. Pittsburgh, 1998.
9. V. Hodge and J. Austin. A survey of outlier detection methodologies. Artif. Intell. Rev.; 2004. p. 1–43.
10. F. E. Grubbs. Procedures for Detecting Outlying Observations in Samples. Technometrics; 1969. p. 1–21.
11. V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley & Sons, 1994.
12. C. Aggarwal. Outlier analysis. Boston: Kluwer Academic Publishers, 2013.
13. G. H. John. Robust Decision Trees: Removing Outliers from Databases. in Proceedings of the First International Conference on Knowledge
Discovery and Data Mining; 1995. p. 174–179.
14. H. P. Kriegel, P. Kröger, and A. Zimek. Outlier detection techniques. in Tutorial at the 13th Pacific-Asia; 2009. p. 6.
15. E. C. De Oliveira. Comparação das diferentes técnicas para a exclusão de ‘outliers’. Metrologia, 2008.
16. C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata. Detecting outliers: Do not use standard deviation around the mean, use absolute
deviation around the median. J. Exp. Soc. Psychol.; 2013. p. 4–6.

17. J. W. Tukey. Exploratory Data Analysis. Reading, MA: Addison-Wesley, 1977.
18. D. C. Hoaglin, B. Iglewicz, and J. W. Tukey. Performance of Some Resistant Rules for Outlier Labeling. J. Am. Stat. Assoc.; 1986. p. 991–999.
19. D. C. Hoaglin and B. Iglewicz. Fine Tuning Some Resistant Rules for Outlier Labeling. J. Am. Stat. Assoc.; 1987. p. 1147–1149.
20. M. Santos and I. Ramos. Business Intelligence: Tecnologias da Informação na Gestão de Conhecimento. FCA, 2009.
21. M. J. A. Berry and G. S. Linoff. Mastering Data Mining: The Art and Science of Customer Relationship Management. Wiley, 2000.
22. Vercellis C. Business intelligence data mining and organization for decision making. 2nd ed. Chichester: John Wiley & Sons, 2009.
23. P. R. Winters. Forecasting Sales by Exponentially Weighted Moving Averages. Manage. Sci.; 1960. p. 324–342.
24. E. S. Gardner Jr. and E. McKenzie. Forecasting trends in time series. Manage. Sci.; 1985. p. 1237–1246.
25. C. C. Pegels. Exponential forecasting: Some new variations. Manage. Sci.; 1969. p. 311–315.
26. J. W. Taylor. Exponential Smoothing with a Damped Multiplicative Trend. Int. J. Forecast.; 2003. p. 715–725.
