
Journal of Applied Computer Science & Mathematics, no. 15 (7) /2013, Suceava

Evolutionary Forecasting Method of Treatment Results


Irina TSYGANKOVA
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, Laboratory of Applied
Informatics, Russia
pallada-ltd@infopro.spb.su

Abstract–A method for processing poorly formalized multivariable arrays of biomedical information is presented, based on an evolutionary method for solving extremum problems of multivariable functions. The method predicts treatment results taking into account the biomedical and social features of the patients. It selects the weights of the input parameters without preliminary reduction of the multidimensional feature space, which avoids the loss of important information and makes it possible to identify weak links in these information arrays. Results of numerical experiments showing the high efficiency of the method are presented: the average forecast error amounted to 10-17%. The developed method can be used in various subject areas in which information about the objects is kept in large data sets and described in "input-output" protocols, and for which the hypothesis of monotony of decision-making in a local area is valid.

Keywords: data processing, evolutionary method, biomedical information, forecasting, program complex, decision-making support.

I. INTRODUCTION

Growing requirements for quality of life and the appearance of new diagnostic and medical technologies have led to a sharp increase in the prices of medical services. This has dramatically aggravated the problem of optimizing the costs of disease treatment and prevention, both for individual patients and for medical organizations of different levels. The problem can be solved only by using modern techniques for optimizing and forecasting the results of each treatment, taking into account the medical, biological and social characteristics of every single patient.

The development of computer and information technologies now makes it possible to solve medical forecasting tasks by means of intelligent data analysis techniques [1-3]. Real medical and biological data are characterized by high dimensionality and diversity of data types, a large number of "noise" and duplicated characteristics, and missing and abnormal values. In this situation, techniques based on the evolutionary approach become more efficient: unlike traditional techniques of optimal-solution seeking, they aim at the best (most acceptable) solution in comparison with those obtained earlier or set as an initial value.

Realization of these techniques is impossible without the introduction of modern information systems that support decision-making in ordinary medical practice. The use of such systems in the work of a modern doctor will improve the quality of medical services, ease the work of the medical staff, improve the quality of the patients' lives, and substantially reduce the costs of disease treatment and prevention.

II. TASK DESCRIPTION

The present article focuses on the task of forecasting treatment results for a given treatment technique, using the chronic skin disease psoriasis as an example. Initial information about the patients is presented in the form of numerical "object – property" tables describing the input and output parameters (features, characteristics) of the patients. The input parameters comprise individual data on each patient: anamnesis, associated diseases, clinical, functional, metabolic and immunological indices, and treatment technique. The output (target) parameters are the duration of the patient's stay in hospital (number of patient days), the duration of treatment until the patient's state improves (treatment efficiency), the duration of the remission period, the presence (or absence) of typical residual lesions on the skin, and the number of disease exacerbations per year. The input parameters influence the output ones to different extents, but which of them exert the most significant influence on the target parameters, and which model describes this influence, is still unknown.

In the general case the initial information about the objects is presented as a matrix $Z = (Z_1, Z_2, \ldots, Z_i, \ldots, Z_N)$, where $Z_i = (z_{i1}, z_{i2}, \ldots, z_{ij}, \ldots, z_{iM})$ is the vector of parameters of the $i$-th object. Each parameter $z_{ij}$ takes a value from a set of permissible values. The totality of object parameters is divided into input parameters $V = (v_1, v_2, \ldots, v_t)$ and output parameters $Y = (y_1, y_2, \ldots, y_s)$. The input parameters are diverse; they are measured on quantitative and qualitative scales. Denote by $X = (x_1, x_2, \ldots, x_m)$ the parameters measured on quantitative scales. Denote by
Computer Science Section
 

$U = (u_1, u_2, \ldots, u_h)$ the parameters measured on qualitative (nominal and ordinal) scales. The vector output parameter $Y$ is measured on a quantitative scale.

It is required to forecast, with acceptable accuracy, the values of the unknown output parameters of a new object from its known input parameters.

This is a poorly formalized task, because all the information about the objects is represented only by a set of parameters that cannot be said with any certainty to be complete, non-contradictory and undistorted. With such initial data we use the "black box" model, and in developing the data analysis algorithms we rely only on the arrays of precedents and on the hypothesis of monotony of the solution space, that is, "similar input situations lead to similar output responses of the system".

III. EVOLUTIONARY PROCESSING METHOD

Solution of the forecasting task using the proposed method consists of several stages:
• preliminary processing of data,
• selection of weight parameters in the learning process,
• prediction of target parameters.

The preliminary processing stage includes structuring the data, detecting and eliminating anomalous and missing values, and coding and normalizing the data measured on continuous scales. Parameters measured on discrete scales and having more than two gradations are transformed into a set of binary features.

Let us introduce the vector $G = (g_1, g_2, \ldots, g_j, \ldots, g_K)$, where $g_j$ $(j = 1, 2, \ldots, K)$ are the binary features of the objects. At the preliminary processing stage the whole set of objects is divided into subsets (samples) according to the values of $g_j$. The total number of these samples is $C^K$, where $K$ is the number of binary features and $C$ is the number of variants (alternatives) of grouping the objects by each binary feature $g_j$. The following variants of grouping are possible:
– objects are included in the sample irrespective of the value of $g_j$,
– the sample contains objects for which $g_j = 0$,
– the sample contains objects for which $g_j = 1$.

The same object may be included in several samples, which contain different numbers of objects. In what follows, only informative samples are used, that is, those in which the number of objects is significantly greater than the number of quantitative input parameters.

At the next stage (training), the weights of the input parameters $X$ are determined for each informative sample. Determination of the weight factors is based on the evolutionary approach to solving extremum problems of multivariable functions and on the random search method. Denote by $W = (w_1, w_2, \ldots, w_j, \ldots, w_m)$ the weight vector, where $w_j$ $(j = 1, 2, \ldots, m)$ are the weight factors of the input parameters.

Each object can be represented as a vector in the multidimensional space of quantitative parameters
$$O_i = (x_1, x_2, \ldots, x_j, \ldots, x_m, y)$$
where $x_j$ are the input parameters of the object, $y$ is the output (target) parameter of the object, and $p = m + 1$ is the total number of parameters of the multidimensional space. In this case the task of determining the unknown parameter from the known input parameters reduces to interpolating the function $y = f(X)$ given at the nodes of a $p$-dimensional irregular grid.

Since the degree of smoothness of $f(X)$ is unknown, in order to interpolate it over the whole domain of definition it is suggested to use a function of the form
$$f(X) = y_r(d(X, W))$$
where $d$ is a measure of proximity between objects. The "weighted" Euclidean distance is taken as the measure of proximity between the objects $i$ and $l$:
$$d_{il} = \sqrt{\sum_{j=1}^{m} w_j (x_{ji} - x_{jl})^2}, \quad 0 \le w_j \le 1 \tag{1}$$

The weight factors are selected using the Monte Carlo method.

To ensure the required accuracy in computing the forecasted parameter, let us introduce a criterion that minimizes the average absolute forecast error:
$$Q(w) = \frac{1}{N_0} \sum_{i=1}^{N_0} \left| y_i - y_{ri}(d) \right| \to \min \tag{2}$$
where $y_i - y_{ri}(d)$ is the difference between the observed and calculated values of the output parameter and $N_0$ is the size of the analyzed sample.

If several output parameters are to be predicted, significance coefficients $b_j$ $(j = 1, 2, \ldots, s)$ are specified a priori for each predicted parameter. The values of $b_j$ are selected from the interval $[0, 1]$ and must satisfy the normalizing condition
$$\sum_{j=1}^{s} b_j = 1$$
where $s$ is the number of predicted parameters. Criterion (2) can then be written as
$$Q(w) = \frac{1}{N_0} \sum_{i=1}^{N_0} \sum_{j=1}^{s} b_j \left| y_i^{(j)} - y_{ri}^{(j)}(d) \right| \to \min \tag{3}$$

To determine the estimated values $y_{ri}$, let us reduce the task of multivariate interpolation of the grid function $y = f(X)$ to the one-dimensional task of extrapolating the function $y_{ri}(d)$ $(i = 1, 2, \ldots, N_0)$. For this purpose, according to formula (1), the distances are determined between each point
$i$ of the spatial grid and the other points at which the values of the function $y$ are known. The distances are then ranked in increasing order. Denote by $D_i = (d_{i1}, d_{i2}, \ldots, d_{il}, \ldots, d_{i(N_0 - 1)})$ the ranked distance vector. Then, given the array of number pairs $(d_k, y_k)$ $(k = 1, 2, \ldots, N_0 - 1)$, the problem of extrapolating the discrete dependence $y(d_k)$ is solved by means of the continuous function $y_r(d)$. Only the $n$ nearest points are used in constructing the approximating function $y_r(d)$. In the general case the value of $n$ is determined in a preliminary computational experiment. A quadratic polynomial is used as the approximation model:
$$y_r(d) = \sum_{i=0}^{2} a_i d^i$$
The coefficients $a_i$ are determined by minimizing the functional
$$E = \sum_{k=1}^{n} \left( y_k - y_r(d_k, a_i) \right)^2 \to \min$$

Iterative refinement of the criterion $Q(w)$, calculated according to formula (2) or (3), continues until one of the following occurs:
• the number of iterations during which the solution does not improve exceeds a predetermined value,
• the calculated value of the average absolute forecast error falls below the a priori specified acceptable error,
• the maximum computation time is exceeded.

The evolutionary computation process may be stopped and resumed at any time.

The next stage of the solution is to use the results obtained during learning to predict the unknown target parameters of a new object. To do this, the informative samples into which the new object falls are first identified on the basis of its qualitative characteristics. The sample in which the prediction error is smallest is used for further analysis. Computation of each target parameter of the new object then reduces to extrapolating the function $y_r(d)$ in the neighborhood of the grid point of this object.

After the output parameters of the new object become known, the object is added to the training samples, and the weight factors are refined according to the method described above. Thus, forecasting of target parameters is not a one-time operation but a process in which the initial data are continuously collected, refined and consolidated, the weight factors are refined, and the results are verified.

IV. SOFTWARE SYSTEM

The developed forecasting method was implemented in a software system for medical decision support [4]. The software system is intended to run in the Windows environment. A modular object-based approach is used in developing the software system; it allows easily modifiable application software packages to be created. The software system consists of a database, a package of software modules and a user interface. Access to the software system is granted according to the user's role. The database is an array of medical and biological information about patients, treatment methods and results. A relational model is used to structure the information; it allows the data to be displayed naturally in a table of "object-property" type. The data arrays are stored in spreadsheets in Excel format.

Data exchange between the spreadsheet and the software modules is performed using the OLE automation mechanism.

The software package includes:
• a module for preliminary processing of the initial data,
• a learning module, which calculates the weights of the input parameters,
• a module for forecasting the output parameters of a new patient from his/her input parameters.

The software modules are implemented in the C++Builder object-oriented programming environment. The user interface of the software system provides input of the initial data and presentation of the calculation results. The interface has a user-friendly appearance that is familiar to the users. The object-oriented approach to the interface structure and the use of graphic components from the Windows and C++Builder libraries allow the interface to be modified promptly according to the users' requirements.

Protected hierarchical access to the information databases and software modules is provided for the following categories of users: doctor, administrator, developer. The doctor enters and edits the initial data of the patients, selects the treatment method from a regulated list, and obtains the estimated values of the forecasted parameters for the new patient. A table describing the input and output parameters of patients who have completed treatment and whose parameters are similar to those of the new patient can be reviewed interactively.

The administrator updates and maintains the databases and performs the estimation procedures for selecting the weight factors for various combinations of qualitative values of the medical and biological data.

The developer has full access to the software system and can modify the software code.

V. NUMERICAL EXPERIMENT

To estimate the efficiency of the developed forecasting method, a numerical experiment was conducted using real biomedical data of patients with psoriasis. These data were obtained in medical facilities located in St. Petersburg. The software system for medical decision support described above was used to conduct the numerical
experiments. Data on 308 patients were included in the initial sample on which the research was conducted. Of these, 45 records were selected at random to form the control sample, and the remaining 263 patients were included in the learning sample. The total number of numerical parameters characterizing each patient is 44, of which 39 are input parameters and 5 are output parameters. The forecasting task is solved separately for each output parameter. The summarized results of the calculations assessing the forecasts of the target parameters are shown in Table I.

The calculations demonstrated a rather high efficiency of the suggested method. The average forecast error amounted to 10-17%.

TABLE I
RESULTS OF FORECAST OUTPUT PARAMETERS

Forecasted parameter                                    Average forecast error
Treatment period in hospital (number of bed-days)       0.101
Effect of treatment (the period of acute stage)         0.112
Number of flare-ups per year                            0.139
Degree of resolution (residual lesions on the skin)     0.163
Period of remission                                     0.167

Reliability of the obtained results is confirmed by calculations on the control sample.

VI. CONCLUSION

The suggested forecasting method may be used in various application areas where the data are collected in large information arrays, described in "input-output" protocols, and satisfy the hypothesis of monotony of decisions in a local area. The developed method of processing medical and biological information allows the weight coefficients of the input parameters to be selected without reducing the dimensionality of the feature space, which in turn avoids the loss of meaningful information and reveals weak connections in the information arrays under consideration.

REFERENCES

[1] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Berlin etc.: Springer, 1996.
[2] A. A. Freitas, Data Mining and Knowledge Discovery with Evolutionary Algorithms. Berlin etc.: Springer, 2002.
[3] I. H. Witten, E. Frank, M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques. Elsevier, 2011.
[4] I. A. Tsygankova, "Program complex system of decision-making," Software & Systems. Moscow, pp. 155-158, N 4, 2008 [in Russian with English abstract].
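
The training and prediction stages of Section III (weighted distance (1), the quadratic approximation of $y_r(d)$ on the $n$ nearest points, criterion (2) and the random search with its stopping rules) can be illustrated by the following minimal Python sketch. It is not the C++Builder implementation described in Section IV. All function and parameter names are hypothetical, and several details are assumptions rather than statements of the article: the weights are sampled uniformly from $[0, 1]$ at every iteration, criterion (2) is evaluated by predicting each object from all the others, the forecast is taken as $y_r(0)$ (the value extrapolated to the object's own grid point), and a degenerate fit falls back to the mean of the neighbours.

```python
import math
import random
import time


def weighted_distance(a, b, w):
    """Formula (1): weighted Euclidean distance between two input vectors."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))


def fit_quadratic(ds, ys):
    """Least-squares coefficients (a0, a1, a2) of y_r(d) = a0 + a1*d + a2*d**2."""
    s = [sum(d ** p for d in ds) for p in range(5)]                  # sums of d^p
    t = [sum(y * d ** p for d, y in zip(ds, ys)) for p in range(3)]  # sums of y*d^p
    m = [[s[r + c] for c in range(3)] + [t[r]] for r in range(3)]    # normal equations
    for col in range(3):                                             # Gauss-Jordan elimination
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            # Assumption: a degenerate system falls back to the neighbours' mean.
            return (sum(ys) / len(ys), 0.0, 0.0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def predict(x_new, X, y, w, n):
    """Fit y_r(d) on the n nearest objects and return y_r(0) (assumed forecast)."""
    pairs = sorted((weighted_distance(x_new, xi, w), yi) for xi, yi in zip(X, y))[:n]
    a0, _, _ = fit_quadratic([d for d, _ in pairs], [v for _, v in pairs])
    return a0


def criterion(X, y, w, n):
    """Criterion (2): mean absolute error, each object predicted from the others."""
    errors = []
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        errors.append(abs(y[i] - predict(X[i], rest_X, rest_y, w, n)))
    return sum(errors) / len(errors)


def random_search(X, y, n=5, max_stall=200, target=0.0, max_seconds=5.0, seed=0):
    """Monte Carlo search for weights; stops on stall count, target error or time."""
    rng = random.Random(seed)
    m = len(X[0])
    best_w = [1.0] * m                      # assumed starting point: equal weights
    best_q = criterion(X, y, best_w, n)
    stall, start = 0, time.monotonic()
    while stall < max_stall and best_q > target and time.monotonic() - start < max_seconds:
        w = [rng.random() for _ in range(m)]
        q = criterion(X, y, w, n)
        if q < best_q:
            best_w, best_q, stall = w, q, 0
        else:
            stall += 1
    return best_w, best_q
```

In this sketch `X` is a list of normalized quantitative input vectors for one informative sample and `y` is the list of values of one output parameter; `random_search(X, y)` returns the best weight vector found together with its value of criterion (2), and `predict(x_new, X, y, best_w, n)` gives the forecast for a new object. Because a candidate is accepted only when it lowers the criterion, the returned error can never exceed that of the starting weights.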

I. Tsygankova is a Senior Researcher at the St. Petersburg Institute for Informatics and Automation RAS, Laboratory of Applied Informatics, Russia. She obtained her Ph.D. in Computer Science and Engineering in 1992. Her research interests include data mining with evolutionary algorithms, software engineering, forecasting and classification, machine learning, and database systems.