Computer Science Section
the stage of preliminary processing the whole set of objects is divided into subsets (samples) according to the $g_j$ values. The total number of these samples is $C^K$, where $K$ is the number of binary features and $C$ is the number of variants (alternatives) of grouping the objects by each binary feature $g_j$. The following variants of object groups are possible:
– objects are included in the selection irrespective of the value of $g_j$,
– the selection contains objects for which $g_j = 0$,
– the selection contains objects for which $g_j = 1$.
The same object may be included in several samples, which contain different numbers of objects. In what follows, only the informatively valuable selections are used, in which the number of objects is significantly greater than the number of quantitative input parameters.
At the next stage (training period) the weights of the input parameters $X$ are defined for each informatively valuable selection. Determination of the weight factors is based on the evolutionary approach to solving extremum problems for multivariable functions and the random search

where $y - y_r(d)$ is the difference between the observed and calculated values of the output parameter, and $N_0$ is the size of the analyzed sample.
If several output parameters must be predicted, significance coefficients $b_j$, $j = 1, 2, \ldots, s$, are given a priori, one for each predicted parameter. The values of the coefficients $b_j$ are selected from the interval $[0, 1]$ and must satisfy the normalizing condition
$$\sum_{j=1}^{s} b_j = 1,$$
where $s$ is the number of predicted parameters. Then the criterion (2) can be written as
$$Q_w = \frac{1}{N_0} \sum_{i=1}^{N_0} \sum_{j=1}^{s} b_j \left| y_i^j - y_{ri}^j(d) \right| \to \min. \qquad (3)$$
In order to determine the estimated values $y_{ri}$, let us reduce the multivariate interpolation of the grid function $y = f(X)$ to the one-dimensional extrapolation of the function $y_{ri}(d)$, $i = 1, 2, \ldots, N_0$. For this purpose, according to formula (1), the distances are determined between each point
Journal of Applied Computer Science & Mathematics, no. 15 (7) /2013, Suceava
$i$ of the spatial grid and the other points at which the values of the function $y$ are known. The distances are then ranked in increasing order. Denote by
$$D_i = (d_{i1}, d_{i2}, \ldots, d_{il}, \ldots, d_{i(N_0 - 1)})$$
the ranked distance vector.
Then, given the array of number pairs $(d_k, y_k)$, $k = 1, 2, \ldots, N_0 - 1$, the problem of extrapolating the discrete dependence $y(d_k)$ by a continuous function $y_r(d)$ is solved. In constructing the approximating function $y_r(d)$ only the $n$ nearest points are used. In the general case the value of $n$ is determined in a preliminary computational experiment. A quadratic polynomial is used as the approximation model:
$$y_r(d) = \sum_{i=0}^{2} a_i d^i.$$
The coefficients $a_i$ are determined from the condition of minimizing the functional
$$E = \sum_{k=1}^{n} \left( y_k - y_r(d_k, a_i) \right)^2 \to \min.$$
Iterative refinement of the criterion $Q_w$, calculated according to formula (2) or (3), continues until one of the following conditions is met:
– the number of iterations during which the solution does not improve exceeds a predetermined value,
– the calculated value of the average absolute forecast error falls below the a priori acceptable error,
– the maximum computation time is exceeded.
The evolutionary computation process may be stopped and resumed at any time.
The next stage of the solution is to use the results obtained during learning to predict the unknown target parameters of a new object. To do this, first identify the informative samples into which the new object falls, considering its qualitative characteristics. The sample in which the prediction error is smallest is used for further analysis. Computation of each target parameter of the new object is reduced to the extrapolation of the function $y_r(d)$ in the proximity of the grid point of this object.
After the output parameters of the new object become known, the object is added to the training samples, and the weight factors are refined by the method described above. Thus, forecasting of the target parameters is not a one-time operation but a process in which the initial data are continuously collected, refined and consolidated, the weight factors are refined, and the results are verified.

IV. SOFTWARE SYSTEM
The developed forecasting method was implemented in a software system for medical decision support [4]. The software system is intended to run in the Windows environment. A modular object-based approach is used in its development; it allows creating easily modifiable application software packages. The software system consists of a database, a package of software modules and a user interface. Access to the software system is granted according to the user's role. The database is an array of medical-biological information about patients, treatment methods and results. A relational model is used to structure the information. It allows the data to be displayed naturally in a table of the “object-property” type. Data arrays are stored in spreadsheets of Excel format.
Data exchange between the spreadsheet and the software modules is performed through the OLE automation mechanism.
The software package includes:
– a module for preliminary processing of the initial data,
– a learning module, which calculates the weights of the input parameters,
– a forecasting module, which predicts the output parameters of a new patient from his/her input parameters.
The software modules are implemented in the C++Builder object-oriented programming environment.
The user interface of the software system provides input of the initial data and presentation of the calculation results. The interface has a user-friendly layout that is familiar to the users. The object-oriented approach to the interface structure and the use of graphic components from the Windows and C++Builder libraries allow the interface to be modified promptly according to the users' requirements.
Protected hierarchical access to the databases and software modules is provided for the following categories of users: doctor, administrator, developer. The doctor enters and edits the initial data of the patients, selects the method of treatment from a regulated list and obtains the forecast values of the new patient's parameters. It is possible to review, in interactive mode, a table describing the input and output parameters of patients who have completed the treatment and whose parameters are similar to those of the new patient.
The administrator updates and maintains the databases and performs the estimation procedures for selecting the weight factors for various combinations of qualitative values of the medical-biological data.
The developer has full access to the software system and can modify the software code.

V. NUMERICAL EXPERIMENT
In order to estimate the efficiency of the developed forecasting method, a numerical experiment was conducted using real biomedical data of patients with psoriasis. These data were obtained in medical facilities located in St. Petersburg. The software system for medical decision support described above was used to conduct the numerical
experiments. The data on 308 patients were included in the initial sample on which the research was conducted. Of these, 45 records were selected at random to form the control sample, and the remaining 263 patients were included in the learning sample. Each patient is characterized by 44 numerical parameters, of which 39 are input parameters and 5 are output parameters. The forecasting task is solved separately for each output parameter. Summarized results of the calculations assessing the forecast of the target parameters are shown in Table I.
The calculations demonstrated a rather high efficiency of the suggested method. The mean forecast error amounted to 10-17%.

TABLE I
RESULTS OF FORECAST OUTPUT PARAMETERS
Forecasted parameter                                    Average forecast error
Treatment period in hospital (number of bed-days)       0.101
Effect of treatment (the period of acute stage)         0.112
Number of flare-ups per year                            0.139
Degree of resolution (residual lesions on the skin)     0.163
Period of remission                                     0.167

Reliability of the obtained results is confirmed by the calculations based on the control sample.

VI. CONCLUSION
The suggested forecasting method may be used in various application domains where the data are collected in large information arrays, defined in “input-output” protocols, and the hypothesis of monotonicity of the solution holds for them locally. The developed method of medical-biological information processing allows selecting the weight coefficients of the input parameters without reducing the dimensionality of the attribute space, which in turn avoids the loss of meaningful information and reveals weak connections in the information arrays under consideration.

REFERENCES
[1] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Berlin etc.: Springer, 1996.
[2] A. A. Freitas, Data Mining and Knowledge Discovery with Evolutionary Algorithms. Berlin etc.: Springer, 2002.
[3] I. H. Witten, E. Frank, M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques. Elsevier, 2011.
[4] I. A. Tsygankova, “Program complex system of decision-making,” Software & Systems, Moscow, no. 4, pp. 155-158, 2008 [in Russian with English abstract].
I. Tsygankova is a Senior Researcher at the St. Petersburg Institute for Informatics and Automation RAS, Laboratory of Applied Informatics, Russia. She obtained her Ph.D. in Computer Science and Engineering in 1992. Her research interests include data mining with evolutionary algorithms, software engineering, forecasting and classification, machine learning, and database systems.
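As an illustration of the approximation step of the forecasting method, the following minimal Python sketch fits the quadratic polynomial $y_r(d)$ by least squares to the $n$ nearest (distance, value) pairs and evaluates a weighted error criterion. It is a sketch under stated assumptions, not the implementation of the software system (which is written in C++Builder). The function names `fit_quadratic`, `extrapolate` and `criterion_qw` are hypothetical. The distances are taken as given, since formula (1) is not reproduced here. Evaluating the polynomial at $d = 0$ (the object's own grid point) and reading criterion (3) as a weighted mean absolute error are both assumptions.

```python
def fit_quadratic(points):
    """Least-squares coefficients (a0, a1, a2) of y_r(d) = a0 + a1*d + a2*d**2."""
    # Normal equations for minimizing E = sum_k (y_k - y_r(d_k))**2.
    s = [sum(d ** p for d, _ in points) for p in range(5)]
    t = [sum(y * d ** p for d, y in points) for p in range(3)]
    m = [[s[r + c] for c in range(3)] + [t[r]] for r in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("need at least three distinct distances")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def extrapolate(pairs, n, d=0.0):
    """Estimate y_r(d) from the n nearest (distance, value) pairs.

    Assumption: d = 0 corresponds to the grid point of the object itself.
    """
    nearest = sorted(pairs, key=lambda p: p[0])[:n]  # rank by increasing distance
    a0, a1, a2 = fit_quadratic(nearest)
    return a0 + a1 * d + a2 * d * d


def criterion_qw(y_obs, y_est, b):
    """Weighted mean absolute error over N0 objects and s output parameters.

    Assumption: this is one reading of criterion (3); b must sum to 1.
    """
    if abs(sum(b) - 1.0) > 1e-9:
        raise ValueError("coefficients b_j must sum to 1")
    n0 = len(y_obs)
    return sum(
        bj * abs(yo - ye)
        for obs, est in zip(y_obs, y_est)
        for bj, yo, ye in zip(b, obs, est)
    ) / n0


# Synthetic data (not from the paper): y = 2 + 3d + d^2 sampled at five distances.
pairs = [(d, 2 + 3 * d + d * d) for d in (5.0, 1.0, 3.0, 2.0, 4.0)]
print(extrapolate(pairs, n=4))
print(criterion_qw([[1.0, 2.0]], [[1.5, 2.0]], b=[0.5, 0.5]))
```

The sketch solves the 3x3 normal equations directly to stay free of external libraries. A linear-algebra library routine would be the more robust choice for poorly scaled distances.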