Predicting Hospital Length of Stay for Accident and Emergency Admissions
Kieran Stone1, Reyer Zwiggelaar1, Phil Jones2 and Neil Mac Parthaláin1
1 Dept. of Computer Science, Aberystwyth University, Ceredigion, Wales.
kis12@aber.ac.uk, rzz@aber.ac.uk, ncm@aber.ac.uk
2 Hywel Dda University Health Board, Bronglais District General Hospital, Aberystwyth, Ceredigion, Wales.
phil.jones@wales.nhs.uk
1 Introduction
Patient hospital length of stay (LoS) can be defined as the number of days that an in-patient will remain in hospital during a single admission event [4]. LoS provides an enhanced understanding of the flow of patients through hospital care units, which is a significant factor in the evaluation of operational functions in various healthcare systems. LoS is regularly used as a metric to gauge resource utilisation, cost and severity of illness(es) [5]. Previous work has sought to group patients by their respective medical condition(s). This approach assumes that each disease, condition or procedure is associated with a predefined, recommended LoS. However, LoS can be affected by a great number of different factors which tend to extend the original target LoS, including (but not limited to): a patient's level of fitness, medical complications, social circumstances, discharge planning and treatment complexity.
The work in the literature that utilises data mining approaches to perform knowledge discovery relies on datasets (and methodologies) that are often disease-, condition- or patient-group-specific [6]. There is no work which utilises general admissions data in order to gain an understanding of the more generic factors that influence length of stay. Furthermore, very little work focuses solely upon sparse diagnostic information; most relies instead on specialised, condition-specific data.
In this paper, a novel approach for the prediction of LoS is investigated by employing a limited dataset which contains only diagnostic code information, along with generic patient information such as age, sex, and postcode (to give an indication of geographical catchment/demographic). This data, however, can be very sparse, as some patients have far more recorded diagnostic codes than others. Additionally, several confounding factors can be present, which means that vagueness and uncertainty play a large part in determining the outcome. The data is analysed using a selection of both traditional learning approaches and those based upon fuzzy and fuzzy-rough sets.
The remainder of the paper is structured as follows: In Section 2, a broad overview of the current approaches to predicting LoS is presented along with an appraisal of the current state-of-the-art. In Section 3, the data is presented and the approach to dealing with it is discussed. In Section 4, the experimental evaluation is presented along with the results. Finally, some conclusions are drawn and topics for further exploration are identified and discussed.
2 Background
Allied to this is the need for very large amounts of subjective expert input and data in order to correctly 'tune' the approach such that the output or prediction is useful. As a consequence of these factors, the process of actually training such approaches means that these methods are also very resource- and effort-intensive [8]. More specifically, compartmental modelling is a well-established and mathematically sound methodology; however, it is based on a single-day census of beds and as such can be highly dependent on the day the census was carried out. These models also do not allow for continuous time frames [9]. Phase-type distributions provide detailed insights into the causality of LoS but make use of more complex methodologies than traditional methods and thus require considerably more effort to ensure that model output is clinically meaningful [10].
Another set of approaches to predicting LoS are those based upon statistical and arithmetic foundations, such as [11]. These have proven to be useful for estimating LoS, but focus on average LoS, which is a much less informative measure than per-admission prediction. Such methods tend to be easy to use and are rooted in statistical theory, and are therefore easy to reproduce. However, they also suffer from a number of weaknesses; many of the approaches adopt foundations that are not transparent to human scrutiny, e.g. Principal Component Analysis (PCA). In addition, many of the methods do not account for the inherent complexity and uncertainty that is typically found in LoS data [7].
More recently, machine learning and data mining approaches have been applied to the LoS problem in an attempt to improve modelling, but also to aid in dealing with the more complex data that is associated with LoS prediction. Much of the work in this area produces very accurate models, with performance in the high-80% to low-90% range [12], but uses sub-symbolic learning methods. Transparency is central to good clinical practice, and clinicians may be reluctant to accept the output of such methods when there is no explicit explanation for the derivation of the results and rules [13]. The work in this paper is based solely upon methods that are transparent to human scrutiny.
3 Data
The data for this work is drawn from the Hywel Dda Health Board and was collected at Bronglais General Hospital, Aberystwyth, Ceredigion, Wales between March 2016 and March 2017. It consists of data relating to 6,543 admissions via the Accident and Emergency department and is balanced in terms of the number of male vs. female patients. The data was anonymised to remove any personally identifiable attributes. The general process of data collection and patient admission is shown in Figure 1.
The data has 16 conditional features, which include attributes such as: a patient's age and sex, the first four characters of their postcode, their primary diagnosis, and up to 12 secondary diagnoses (depending on which conditions are diagnosed for the patient). There are no missing values in the dataset. However, 12 data objects which did not have an associated discharge date were removed, along with records which did not have any diagnoses, leaving 6,531 records. A summary of the data characteristics can be found in Table 1 and the numbers of patients and days stayed are shown in Figure 2.
The diagnosis features are recorded using the ICD-10 classification (International Statistical Classification of Diseases and Related Health Problems). This is a World Health Organisation (WHO) system of comprehensive medical classification. It contains codes for: diseases, signs and symptoms, abnormal findings, complaints, external causes of injury or diseases and even social circumstances. For the data under consideration, the features representing each of the ICD-10 diagnostic codes have a potential domain of up to 70,000 different values. However, the actual number of unique ICD-10 codes for the data used in this work is 1,393. These individual diagnostic codes are then grouped by ICD block, e.g. I25.5 - Ischaemic cardiomyopathy and I51.7 - Cardiomegaly are drawn from the same block and so are placed in the same group relating to heart conditions in the data: I00-I99. This reduces the possible number of feature values for each of the diagnoses to a maximum of 22. The rationale for performing this step is that feature values may become too sparse for some individual ICD-10 codes. In addition, from a clinical standpoint, many closely related codes overlap (or are neighbouring) or may only differ slightly depending on the examining clinician's interpretation - see discussion. In terms of data acquisition, the data is coded at the time of discharge by the information department. The data is thus a combination of electronically entered data from time of admission and time of discharge as well as data entered by experienced clinical coders who review the notes and enter ICD-10 codes for the primary diagnosis (i.e. reason for admission) and secondary diagnoses. Demographic data is stored digitally and is updated as and when required (e.g. address, registered GP).
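The grouping step described above can be sketched as follows. This is an illustrative mapping only (not the coding team's actual procedure); the boundaries follow the standard 22 ICD-10 chapter ranges, which is consistent with the maximum of 22 feature values mentioned above:

```python
# Illustrative sketch of the ICD-10 grouping step: a diagnosis code such
# as "I25.5" is mapped to its chapter block (here "I00-I99"), reducing
# the ~1,393 unique codes in the data to at most 22 values per feature.

ICD10_CHAPTERS = [
    ("A00", "B99"), ("C00", "D48"), ("D50", "D89"), ("E00", "E90"),
    ("F00", "F99"), ("G00", "G99"), ("H00", "H59"), ("H60", "H95"),
    ("I00", "I99"), ("J00", "J99"), ("K00", "K93"), ("L00", "L99"),
    ("M00", "M99"), ("N00", "N99"), ("O00", "O99"), ("P00", "P96"),
    ("Q00", "Q99"), ("R00", "R99"), ("S00", "T98"), ("U00", "U99"),
    ("V01", "Y98"), ("Z00", "Z99"),
]

def icd10_block(code: str) -> str:
    """Return the chapter block (e.g. 'I00-I99') for an ICD-10 code."""
    prefix = code.strip().upper()[:3]      # e.g. "I25.5" -> "I25"
    for lo, hi in ICD10_CHAPTERS:
        # Codes are a letter plus two digits, so plain string
        # comparison respects chapter order.
        if lo <= prefix <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"Unrecognised ICD-10 code: {code}")
```

Under this mapping, I25.5 and I51.7 both resolve to I00-I99, as in the example above.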
It is important to note that quite often patients admitted with the same primary diagnosis can have differing LoS, particularly if a given patient is discharged on the same day as admission. The number and types of diagnoses are therefore not related to length of stay. Indeed, it is possible to have a high number of secondary diagnostic codes attached to a patient who only had a short stay. The converse is also true: a patient may have a long LoS but only a single diagnostic code recorded in the data.
4 Application
In this section, the methodology that is used to explore the data is presented. The
first task was to use the data to determine what the predictive accuracy might be
when framed as a regression problem. This provides some important indications
as to the ability to model LoS in this way. However, from a hospital resource
management standpoint, this may not be as useful as determining lengths of stay
that are longer than a single day - since each 24 hour stay (or part thereof) has
an associated cost. With this in mind the problem is re-framed as a classification
problem. The different experimental configurations for these two problems are
described in the following section.
Fig. 3. Experimentation
patient LoS in terms of short stays and longer stays where the required level of
care and cost associated with LoS is clinically significant.
Given that it is more clinically meaningful to reliably distinguish stays of 0-1 days from longer periods, the problem was re-framed as a classification task. The learners in Section 4.1 were once again employed using the same experimental setup: 10 × 10-fold cross-validation was used to classify short stay vs. long stay as a binary classification problem. This resulted in a small imbalance in the decision class - 59% long stay : 41% short stay. The classification results are shown in Table 3.
Learner      Short Stay Accy. (%)  Long Stay Accy. (%)  Overall Accy. (%) (s.d.)  ROC
ZeroR          0.00                 100.00                59.07 (0.06)             0.49
FNN(1)        70.52                  45.56                55.78 (1.87)*            0.58
FNN(3)        74.44                  40.61                54.46 (1.76)*            0.57
FNN(5)        75.83                  38.54                53.80 (1.88)*            0.57
FRNN          65.13                  48.18                55.12 (1.89)*            0.61
VQNN(1)       70.44                  45.74                55.85 (1.89)*            0.58
VQNN(3)       74.37                  39.39                53.71 (1.77)*            0.62
VQNN(5)       76.35                  37.24                53.25 (1.64)*            0.64
OWANN(1)      70.44                  45.74                55.85 (1.89)*            0.58
OWANN(3)      74.37                  39.39                53.71 (1.77)*            0.64
OWANN(5)      76.35                  37.24                53.25 (1.64)*            0.65
QSBA          47.10                  91.49                73.32 (1.52)v            0.73
FURIA         59.29                  86.00                75.07 (1.88)v            0.73
J48           60.19                  86.44                75.70 (1.62)v            0.77
JRip          58.99                  86.80                75.42 (1.66)v            0.74
PART          45.30                  89.76                71.56 (1.65)v            0.76
NaiveBayes    65.91                  79.62                74.01 (1.72)v            0.81
* indicates statistically worse than ZeroR; 'v' indicates statistically better than ZeroR
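The experimental protocol behind Table 3 can be sketched as follows. This uses synthetic stand-in data and a CART decision tree in place of the actual learners, purely to illustrate the binary re-framing (0-1 days vs. longer) and the 10 × 10-fold cross-validation:

```python
# Illustrative sketch (not the authors' code) of the binary re-framing:
# LoS of 0-1 days -> short stay (0), anything longer -> long stay (1),
# evaluated with repeated stratified 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier  # stand-in for J48 etc.

rng = np.random.default_rng(0)
# Synthetic stand-in for the admissions data: 16 categorical features
# encoded as integers, plus a length-of-stay target in days.
X = rng.integers(0, 22, size=(500, 16))
los_days = rng.integers(0, 30, size=500)

y = (los_days > 1).astype(int)   # 1 = long stay (>1 day), 0 = short stay

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(f"mean accuracy over 100 folds: {scores.mean():.3f} (s.d. {scores.std():.3f})")
```

On the real data the 100 per-fold accuracies also supply the paired statistical tests against the ZeroR baseline reported in the table.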
As can be seen in Table 3, the ZeroR baseline simply returns the long stay class for all instances, as it is the majority class: 59.07%. These results demonstrate that all of the fuzzy/fuzzy-rough NN algorithms perform statistically worse than the ZeroR benchmark, indicating that they are only comparable with chance. This is possibly due to the crisp or discrete-valued nature of the data. Increasing the number of nearest neighbours appears only to decrease the performance of each learner further. Of the fuzzy/fuzzy-rough approaches, only QSBA and FURIA were shown to statistically outperform the baseline performance, with accuracies of 73.32% and 75.07% respectively. The highest accuracies were achieved using J48, JRip, and NaiveBayes. FURIA and QSBA offer at least comparable performance with the other standard rule-based algorithms without any parameter tuning. Note that there is no statistical significance between the results of these five learners. NaiveBayes, QSBA, FURIA, J48, JRip and PART all yield the highest values for ROC, with scores over 0.7. It is interesting to note that the nearest-neighbour learners perform well for short stay and poorly for long stay; however, the converse is true for QSBA, FURIA and the other traditional learners, with each achieving >79% for long stay. This could be due to the tendency of NN algorithms to overfit the minority class because they only consider local neighbourhoods. The overall poor performance of the NN learners may be due to the discrete nature of the data itself, and fuzzifying either the domain of the decision feature or indeed the conditional features may help in this respect - see conclusion.
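The fuzzy nearest-neighbour scheme underlying the FNN(k) learners [17] assigns graded class memberships via distance-weighted votes rather than a crisp majority. A minimal sketch on toy data (an illustrative re-implementation, not the experimental pipeline) is:

```python
# Hedged sketch of the fuzzy k-NN idea (Keller et al. [17]): class
# membership is a distance-weighted vote over the k nearest neighbours.
import numpy as np

def fuzzy_knn_predict(X_train, y_train, x, k=3, m=2.0):
    """Return per-class membership degrees in [0, 1] for query point x."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]                        # indices of k nearest
    # Inverse-distance weights 1/d^(2/(m-1)); guard against zero distance.
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))
    classes = np.unique(y_train)
    memberships = np.array(
        [w[y_train[nn] == c].sum() for c in classes]) / w.sum()
    return dict(zip(classes, memberships))

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(fuzzy_knn_predict(X, y, np.array([0.05, 0.05]), k=3))
```

With purely discrete-valued features, as here, the distances collapse onto few values, which is one plausible reason the FNN/VQNN/OWANN learners gain nothing over chance.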
Feature selection and ranking. The results of the feature analysis are shown in Table 4. The second column of this table is the frequency of occurrence of each feature in the selected subset for each of the 10 randomisations of the data across the 10 folds of classification, totalling 100 results. The third column shows the average individual rank for each feature, again as part of a 10-fold cross-validation. These results demonstrate that there is considerable variation in the relative importance of the features, both individually and in combination with one another. It is clear for FRFS that eight of the features always appear in the selected subset: age at admission, four characters from postcode, sex, primary diagnosis and secondary diagnoses 1-4. Quite often, the contribution of individual features of a dataset in isolation is not very useful, as it does not model the dependencies between attributes [21]. This means that several low-ranked features, when combined as a subset, could have higher dependency than higher-ranked (but somewhat redundant) features. This is reflected when observing secondary diagnoses 8-12, as these never appear in any of the FRFS-selected feature subsets. However, secondary diagnoses 11 & 12 were ranked 6.36 and 6.3 respectively when considered in isolation by FRFR. Interestingly, age at admission had an average rank of 11.17 out of 16 using FRFR, despite appearing in 100% of the FRFS subsets. Also, the average ranks for each of the features revealed the four-character postcode to be the most important feature, with primary diagnosis, secondary diagnoses 1 & 2 and sex the next most important.
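The selection-frequency protocol described above (10 randomisations × 10 folds, giving 100 tallies per feature) can be sketched as follows. FRFS itself is not available in scikit-learn, so a mutual-information filter stands in purely to illustrate the tallying; the data here is synthetic:

```python
# Illustrative sketch of the per-fold selection-frequency tally.
# A mutual-information filter is a hypothetical stand-in for FRFS.
from collections import Counter
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.random((300, 16))
# Only features 0 and 3 drive the synthetic class label.
y = (X[:, 0] + X[:, 3] + 0.1 * rng.standard_normal(300) > 1.0).astype(int)

freq = Counter()
for seed in range(10):                                  # 10 randomisations
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    for train_idx, _ in cv.split(X, y):                 # 10 folds each
        sel = SelectKBest(mutual_info_classif, k=8)
        sel.fit(X[train_idx], y[train_idx])
        freq.update(np.flatnonzero(sel.get_support()))  # tally selections

print(freq.most_common(5))   # informative features approach 100/100
```

A feature that, like age at admission in Table 4, is selected in 100 of 100 tallies is consistently useful in combination even if its individual rank is modest.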
Attribute Selected Classifiers. Using the best performing learners from Table 3, a feature selection preprocessing step was carried out. The results are shown in Table 5, which offers a comparison of classification accuracy for FRFS and FRFR. The overall performance of each of the learners is comparable to the performance shown in Table 3, with J48 and JRip both scoring slightly higher than before, with accuracies of 76.39% and 75.74% respectively. Notably for FRFS, regardless of the number of features that have been selected, the performance of the learners is almost identical to that of the unreduced dataset. This is an indication that there is some redundancy in the features, as demonstrated in Table 4. Note that the top 11 ranked features are used here for FRFR to build classification models, as this is the subset size that is typically returned by FRFS. Nevertheless, a notable decrease in performance is evident for all of the learners when using FRFR.
4.2 Discussion
The primary aim of this work is to provide a reliable estimation of LoS. When this was framed as a regression problem, the results were poor regardless of the learner employed. By changing the task to that of classification, where the two classes represent a more meaningful outcome (from a clinical and administrative standpoint) as short stay and long stay, it is possible to achieve ∼76.4% correct classification accuracy. It is interesting to note that there are very small differences between the standard deviations in Table 3 for all of the best performing learners, possibly indicating that there is not much more to be discovered from this data without further expert input. To provide a more accurate and clinically meaningful prediction of LoS, a much more targeted approach is clearly required, one which would encompass a patient's social factors, demographics, medical history, etc.
The investigation of the features and their contribution to the classification outcomes provides some useful insights. It is clear that, regardless of the approach involved, the first four secondary diagnosis features are important in determining LoS. The four-character postcode feature also has the highest rank for FRFR, indicating that geographical location plays an important part in determining length of stay.
One of the potential limitations of this work is the use of ICD-10 codes in
the data. These diagnostic codes represent diseases, signs and symptoms as well
as other abnormal findings. Whilst they are used on a daily basis in a health
care setting, in the context of LoS, the large groupings may not be useful in
their current format. The reason for this is that codes can sometimes encompass
disparate sets of disease classifications. This is an area that requires further
investigation and expert input in order to address these shortcomings.
5 Conclusion
Determining LoS with high accuracy can be difficult, as the task is confounded by multiple competing factors such as patient care and characteristics, social factors, and morbidity. All of these have the potential to extend patient LoS. Nevertheless, in this work the best performing learners were able to predict short stays and long stays with an average accuracy of 75%. This is clinically significant, as there is a considerable difference in the cost and care associated with hospital LoS. The ability to predict longer or shorter stays enables hospital managers to allocate resources, improve patient care and deliver increased hospital function. This work determined that the first four characters of the postcode were the most useful factor in classifying LoS. This is particularly important given that the data was collected from a hospital in a rural setting, whose catchment is significantly larger and more demographically varied than that of urban hospitals. It is also apparent from the feature analysis that the higher-numbered secondary diagnoses in the data become redundant regardless of LoS.
The novel approach to assessing LoS presented here uses only generic patient diagnosis data. Although this is a preliminary study, it has highlighted the fact that there is limited knowledge that can be discovered from the data. In this work, patient diagnoses were grouped according to their ICD-10 classifications, which may not be ideal - further expert input would help to improve this. However, further division into more granular or more closely related classifications may provide some extra leverage. As an alternative to this, the modelling could be improved by fuzzifying the domain of the decision feature with clinically informed categories of stay such as: very short, short, medium, long, etc.
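The suggested fuzzification of the decision feature could, for example, use overlapping membership functions over days stayed. The breakpoints below (1, 3, 7, 14 days) are purely illustrative and would need clinical input:

```python
# Hedged sketch of fuzzifying the decision feature: overlapping
# triangular memberships over days stayed, so a 5-day stay is partly
# "short" and partly "medium" rather than forced into one crisp class.
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function peaking at b over [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def los_memberships(days):
    return {
        "very short": triangular(days, -1, 0, 3),
        "short":      triangular(days, 1, 3, 7),
        "medium":     triangular(days, 3, 7, 14),
        "long":       np.clip((days - 7) / 7.0, 0.0, 1.0),  # open shoulder
    }

print(los_memberships(5))   # partly "short", partly "medium"
```

Learners that consume graded class labels (such as the fuzzy-rough methods used here) could then exploit this soft boundary instead of the hard 0-1-day cut-off.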
6 Acknowledgements
Kieran Stone would like to acknowledge the financial support for this research
through Knowledge Economy Skills Scholarship (KESS 2). It is part funded by
the Welsh Government’s European Social Fund (ESF) convergence programme
for West Wales and the Valleys. WEFO (Welsh European Funding Office) con-
tract number: C80815.
References
1. Garg, L., McClean, S.I., Barton, M., Meenan, B.J., Fullerton, K., 2012. Intelligent Patient Management and Resource Planning for Complex, Heterogeneous, and Stochastic Healthcare Systems. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 42, pp.1332-1345. https://doi.org/10.1109/TSMCA.2012.2210211
2. Kelly, M., Sharp, L., Dwane, F., Kelleher, T. and Comber, H., 2012. Factors predict-
ing hospital length-of-stay and readmission after colorectal resection: a population-
based study of elective and emergency admissions. BMC health services research,
12(1), p.77.
3. Harper, P.R. and Shahani, A.K., 2002. Modelling for the planning and management of bed capacities in hospitals. Journal of the Operational Research Society, 53(1), pp.11-18.
4. Huntley, D.A., Cho, D.W., Christman, J. and Csernansky, J.G., 1998. Predicting
length of stay in an acute psychiatric hospital. Psychiatric Services, 49(8), pp.1049-
1053
5. Chang, K.C., Tseng, M.C., Weng, H.H., Lin, Y.H., Liou, C.W. and Tan, T.Y., 2002.
Prediction of length of stay of first-ever ischemic stroke. Stroke, 33(11), pp.2670-
2674.
6. Awad, A., Bader-El-Den, M. and McNicholas, J., 2017. Patient length of stay and
mortality prediction: A survey. Health services management research, 30(2), pp.105-
120.
7. Marshall, A., Vasilakis, C. and El-Darzi, E., 2005. Length of stay-based patient
flow models: recent developments and future directions. Health Care Management
Science, 8(3), pp.213-220.
8. Altinel, I.K. and Ulaş, E., 1996. Simulation modeling for emergency bed requirement planning. Annals of Operations Research, 67(1), pp.183-210.
9. Vasilakis, C. and Marshall, A.H., 2005. Modelling nationwide hospital length of
stay: opening the black box. Journal of the Operational Research Society, 56(7),
pp.862-869.
10. Faddy, M.J. and McClean, S.I., 1999. Analysing data on lengths of stay of hospital patients using phase-type distributions. Applied Stochastic Models in Business and Industry, 15(4), pp.311-317.
11. Wey, S.B., Mori, M., Pfaller, M.A., Woolson, R.F. and Wenzel, R.P., 1988. Hospital-
acquired candidemia: the attributable mortality and excess length of stay. Archives
of internal medicine, 148(12), pp.2642-2645.
12. Hachesu, P.R., Ahmadi, M., Alizadeh, S. and Sadoughi, F., 2013. Use of data min-
ing techniques to determine and predict length of stay of cardiac patients. Healthcare
informatics research, 19(2), pp.121-129.
13. Tu, J.V., 1996. Advantages and disadvantages of using artificial neural networks
versus logistic regression for predicting medical outcomes. Journal of clinical epi-
demiology, 49(11), pp.1225-1231.
14. Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.
15. Cohen, W.W., 1995. Fast effective rule induction. In Machine Learning Proceedings
1995 (pp. 115-123). Morgan Kaufmann.
16. Frank, E. and Witten, I.H., 1998. Generating accurate rule sets without global
optimization.
17. Keller, J.M., Gray, M.R. and Givens, J.A., 1985. A fuzzy k-nearest neighbor algo-
rithm. IEEE transactions on systems, man, and cybernetics, (4), pp.580-585.
18. Hühn, J. and Hüllermeier, E., 2009. FURIA: an algorithm for unordered fuzzy rule
induction. Data Mining and Knowledge Discovery, 19(3), pp.293-319.
19. Rasmani, K.A. and Shen, Q., 2004, July. Modifying weighted fuzzy subsethood-
based rule models with fuzzy quantifiers. In 2004 IEEE International Conference on
Fuzzy Systems (IEEE Cat. No. 04CH37542) (Vol. 3, pp. 1679-1684). IEEE.
20. Cornelis, C., Verbiest, N. and Jensen, R., 2010, October. Ordered weighted average
based fuzzy rough sets. In International Conference on Rough Sets and Knowledge
Technology (pp. 78-85). Springer, Berlin, Heidelberg.
21. Jensen, R. and Shen, Q., 2009. New approaches to fuzzy-rough feature selection.
IEEE Transactions on fuzzy systems, 17(4), pp.824-838.