
32nd Annual International Conference of the IEEE EMBS

Buenos Aires, Argentina, August 31 - September 4, 2010

An Open Data Mining Framework for the Analysis of Medical Images: Application on Obstructive Nephropathy Microscopy Images

Charalampos Doukas, Student Member, IEEE, Theodosis Goudas, Simon Fischer, Ingo Mierswa, Aristotle Chatziioannou and Ilias Maglogiannis, Member, IEEE

Manuscript received April 1, 2010. This work is partly funded by the EU via the e-LICO FP7 Collaborative Project (grant agreement 231519). C. Doukas is with the University of the Aegean, Samos, Greece (e-mail: doukas@aegean.gr). T. Goudas and I. Maglogiannis are with the University of Central Greece, Lamia, Greece (e-mail: {goudas, imaglo}@ucg.gr). S. Fischer and I. Mierswa are with Rapid-I GmbH (e-mail: {fischer, ingo.mierswa}@rapid-i.com). A. Chatziioannou is with the National Hellenic Research Foundation, Athens, Greece (e-mail: achatzi@eie.gr).

978-1-4244-4124-2/10/$25.00 ©2010 IEEE

Abstract—This paper presents an open image-mining framework that provides access to tools and methods for the characterization of medical images. Several image processing and feature extraction operators have been implemented and exposed through Web Services. RapidMiner, an open source data mining system, has been utilized for applying classification operators and creating the essential processing workflows. The proposed framework has been applied for the detection of salient objects in Obstructive Nephropathy microscopy images. Initial classification results are quite promising, demonstrating the feasibility of automated characterization of kidney biopsy images.

I. INTRODUCTION

The kidney is a multicellular, multistructure organ that is responsible for a part of the complex process of blood purification. Kidneys are affected by many chronic diseases, such as Obstructive Nephropathy [1], which is the main cause of renal failure. It is caused by obstruction of the urinary tract and is accompanied by hydronephrosis, i.e., dilation of the renal pelvis and calyces resulting from obstruction of the flow of urine. Considering that Obstructive Nephropathy is not a rare disease [1], automatic detection of the pathogenic areas on a kidney biopsy image is very useful, especially in cases where the discrimination of healthy versus pathogenic biopsy samples is difficult and complex (see Fig. 1).

Fig. 1. Samples of obstructive nephropathy images: a) healthy biopsy sample, b) pathogenic biopsy sample.

The automated characterization of such images requires proper preprocessing (e.g., image enhancement, color processing), feature extraction and classification. Nowadays the trend in image processing, and in software engineering in general, is towards the development of algorithms and tools that provide each one of these functions individually [2], especially in the form of Web Services, a technology that enables developers to programmatically access heterogeneous, distributed resources, providing easier integration and interoperability between data and applications. This paper presents an open framework based on Web Services that provides access to complete tools for data mining of biomedical image data. The tools implemented as Web Services can be directly integrated into workflow management platforms (e.g., TAVERNA [3]), allowing their use in several workflows corresponding to different image processing pipelines. Proper authentication and encryption mechanisms have been utilized in order to guarantee the appropriate security. The rest of the paper is organized as follows: Section II discusses related work in image mining through Web Services. Section III presents the tools and methods that enable the functionality of the proposed platform, whereas the architecture scheme is described in Section IV. Section V describes the Obstructive Nephropathy images characterization process, followed by initial evaluation results in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORK

Despite the great impact of Web Services on the development and deployment of web applications, their exploitation in the domain of biomedical data mining is still quite limited. Only a few systems exist in the literature that utilize Web Services in order to provide functionality and access to computational resources for processing, annotating and mining biomedical data, and especially medical images. The MIAKT system [4] provides knowledge management of data generated by screening processes and means for medical staff to investigate, annotate, and analyze the data using web and GRID services. In [5], the authors have proposed a model for implementing SOA (Service Oriented Architecture) based image processing systems. The proposed architecture consists of a programming model, a service model and a messaging model. The authors focused on the concept of service. The INBIOMED platform [6] is a Web Services
oriented architecture that provides a framework for sharing resources and medical image processing algorithms. The Web Services integrated into the platform provide morphological operators and filters (e.g., erosion, dilation, opening, closing) and segmentation methods.

The aforementioned works focus mostly on the provision of specific functionalities for analyzing and processing medical images of a narrow range of modalities through Web Services. To the best of our knowledge, there is no work that enables the mining of biomedical images based on a number of tools and frameworks providing complete functionality through a single Web Service framework. The major benefits of this approach can be summarized as follows:
• Open access to biomedical image mining functionality without specific requirements
• Total interoperability
• Provision of a complete framework
• Single access point
• Image mining consistency and reliability

III. TOOLS AND METHODS

This section provides more details regarding the tools and methods utilized for developing the presented image mining framework.

A. Image Acquisition and Processing

The following aspects of image processing are considered:
• Acquisition. The framework provides support for the acquisition, sampling and writing of image files, complete program control and easy integration into image-enabled applications that utilize databases.
• Transformations. It includes functions for transforming an image from the spatial to the frequency domain (Fourier, DCT) and wavelet transforms.
• Image Enhancement. Functionalities for image enhancement such as subtraction, background correction, denoising, smoothing, spatial and median filtering, histogram equalization, etc., are provided.
• Color Processing. It refers to pixel-based processing that allows users to gather color information (e.g., counting the number of unique colors within an image or finding the dominant color, both of which are particularly useful for analyzing an image).
• Image Analysis and Feature Extraction. The toolbox supports the extraction of the produced quantitative features in XML files that may be directly stored into a database for further processing and exploitation (e.g., classification using the RapidMiner tool).

B. Image Mining through Web Services & Secure Access

Web Services are emerging as a promising technology to build distributed applications. They are an implementation of SOA [7] that supports the concept of loosely-coupled, open-standard, language- and platform-independent systems. The loosely-coupled features allow service providers to modify backend functions while maintaining the same interface to clients. Web Services are accessed through the HTTP/HTTPS protocols and utilize XML (eXtensible Markup Language) for data exchange. This in turn implies that Web Services are independent of platform, programming language, tool and network infrastructure. Services can be assembled and composed in such a way as to foster the reuse of existing back-end infrastructure. The WS-Security kit (Rampart) [8] has been utilized for user authentication. WS-Security is a standard for adding security to SOAP Web Service message exchanges. It uses a SOAP message-header element to attach the security information to messages, in the form of tokens conveying different types of claims (which can include names, identities, keys, groups, privileges, capabilities, etc.) along with encryption and digital-signature information. On top of the WS-Security kit, the SSL [9] protocol has been used for the proper encryption of the data during transmission between the service consumer and the Web Service itself.

C. The RapidMiner Classification and Workflow Management Tool

RapidMiner [13] is a flexible, modular, and extensible data mining and data processing solution. Being an open source project, it is available to the scientific community and has a large user base. Written entirely in Java, it provides an open API and can be easily integrated into any existing application. In its standard version, RapidMiner comes with hundreds of operators for machine learning, data manipulation, filtering, format conversion, etc. As of version 5.0, the user interface and process execution engine have been completely rewritten as a generic workflow execution engine. Based on this, a set of new operators that integrate transparently with the Web Services presented in the preceding section have been developed. Using these operators, it is possible to design data mining workflows that connect seamlessly to the image mining tools described above (see Fig. 3). The image mining extension encompasses the following (groups of) operators:

List Images: This operator simply lists all image files found in a particular directory.

Upload Images: The listed images are uploaded to the image mining server, and a reference to the images is obtained and stored. Subsequently, only this reference is used and no intermediate upload or download of image data is necessary.

Visualize Image: If desired, intermediate results can be visualized within RapidMiner. This requires the download of the processed images.

Image Transformation: Using the image references, various image transformation algorithms are performed. The transformed images are stored on the server.

Feature Extraction: Using image references, various

features can be extracted and transformed into a tabular format which can be further used by RapidMiner for data mining and processing.

The groups of image transformation and feature extraction operators constitute the most relevant part. Due to the self-description methods of the Web Service, RapidMiner is able to detect the set of provided algorithms automatically. Hence, new image mining algorithms can be deployed automatically without a need to update the RapidMiner components. Although RapidMiner is not aware of image files, formats, etc., the image mining operators can be combined flexibly to transform images into a tabular format well-suited for data mining and further processing.

IV. THE FRAMEWORK ARCHITECTURE

The proposed framework is based on the services-oriented architecture model, as illustrated in Fig. 2. The main component is the Image Processing Web Services Core, which hosts all the functionality exposed to clients, communicating through SOAP messages over the HTTP/HTTPS protocol. Appropriate classes and functions implement the aforementioned functionality, utilizing any essential application programming interfaces (APIs) that provide access to advanced functionality (e.g., data management, image processing, etc.) or to data repositories and computational resources. The Web Services Core is hosted by an appropriate service container (usually an application server). The latter usually resides along with additional resources (e.g., databases) in the framework container (physically, this can be a framework server). The communication with the RapidMiner tool is performed through SOAP calls. This type of architecture is modular and allows the easy integration of new services.

Fig. 2. Illustration of the Framework Architecture

V. SALIENT OBJECTS DETECTION IN OBSTRUCTIVE NEPHROPATHY IMAGES

A. Image Processing and Feature Extraction

The characterization of obstructive nephropathy images initially requires appropriate image processing and feature extraction. Images are first enhanced by histogram equalization and then converted to 8-bit. A segmentation window of 40x40 pixels is applied for gridifying the initial image into smaller parts. This step is performed in order to increase the resolution of the dataset. Texture analysis follows by calculating features like:
• The Mean, which gives the average value of the segment's pixels, along with the Standard Deviation, which gives the value of the dispersion of the values around the Mean.
• The Angular Second Moment (ASM), which gives a measurement related to orderliness. ASM is calculated using the Grey Level Co-occurrence Matrix (GLCM) of the segment of the specific ROI. The GLCM is a tabulation of how often different combinations of pixel brightness values (grey levels) occur in an image. With $i$ the row number, $j$ the column number and $P_{i,j}$ the element $(i, j)$ of the normalized symmetrical GLCM:

\[ \mathrm{ASM} = \sum_{i,j=0}^{N-1} P_{i,j}^{2} \tag{1} \]

• The Contrast, which gives a measure of how sharp the structural variations in the image are.

\[ \mathrm{Contrast} = \sum_{i,j=0}^{N-1} P_{i,j}\,(i - j)^{2} \tag{2} \]

• The Correlation, which is a measure of the grey-level linear dependency of the image's segment, with $\mu$ and $\sigma^{2}$ the mean and variance of the GLCM.

\[ \mathrm{Correlation} = \sum_{i,j=0}^{N-1} P_{i,j}\,\frac{(i - \mu)(j - \mu)}{\sigma^{2}} \tag{3} \]

• The Inverse Difference Moment, which gives a measure of the local homogeneity of the segment of the image.

\[ \mathrm{IDM} = \sum_{i,j=0}^{N-1} \frac{1}{1 + (i - j)^{2}}\,P_{i,j} \tag{4} \]

• The Entropy, which is a measurement of randomness.

\[ \mathrm{Entropy} = -\sum_{i,j=0}^{N-1} P_{i,j}\,\log(P_{i,j}) \tag{5} \]

The above features have been selected as the most appropriate in order to characterize and classify complex biomedical images, like the kidney biopsy images ([10], [12]). All processing steps are available as Web Service operators and can be inserted into the RapidMiner tool for creating an automated workflow procedure.

B. Data Classification

The features described in the previous section are obtained through the appropriate workflow process designed and implemented in RapidMiner. Classification has been performed using the k-Nearest Neighbor, the Naïve Bayes and the Support Vector Machines (SVM) [11] classifiers. Afterwards, ten-fold cross validation has been applied for evaluation purposes.
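
The preprocessing steps of Section V.A (histogram equalization followed by gridding into 40x40 windows) can be illustrated with a minimal Python sketch. It is not the Web Service implementation of the framework: the function names `equalize` and `tiles`, the nested-list image representation, and the assumption that the input already holds integer grey levels in 0..255 are illustrative choices only.

```python
def equalize(image, levels=256):
    """Histogram-equalize a 2-D list of integer grey levels (assumed 0..levels-1)."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = cdf[-1]
    if total == cdf_min:
        # Constant image: nothing to stretch, return a copy.
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[v] for v in row] for row in image]


def tiles(image, size=40):
    """Yield non-overlapping size x size windows (the 'gridifying' step).

    Assumption of this sketch: partial windows at the right and bottom
    borders are dropped.
    """
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            yield [row[c:c + size] for row in image[r:r + size]]
```

Under these assumptions, each window produced by `tiles(equalize(image))` becomes one row of the dataset, which is how a handful of images can yield a much larger number of samples.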

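
To make the GLCM-based texture features of Section V.A concrete, the following Python sketch builds a normalized symmetrical GLCM for one window and evaluates ASM, Contrast, Correlation, Inverse Difference Moment and Entropy. The function names, the choice of a horizontal neighbor offset of one pixel, and the use of the natural logarithm are assumptions of this sketch; the paper does not state the offset or the logarithm base used by the framework.

```python
import math


def glcm(tile, levels, dx=1, dy=0):
    """Normalized, symmetric grey-level co-occurrence matrix of a 2-D tile.

    The tile must be large enough to contain at least one pixel pair
    for the chosen offset (dx, dy).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(tile), len(tile[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = tile[r][c], tile[r2][c2]
                counts[a][b] += 1  # pair (a, b)
                counts[b][a] += 1  # mirrored pair keeps the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Texture features computed from a normalized symmetric GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu = sum(i * v for i, _, v in cells)               # GLCM mean
    var = sum(v * (i - mu) ** 2 for i, _, v in cells)  # GLCM variance
    corr = sum(v * (i - mu) * (j - mu) for i, j, v in cells)
    return {
        "asm": sum(v * v for _, _, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "correlation": corr / var if var > 0 else 0.0,
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }
```

For an 8-bit window the call would be `texture_features(glcm(window, levels=256))`; zero entries of the GLCM are skipped in the entropy sum because their contribution is taken as zero.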
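
The evaluation scheme of Section V.B (a k-Nearest Neighbor classifier assessed with ten-fold cross validation) can be sketched in plain Python as follows. This is a stand-alone illustration, not the RapidMiner operators used in the paper: the value k=3, the Euclidean distance, and the assignment of samples to folds by index modulo the number of folds (without shuffling) are all assumptions of this sketch.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training vectors closest to x."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs, ys, folds=10, k=3):
    """Fraction of samples classified correctly over all held-out folds."""
    correct = 0
    for f in range(folds):
        # Sample i is held out in fold i % folds (no shuffling in this sketch).
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct += sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                       for i in test_idx)
    return correct / len(xs)
```

Here each element of `xs` would be the feature vector of one 40x40 window and each element of `ys` its annotated class; the returned value is the overall accuracy across the ten held-out folds.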
Fig. 3. Screenshot of the RapidMiner [13] interface illustrating a workflow for processing, feature extraction and classification of Obstructive Nephropathy Images.

VI. INITIAL EVALUATION RESULTS

In order to evaluate the accuracy of the image mining framework for the characterization of obstructive nephropathy images, an initial dataset of six kidney biopsy images has been utilized. The images have been provided by INSERM (France). They have been obtained from healthy and pathogenic kidney biopsies of mice, and have been treated following Masson's trichrome staining technique in order to disclose the most important structures (see Fig. 4). A magnification of 200, aperture of 0.5, 10 ms exposure and gain of 1.0 have been used as acquisition settings. In order to overcome the issue of the small dataset, the images have been gridified (see Section V), resulting in a considerably larger dataset.

Fig. 4. Annotation of important structures that determine pathogenesis in a kidney biopsy image. Strong line: Glomerulus, Dashed line: Tubulus

After applying the aforementioned image processing and the k-Nearest Neighbor classifier, 260 out of 283 non-pathogenic glomeruli (see Fig. 4) have been successfully recognized (i.e., 91.87% accuracy), whereas all pathogenic glomeruli were successfully classified. The Bayesian classifier achieved 67.82% accuracy in predicting pathogenic areas. SVM reached an accuracy of 76.87%, while all non-pathogenic glomeruli were successfully predicted.

VII. CONCLUSION

This paper has presented an open image mining framework for the characterization of Obstructive Nephropathy images. It is based on image processing and feature extraction operators available as Web Services and their integration in RapidMiner, an open data mining workflow tool. The framework enables experts to utilize image mining techniques without any requirements for specific image processing or data mining knowledge. Initial evaluation results are quite promising. Future work includes more extensive evaluation of the platform using new datasets.

ACKNOWLEDGMENT

This work is funded by the Information Society Technology program of the European Commission “e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Sciences (e-LICO)” (IST-2007.4.4-231519). The authors would also like to thank Joost Schanstra and Julie Klein from INSERM for the provision and annotation of the biopsy images.

REFERENCES

[1] Klahr S., “Obstructive nephropathy”, Internal Medicine, vol. 39, no. 5, 2000, pp. 355-361.
[2] Biological Web Services, June 2009, available online at: http://maurobio.infobio.net/bws/biows.htm.
[3] Tom Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, Mark Greenwood, Tim Carver, Kevin Glover, Matthew R. Pocock, Anil Wipat and Peter Li, “Taverna: a tool for the composition and enactment of bioinformatics workflows”, Bioinformatics, vol. 20, no. 17, pp. 3045-3054, June 2004.
[4] Shadbolt N., Lewis P., Dasmahapatra S., Dupplaw D., Hu B. and Lewis H., “MIAKT: Combining Grid and Web Services for Collaborative Medical Decision Making”, in Proc. of AHM2004 UK eScience All Hands Meeting, September 2004, Nottingham, UK.
[5] Todica V., Vaida M.F., “SOA-based medical image processing platform”, in Proc. of IEEE International Conference on Automation, Quality and Testing, Robotics, 2008, vol. 1, pp. 398-403, May 2008.
[6] D. Perez, J. Crespo, A. Anguita, J. Ordonez, J. Dorado, G. Bueno, V. Feliu, A. Estruch, J. Heredia, “Biomedical Image Processing integration through INBIOMED: A Web Services-based Platform”, presented at the 6th International Symposium on Biological and Medical Data Analysis (ISBMDA 2005).
[7] Newcomer E., Lomow G., Understanding SOA with Web Services, Addison Wesley, 2005, ISBN 0-321-18086-0.
[8] Kyle Gabhart, “Secure, Reliable Web Services with Apache”, available online at: http://www.xml.com/pub/a/2007/05/02/sure-reliable-web-services-with-apache.html.
[9] The OpenSSL Project, information available online at: http://www.openssl.org/.
[10] Ilias Maglogiannis, Charalampos Doukas, “Overview of Advanced Computer Vision Systems for Skin Lesions Characterization”, IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 5, pp. 721-733, Sept. 2009, DOI: 10.1109/TITB.2009.2017529.
[11] I. Maglogiannis, E. Zafiropoulos, “Utilizing Support Vector Machines for the Characterization of Digital Medical Images”, BMC Medical Informatics and Decision Making, 2004.
[12] Haralick R. M. et al., “Textural Features for Image Classification”, IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-3, pp. 610-621, 1973.
[13] Mierswa Ingo, Wurst Michael, Klinkenberg Ralf, Scholz Martin and Euler Timm, “YALE: Rapid Prototyping for Complex Data Mining Tasks”, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.