You are on page 1of 20

IMAGE PROCESSING AND MACHINE LEARNING FOR THE DIAGNOSIS OF MELANOMA CANCER

Arushi Raghuvanshi

In the United States in 2010 there were only about 70,000 new cases of melanoma cancer but 10,000 deaths, making it one of the deadliest types of cancer. Melanoma cancer can be cured. Then why so deadly? The biggest problem with melanoma is that it is not diagnosed early enough.

I created a computer program for at home use to help with the early diagnosis of melanoma cancer.

Abstract
Melanoma cancer is one of the most dangerous and potentially deadly types of skin cancer; however, if diagnosed early, it is nearly one-hundred percent curable [UnderstMel09]. Here I propose an efficient system which helps with the early diagnosis of melanoma cancer. Different image processing techniques and machine learning algorithms are evaluated to distinguish between cancerous and noncancerous moles. Two image feature databases were created: one compiled from a dermatologist-training tool for melanoma from Hosei University and the other created by extracting features from digital pictures of lesions using a software called Skinseg. I then applied various machine learning techniques on the image feature database using a Python-based tool called Orange. The experiments suggest that among the methods tested, the combination of Bayes machine learning with Hosei image feature extraction is the best method for detecting cancerous moles. Then, using this method, a computer tool was developed to return the probability that an image is cancerous. This is a very practical application as it allows for at-home findings of the probability that a mole is cancerous. This does not replace visits to a

Current Methods of Diagnosis


Currently, the initial diagnosis of melanoma is based on manual inspection by a trained dermatologist using what is known as the ABCDE method. If there is a high probability that a mole is cancerous from this manual inspection, then the dermatologist performs an excisional biopsy and looks at it at the microscopic level to distinguish if the mole is malignant or benign. If the mole is malignant, the surgeon performs a Sentinel Lymph Node or SLN biopsy. If melanoma is still present in the body after this step, the surgeon may remove nearby lymph nodes to keep the

The ABCDE Method


The ABCDE method is the current method of manual inspection used by dermatologists to distinguish between moles that are cancerous and benign. ABCDE is an acronym for the following: Asymmetry, Boarder Irregularity, Color, Diameter, Evolving. When a mole is asymmetrical or the boarders are uneven, it is more likely to be malignant. Similarly, if it has two or more colors, a large diameter, or evolves over time, there is a high probability that the mole is cancerous. The diagnosis of melanoma is not based on just one of these factors but a combination of all of them. My project uses these image attributes as

Source: The Ear, Nose, and Throat Alliance: http://www. allianceent.net/index.php?section=3&pid=198

My Steps for Detecting Melanoma


Image acquisition: I first need to acquire images of moles that may be cancerous. For this project, I used images taken from a standard household digital camera in order to make the tool available for at-home use. I received a database of images online and from a dermatologist. Feature extraction: Feature extraction is deciding what features are important and extracting those features from an image. For this project, the important features corresponded with the ABCDE method for melanoma detection. Machine learning: Machine learning algorithms can be used to determine trends based on the relationship between features found in the previous step. These trends can be used to predict whether a new image will be

Procedure (1)
Step 1 Acquire Images
Acquire images of both cancerous and non-cancerous moles from dermatologists and from the internet

Step 2 Create feature database using different image feature extraction software tools
i) Skinseg tool - Skinseg segments a given image to isolate the portion of interest (i.e. the mole) and extracts a set of features from this segment.
After compiling a set of images, open each image individually, and segment it. If automatic segmentation doesnt work, do semi-automatic or manual segmentation.

Procedure (2)
Once all of the feature files are saved, use a python script to create a database of the following features: Dominant HSI intensity Asymmetry Entropy Boundary irregularity Energy Average RGB intensity Inertia Dominant RGB intensity Homogeneity Average HSI intensity This creates a database of the features in a TAB delimited file.

ii) Hosei tool: The Hosei Tool has predetermined features for given images. Most likely, these features were determined by physician inspection.
Using this dermatologist-training website, compile a set of the following features: Globules Symmetry Atypical Pigment Borders Blue Whitesh Veil Color Atypical Vascular Pattern Pigment Network Irregular Streaks Branched Steaks Irregular Pigmentation Homogenous Regression Structures Dots

Procedure (3)
Step 3. Machine Learning
The databases created in step 2 are now used for machine learning. There are various methods for machine learning. I worked with the following:
Majority Learning Bayes Learning Decision Trees kNN (Nearest Neighbor)

Evaluate the above methods using a Python based software tool called Orange. I wrote code in this program to test the percent accuracies of different sets of data for a given machine learning method and feature extraction method.

Image Acquisition
Taken by normal digital camera Acquired images from multiple sources Dr. Kristin Stevens, MD, Dermatologist, Providence Medical Group, Portland, OR Dr. Sandhya Koppula, MD, Dermatologist, Cornell Dermatology Clinic, Beaverton, OR Internet

Feature Extraction Tools


Skinseg: Skinseg is a program developed by Wright State University. This tool segments a given image and then extracts a set of features based from the segment. Hosei Tool: Created by Hosei University in Japan, this tool provides predetermined features for given images. Most likely, these features were determined by manual inspection by dermatologists. CVIP tools: This tool, developed by Southern Illinois University at Edwardsville, is very powerful but mostly interactive, so I did not use it for my project. It is possible to create a computer program which does this in a more automatic way, but I chose to use Skinseg and the Hosei tool instead. In the future, I plan to experiment more with this tool. Mole Expert Micro: This is a commercial software for the feature extraction of melanoma images. I was able to receive an evaluation version of this software for free from the founder of the company. Unfortunately, this software required a certain pixel per millimeter count which was not available for my images. Open CV: This tool from Intel would be very powerful in completely automating the process of feature extraction; however, it is not specifically for melanoma images. In the future, I plan to use Open CV or get the source

Machine Learning
Machine Learning Tools
I used Orange, a python-based tool, which had libraries of multiple machine learning algorithms. I wrote programs in Orange to test a variety of different machine learning algorithms (listed below) MVSIS is another machine learning tool which I explored but didnt test

Machine Learning Algorithms


Majority Learning
Majority learning, is a basic technique which gives a probability of a given mole being cancerous based on the distribution of cancerous and non-cancerous entries in the database.

Bayes Learning
In Bayes learning, Bayesian networks are created which represent the relationship between a given feature and the probability that the mole is cancerous. Combined, these networks can give a probability for whether or not the mole is cancerous.

Decision Trees
This machine learning method creates a tree based on the training data. There are a variety of different techniques to how to create the best tree and to distinguish which features are important and which are not.

kNN (k-Nearest Neighbor)


Nearest Neighbor is a machine learning method which creates a ndimensional space corresponding to n features. It then calculates the

Sample Feature Database

Creating Feature Database


Save to text file with feature list and classification (melanoma- yes/no?)

Original image

Segmented Extracted image Features Process Next Image Ye s

More images to process?

Run Python script (convert.py) providing all the feature text files as input and creating a TAB file Each input file gets represented by a row in TAB file

No

skinsegd b.tab

Analyzing a User-Submitted Image


Save to text file with feature list

User-Submitted Image

Segmented image

Extracted Features

skinsegd b.tab

Run Python script in Orange for machine learning which applies various machine learning methods to provide a probability for whether or not the user-submitted image has a cancerous mole.

Decision Trees

Majority Learning

Bayes Learning

Nearest Neighbor

Proposed Web-based Melanoma Diagnostic System


Web Page
Submit . jpg image of mole Display Result Feature Extraction

Features

Machine Learning

Diagnosis

Results

Machine Learning Algorithm

Machine Learning Algorithm

Analysis & Conclusion


I concluded that using image processing and machine learning tools I can create an effective algorithm to assist with the early diagnosis of melanoma cancer. This algorithm can be implemented in a website which can be used by people at home. The best learning method on my data set was using Bayes learning and the Hosei tool for feature extraction. This was different from my original hypothesis. The accuracy of classification was less than I predicted, but I only had about 130 images in my database. With more images, the accuracy will grow to a larger number. I did some experimentation and found that accuracy increases with the size of the database. I eliminated some variables from the machine learning database to help keep my results consistent. I eliminated number of pixels, perimeter, and area, because each image used a different scale. I also eliminated the file name because that had no effect on whether or not the image was cancerous. Next steps include automating the process, using a larger

Acknowledgements
Dr. A. Goshtasby, Wright State University on Skinseg Image processing tool Dr. Scott E Umbaughs, Southern Illinois University, Edwardsville on CVIP tools Dr. Alan Mishchenko, University of California Berkeley for MVSIS Mr. Holger Ldtke, founder of MoleExpert for providing access to an evaluation version of the MoleExpert Micro tool. Ms. Iris Cheng, University of California Berkeley for sharing her research on image processing Dr. B.J. Shrestha, Missouri University of Science and Technology Dr. Kristin Stevens, MD, Dermatologist, Providence Medical Group, Portland. Field expert for current methods of diagnosis; also provided images of melanoma Dr. Sandhya Koppula, MD, Dermatologist, Cornell

You might also like