3.3 Summary of the Waterfall Model
3.4 The Waterfall Model
3.4.1 Requirements Definition
3.4.2 System and Software Design
3.4.3 Implementation and Unit Testing
3.4.4 Integration and System Testing
3.4.5 Operation and Maintenance
3.5 Justification of Selected Methodology
3.6 Technologies and Framework
3.7 Summary
CHAPTER 4 – PROJECT MANAGEMENT
4.1 Introduction
4.2 Risk and Quality Management
4.3 Risk Analysis / Risk Register
4.3.1 Risk Register
4.4 Effort Costing Model
4.5 Effort Calculations for Project
4.6 Scheduling and Work Plan
4.7 Summary
CHAPTER 5 – CONCLUSION
References
DECLARATION
I hereby declare that the project proposal entitled “Breast Cancer Diagnosis with
Machine Learning” submitted for the course “ICT 431” is my original work and that the project
has not formed the basis for the award of any degree, associateship, fellowship, or any other
similar title. I recognize that failure to acknowledge material acquired from other sources may
be considered plagiarism.
Student Supervisor
ABSTRACT
Breast cancer is one of the biggest and most challenging diseases, with a considerable
mortality rate: 21% of all breast cancer deaths worldwide are attributable to alcohol use,
overweight and obesity, and physical inactivity. This proportion is higher in high-income
countries (27%), where the most important contributor is overweight and obesity; in low- and
middle-income countries, the proportion of breast cancers attributable to these risk factors is
18%, with physical inactivity the most important determinant (10%). The disease is far more
common in women than in men, and it needs an easier and faster way to be diagnosed. The
manual techniques of diagnosing breast cancer are effective but slow. A diagnosis system
capable of giving an accurate diagnosis is key to achieving the goal of easier and faster breast
cancer diagnosis. In this paper we propose a breast cancer diagnosis system that will make
diagnosis easier and faster.
LIST OF FIGURES
Figure 1 Multi-layer feed forward NN (Daniel Graupe, 2013)
Figure 2 A processing neuron
Figure 3 waterfall lifecycle consists of several non-overlapping stages
Figure 4 waterfall lifecycle consists of several non-overlapping stages
Figure 5 sub-activities system development
Figure 6 the risk management life cycle
Figure 7 Information Domain Value (umd, 2018)
Figure 8 functional point (umd, 2018)
Figure 9 Lines of Code (umd, 2018)
Figure 10 Effort cost and Duration (umd, 2018)
Figure 11 effort costing using COCOMO II calculation (csse, 2018)
Figure 12 result from the COCOMO calculation (csse, 2018)
Figure 13 proposal Gantt Chart
Figure 14 project Gantt Chart
LIST OF TABLES
Table 1 Attribute Information
Table 2 the risk register
Table 3 Breakdown of costs/expenditure
ACRONYMS AND ABBREVIATIONS
BC Breast Cancer
CBE Clinical Breast Examination
CDH Cancer Disease Hospital
COCOMO Constructive Cost Model
DLL Dynamic-link libraries
GUI Graphical User Interface
KNN K Nearest Neighbor
ML Machine Learning
MRI Magnetic Resonance Imaging
NB Naïve Bayes
NN Neural Networks
PET Positron-Emission Tomography
RAD Rapid Application Development
RS_SVM Rough Set based Support Vector Machine
UTH University Teaching Hospital
USPSTF United States Preventive Services Task Force
WBCD Wisconsin Breast Cancer Dataset
CHAPTER 1 – INTRODUCTION
1.1 Introduction
Breast cancer is the most common invasive cancer among women and the second
main cause of cancer death after lung cancer. Advances in screening and treatment have
improved survival rates dramatically since 1989.
Being aware of the symptoms of breast cancer and undergoing early screening allow
earlier detection and treatment of breast cancer.
Alongside the common screening tools / techniques for detecting breast cancer, scientists
are looking to enhance the screening techniques with the help of computer aided diagnosis. A
number of methods and algorithms for detecting breast cancer are being developed and used.
Several risk factors for breast cancer have been well documented. However, for the
majority of women with breast cancer, it is not possible to identify specific risk factors
((IARC, 2008); (Lacey JV Jr et al., 2009)).
A familial history of breast cancer increases the risk by a factor of two or three. Some
mutations, particularly in BRCA1, BRCA2 and p53 result in a very high risk for breast cancer.
However, these mutations are rare and account for a small portion of the total breast cancer burden.
The contribution of various modifiable risk factors, excluding reproductive factors, to the
overall breast cancer burden has been calculated by Danaei et al. (Danaei G et al, 2005).
The overall aim of this project is to quantify existing service delivery capacity and to
identify gaps, challenges, and priority areas for building a setting-appropriate and sustainable
breast cancer control service system in Zambia.
1.2 Problem Statement
In this project we will quantify existing service delivery capacity and identify gaps,
challenges, and priority areas for building a setting-appropriate and sustainable breast cancer
control service system in Zambia.
1.3 Aim
The overall aim is to quantify existing service delivery capacity and to identify gaps,
challenges, and priority areas for building a setting-appropriate and sustainable breast cancer
control service system in Zambia.
1.4 Objectives
1. Research the best Machine Learning model to use for breast cancer diagnosis.
2. Build a diagnosis system based on the selected Machine Learning model.
3. Test the Machine Learning based application with its user interface.
1.7 Summary
Breast cancer is the most common invasive cancer among women and the second
main cause of cancer death after lung cancer. Advances in screening and treatment have
improved survival rates dramatically since 1989. The overall aim of this project is to quantify
existing service delivery capacity and to identify gaps, challenges, and priority areas for
building a setting-appropriate and sustainable breast cancer control service system in Zambia.
The project will have a trained Machine Learning model at its core for easier and quicker
diagnosis, will be completed by May 2019, and will have a GUI for easy usability.
CHAPTER 2 – LITERATURE REVIEW
2.1 Introduction
2.1.1 Mortality Rate
21% of all breast cancer deaths worldwide are attributable to alcohol use, overweight and
obesity, and physical inactivity. This proportion was higher in high-income countries (27%), and
the most important contributor was overweight and obesity. In low- and middle-income countries,
the proportion of breast cancers attributable to these risk factors was 18%, and physical inactivity
was the most important determinant (10%).
The differences in breast cancer incidence between developed and developing countries
can partly be explained by dietary effects combined with later first childbirth, lower parity, and
shorter breastfeeding (Peto J, 2001). The increasing adoption of western life-style in low- and
middle-income countries is an important determinant in the increase of breast cancer incidence in
these countries.
The global burden of cancer is growing steadily, with much of this burden falling on
developing countries, where nearly 80% of disability-adjusted life years lost to cancer occur.
Although it is rising, breast cancer incidence in developing nations is much lower than that in
developed nations. Death rates, however, remain the same.
System level barriers to breast cancer care in these environments have been well
documented and are primarily centered around the lack of accessible and affordable screening,
early detection, diagnostic, and treatment facilities.
Other barriers include lack of awareness of the early signs and symptoms of breast cancer,
the belief that cancer has a supernatural origin and is always fatal, the use of traditional therapies
before or in lieu of seeking more modern treatment, and fear of spousal abandonment following
mastectomy.
2.1.2 Causes of Breast Cancer
i. Exogenous hormones
Ovarian hormones are commonly taken exogenously, either for contraception, or as
‘replacement’ therapy for symptoms believed to be due to low levels of the natural
products, usually during or after menopause. When oral contraceptives were introduced
in the early 1960s there was considerable speculation, based on experimental work, that
they might increase the risk of breast cancer.
Replacement hormones are another matter. In 1976, Hoover et al published the first
evidence of increased risk among women taking replacement estrogens. In a large
gynecologic practice, there was a 30% excess of breast cancer among women taking
Premarin, a kind of estrogen stew derived from the urine of pregnant mares, and among
those taking the medication for 15 years or more, the risk was doubled.
ii. Ionizing Radiation
Mammary tissue is quite susceptible to malignant transformation by ionizing radiation.
Excess breast cancer has been observed in patients given multiple fluoroscopies or
radiotherapy for ankylosing spondylitis or enlargement of the thymus gland, and in
survivors of the atomic bombings, painters of radium watch faces, and X-ray technicians.
iii. Alcohol
The findings on beverage alcohol are summarized in a joint analysis by the Oxford
Group of data from 53 epidemiologic studies. Women who had an average
consumption of 4 or more drinks a day had a 50% higher breast cancer risk than those
who did not drink alcohol.
mortality with mammography; they concluded that screening for breast cancer with mammography
is unjustified. The USPSTF performed a meta-analysis using data from the same trials. The
researchers concluded that the flaws in some of the studies did not significantly influence
outcomes; therefore, they included pooled effects from seven valid studies. The resulting
recommendation was for screening mammography every one to two years for women 40 years and
older (Knutson & Steiner, 2007).
Ultrasonography
Because mammography is less sensitive and breast tissue is denser in younger women,
ultrasonography has been considered as a screening tool for younger women who are at high risk
for breast cancer. A consensus statement published by the European Group for Breast Cancer
Screening concluded that there is no evidence to support the use of ultrasonography for screening
at any age (Knutson & Steiner, 2007).
The use of MRI as a screening test for breast cancer was first reported in the 1980s, and
studies have demonstrated its benefits and limitations. Studies using MRI in high-risk women
report that MRI is significantly more sensitive than mammography, and mammographic screening
with or without ultrasonography is probably an insufficient screen for persons with a known
genetic predisposition for breast cancer (Knutson & Steiner, 2007).
Scintimammography
Positron-emission tomography
reasonably sensitive and specific, but it is limited in detecting some breast tumors based on size,
metabolic activity, and histologic subtype. There is no evidence demonstrating a clear advantage
over other adjuvant imaging studies, and the high cost has limited its use as a routine diagnostic
tool (Knutson & Steiner, 2007).
technique in conjunction with 3 features was used to detect a microcalcification pattern,
and a neural network was used to classify it as benign or malignant. The system was
developed on the Microsoft Windows platform. It is an easy-to-use intelligent system that
gives the user options to diagnose, detect, enlarge, zoom, and measure distances of areas
in digital mammograms.
2.4.2 Dataset
This breast cancer database was obtained from the University of Wisconsin Hospitals,
Madison from Dr. William H. Wolberg.
Sources:
Past Usage:
Attributes 2 through 10 have been used to represent instances.
Relevant Information:
Table 1 Attribute Information
# Attribute Domain
1 Sample code number id number
2 Clump Thickness 1 - 10
3 Uniformity of Cell Size 1 - 10
4 Uniformity of Cell Shape 1 - 10
5 Marginal Adhesion 1 - 10
6 Single Epithelial Cell Size 1 - 10
7 Bare Nuclei 1 - 10
8 Bland Chromatin 1 - 10
9 Normal Nucleoli 1 - 10
10 Mitoses 1 - 10
11 Class (2 for benign, 4 for malignant)
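Although the project itself will be built in MATLAB, a record with the attributes from Table 1 can be sketched in Python for illustration. The sample row below matches the dataset's comma-separated format; its values are shown purely as an example, and the class code is mapped as the table describes:

```python
# Attribute names from Table 1 (attribute 1 is an id, attribute 11 is the class label)
ATTRIBUTES = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion", "Single Epithelial Cell Size",
    "Bare Nuclei", "Bland Chromatin", "Normal Nucleoli", "Mitoses", "Class",
]

def parse_record(line):
    # One comma-separated row of the dataset; class code 2 = benign, 4 = malignant.
    values = line.strip().split(",")
    record = dict(zip(ATTRIBUTES, values))
    record["Class"] = {"2": "benign", "4": "malignant"}[record["Class"]]
    return record

rec = parse_record("1000025,5,1,1,1,2,1,3,1,1,2")  # illustrative row
print(rec["Clump Thickness"], rec["Class"])  # -> 5 benign
```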
Class distribution:
Instead, they automatically generate identifying characteristics from the learning material that they
process.
A neural network is based on a collection of connected units or nodes called artificial
neurons, which loosely model actual biological neurons. Each connection can transmit a signal
to another neuron. Warren McCulloch and Walter Pitts (1943) created a computational model for
neural networks based on mathematics and algorithms called threshold logic. This model paved
the way for neural network research to split into two approaches. One approach focused on
biological processes in the brain, while the other focused on the application of neural networks
to artificial intelligence. This work led to work on nerve networks and their link to finite automata.
A neural network computes a function 𝑓(𝑥) defined as a composition of other functions

Equation 1  𝑔𝑖(𝑥)

that can further be decomposed into other functions. This can be conveniently represented as a
network structure, with arrows depicting the dependencies between functions. A widely used type
of composition is the nonlinear weighted sum

Equation 2  𝑓(𝑥) = 𝐾(∑𝑖 𝑤𝑖 𝑔𝑖(𝑥))

where 𝐾 is some predefined function, commonly referred to as the activation function.
Figure 1 presents a feed forward NN with one hidden layer. Except for the input layer
neurons, every neuron is a computational element with an activation function. The principal
mechanism of the NN is that when data is presented to the input layer, the network neurons run
computations in the subsequent layers until an output value is yielded at each of the neurons in
the output layer.
Each neuron computes its activation as the weighted sum of its inputs and the threshold, and
passes it through its activation function:

Equation 3  𝑎𝑗 = ∑𝑖 𝑤𝑗𝑖 𝑥𝑖 + 𝜃𝑗 ,  𝑦𝑗 = 𝑓𝑗(𝑎𝑗)

where 𝑎𝑗 is the activation of neuron j, which is equal to the sum of the weighted sum of the
inputs 𝑥1, 𝑥2, …, 𝑥𝑝 and the threshold 𝜃𝑗; 𝑤𝑗𝑖 is the connection weight from neuron i to neuron j;
𝑓𝑗 is the activation function for the 𝑗th neuron; and 𝑦𝑗 is the output. Figure 2 shows a graphical
representation of how a neuron processes information.
Figure 2 A processing neuron
The sigmoid function is popularly used as the activation function and is defined as:
Equation 4  𝑓(𝑡) = 1 / (1 + 𝑒^(−𝑡))
A single neuron in a multi-layer NN is able to linearly separate the input space into
subspaces by means of a hyperplane defined by the weights and the threshold, where the weights
define the direction of the hyperplane and the threshold offsets it from the origin (David Gil,
2012).
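The neuron computation described above (weighted sum, threshold term, sigmoid activation) can be sketched as follows. The project's research will be carried out in MATLAB; this Python sketch is only illustrative, and the weights, inputs, and threshold are invented values:

```python
import math

def sigmoid(t):
    # Equation 4: f(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + math.exp(-t))

def neuron_output(inputs, weights, threshold):
    # a_j = (sum_i w_ji * x_i) + theta_j : weighted sum of the inputs plus the threshold term
    activation = sum(w * x for w, x in zip(weights, inputs)) + threshold
    # y_j = f_j(a_j) : pass the activation through the sigmoid activation function
    return sigmoid(activation)

# Invented weights and inputs for illustration
y = neuron_output([1.0, 0.5], [0.4, -0.2], threshold=0.1)
print(round(y, 4))  # -> 0.5987
```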
Euclidean Distance
Minkowski Distance
Mahalanobis Distance
KNN creates local models (or neighborhoods) across the feature space, each defined
by a subset of the training data. Implicitly, a ‘global’ decision space is created with boundaries
between the training data; in this sense, KNN creates an implicit global classification model by
aggregating local models, or neighborhoods. One advantage of KNN is that updating the
decision space is easy.
Outliers can create individual spaces which belong to a class but are separated. This mostly
relates to noise in the data. The solution is to dilute the algorithm’s dependency on individual
(possibly noisy) instances.
Once we have obtained the k nearest neighbors using the distance function, the
neighbors vote in order to predict the query’s class.
For each class l ∈ L we count the number of the k neighbors that have that class.
Expressed more mathematically, the algorithm is modified to return the majority vote
within the set of k nearest neighbors to a query q. Mk(q) is the prediction of the model M for
query q given the parameter of the model k.
Levels(l) is the set of levels (classes) in the domain of the target feature, and l is an
element of this set; i iterates over the distances di in increasing order from the query q.
Delta(ti, l) is the Kronecker delta function, which takes two parameters and returns 1 if
they are equal and 0 otherwise.
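The distance-then-vote procedure above can be sketched as follows. Although the project will use MATLAB, this illustrative Python sketch uses the Euclidean distance and invented two-class training points:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training_data, query, k):
    """training_data is a list of (feature_vector, class_label) pairs."""
    # Sort the training instances by increasing distance d_i from the query q
    neighbours = sorted(training_data, key=lambda item: euclidean(item[0], query))[:k]
    # Majority vote: for each class l, count the k neighbours whose label equals l
    # (the Kronecker delta sum), then return the class with the most votes.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented two-class data for illustration
data = [((1, 1), "benign"), ((1, 2), "benign"),
        ((8, 8), "malignant"), ((8, 9), "malignant")]
print(knn_predict(data, (2, 2), k=3))  # -> benign
```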
The naive Bayesian classifier works as follows:
1. Let T be a training set of samples, each with their class labels. There are k classes, C1, C2,
. . ., Ck. Each sample is represented by an n-dimensional vector, X = {x1, x2, . . .,xn},
depicting n measured values of the n attributes, A1, A2, . . . , An, respectively.
2. Given a sample X, the classifier will predict that X belongs to the class having the highest
a posteriori probability, conditioned on X. That is, X is predicted to belong to the class 𝐶𝑖 if
and only if 𝑃(𝐶𝑖 |𝑋) > 𝑃(𝐶𝑗 |𝑋) for 1 ≤ j ≤ k, j ≠ i. Thus we find the class that maximizes
P(𝐶𝑖 |X). The class 𝐶𝑖 for which P(𝐶𝑖 |X) is maximized is called the maximum posteriori
hypothesis. By Bayes’ theorem
3. As P(X) is the same for all classes, only P(X|𝐶𝑖 )P(𝐶𝑖 ) need be maximized. If the class a
priori probabilities, P(𝐶𝑖 ), are not known, then it is commonly assumed that the classes are
equally likely, that is, P(C1) = P(C2) = . . . = P(Ck), and we would therefore maximize
P(X|𝐶𝑖 ). Otherwise we maximize P(X|𝐶𝑖 )P(Ci). Note that the class a priori probabilities
may be estimated by P(𝐶𝑖 ) = freq(Ci, T)/|T|.
4. Given data sets with many attributes, it would be computationally expensive to compute
P(X|𝐶𝑖 ). In order to reduce computation in evaluating P(X|𝐶𝑖 ) P(𝐶𝑖 ), the naive assumption
of class conditional independence is made. This presumes that the values of the attributes
are conditionally independent of one another, given the class label of the sample.
Mathematically this means that
The probabilities P(x1|𝐶𝑖 ), P(x2|𝐶𝑖 ), . . . , P(xn|𝐶𝑖 ) can easily be estimated from the training
set. Recall that here xk refers to the value of attribute Ak for sample X.
b. If Ak is continuous-valued, then we typically assume that the values have a Gaussian
distribution with a mean µ and standard deviation σ defined by
Equation 8  𝑔(𝑥, 𝜇, 𝜎) = (1 / (√(2𝜋) 𝜎)) 𝑒𝑥𝑝(−(𝑥 − 𝜇)² / (2𝜎²))
so that
We need to compute µ𝐶𝑖 and σ𝐶𝑖 , which are the mean and standard deviation of values of
attribute Ak for training samples of class 𝐶𝑖 .
In order to predict the class label of X, P(X|𝐶𝑖 )P(𝐶𝑖 ) is evaluated for each class 𝐶𝑖 . The
classifier predicts that the class label of X is 𝐶𝑖 if and only if it is the class that maximizes
P(X|𝐶𝑖 )P(𝐶𝑖 ).
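The four steps above, with the Gaussian likelihood of Equation 8 for continuous attributes, can be sketched as follows. The project itself will use MATLAB; this Python sketch and its tiny two-class training set are invented purely for illustration:

```python
import math
from collections import defaultdict

def gaussian(x, mu, sigma):
    # Equation 8: g(x, mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit(samples):
    """samples: list of (feature_vector, class_label) pairs (the training set T)."""
    by_class = defaultdict(list)
    for x, c in samples:
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        # P(Ci) estimated as freq(Ci, T) / |T|
        prior = len(rows) / len(samples)
        stats = []
        for attr_values in zip(*rows):
            mu = sum(attr_values) / len(attr_values)
            var = sum((v - mu) ** 2 for v in attr_values) / len(attr_values)
            stats.append((mu, math.sqrt(var) or 1e-9))  # guard against zero std
        model[c] = (prior, stats)
    return model

def predict(model, x):
    # Pick the class Ci maximizing P(X|Ci) P(Ci), where P(X|Ci) is the product
    # of per-attribute Gaussian likelihoods (class-conditional independence).
    best_class, best_score = None, -1.0
    for c, (prior, stats) in model.items():
        score = prior
        for v, (mu, sigma) in zip(x, stats):
            score *= gaussian(v, mu, sigma)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

model = fit([((1, 1), "benign"), ((2, 2), "benign"),
             ((8, 9), "malignant"), ((9, 8), "malignant")])
print(predict(model, (2, 2)))  # -> benign
```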
2.4.4 Research on Machine Learning Techniques for Breast Cancer Diagnosis
This section outlines how the best model for breast cancer diagnosis will be determined.
The models, namely Neural Networks, K-NN (K-Nearest Neighbor), and Naïve Bayes, will be
compared in this research.
Research will be done in MATLAB, using the Wisconsin Breast Cancer Database. The
Wisconsin Breast Cancer Database will be divided into three parts:
i. Training where 70% of the Wisconsin Breast Cancer Database will be used.
ii. Validation where 15% of the Wisconsin Breast Cancer Database will be used.
iii. Testing where 15% of the Wisconsin Breast Cancer Database will be used.
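The 70/15/15 partition above can be sketched as follows. The actual split will be done in MATLAB; this Python sketch only illustrates the partitioning logic, and the record count of 699 is the commonly cited size of the original Wisconsin Breast Cancer Database:

```python
import random

def split_dataset(records, train=0.70, validation=0.15, seed=42):
    # Shuffle, then partition into training / validation / testing subsets.
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(n * train)
    n_val = int(n * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

# Illustrative: 699 record indices, standing in for the dataset rows
train_set, val_set, test_set = split_dataset(range(699))
print(len(train_set), len(val_set), len(test_set))  # -> 489 104 106
```

Note that with integer truncation the testing subset absorbs the remainder, so the three parts always cover the whole dataset exactly once.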
The models’ ability to produce accurate results will be determined using the following
metrics for evaluating classification models:
For binary classification, accuracy can also be calculated in terms of positives and negatives as
follows:
Equation 11  Accuracy = (TP + TN) / (TP + TN + FP + FN)
Given the set of positive results returned by a classifier, precision is the fraction of those
results that actually belong to the required class. For example, suppose a classifier must
recognize the cats in a dataset of 12 pictures containing a mixture of cats and dogs. The
classifier returns 8 positive results out of the 12, and of those 8 only 5 are actually cats. The 5
are true positives, while the remaining 3 are false positives. In this case the precision of the
classifier is 5/8 and the recall is 5/12.
Precision
In the field of information retrieval, precision is the fraction of retrieved documents that are
relevant to the query:
Recall
In information retrieval, recall is the fraction of the relevant documents that are successfully
retrieved:
For classification tasks, the terms true positives, true negatives, false positives, and false
negatives compare the results of the classifier under test with trusted external judgments. The terms
positive and negative refer to the classifier's prediction (sometimes known as the expectation), and
the terms true and false refer to whether that prediction corresponds to the external judgment
(sometimes known as the observation).
Equation 14  precision = TP / (TP + FP)
Equation 15  recall = TP / (TP + FN)
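These metrics can be computed directly from confusion-matrix counts, as sketched below. The TN and FN counts here are assumptions chosen to reproduce the cat-picture example's precision of 5/8 and recall of 5/12; they are not stated in the example itself:

```python
def classification_metrics(tp, tn, fp, fn):
    # Equation 11: accuracy = (TP + TN) / (TP + TN + FP + FN)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Equation 14: precision = TP / (TP + FP)
    precision = tp / (tp + fp)
    # Equation 15: recall = TP / (TP + FN)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Counts matching the cat-picture example: 8 positives returned, 5 of them true,
# plus the assumption that 7 relevant pictures were missed (FN = 7) and TN = 0.
acc, prec, rec = classification_metrics(tp=5, tn=0, fp=3, fn=7)
print(prec, round(rec, 4))  # -> 0.625 0.4167
```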
In some cases we may face a problem with imbalanced classes. For example, we have
two classes, say A and B, and A occurs only 5% of the time. Accuracy can then be misleading,
so we turn to measures such as precision and recall. There are ways to combine the two.
Cohen’s kappa statistic is a very good measure that handles both multi-class and
imbalanced-class problems well.
Equation 16  𝑘 = (𝑃𝑜 − 𝑃𝑒) / (1 − 𝑃𝑒) = 1 − (1 − 𝑃𝑜) / (1 − 𝑃𝑒)
where Po is the observed agreement among raters (identical to accuracy) and Pe is the expected
agreement, the hypothetical probability of chance agreement, computed by using the observed
data to calculate the probabilities of each observer randomly seeing each category. If the raters
are in complete agreement, then k = 1. If there is no agreement among the raters other than what
would be expected by chance (as given by Pe), then k = 0. A negative statistic implies that there
is no effective agreement between the two raters or that the agreement is worse than random.
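Equation 16 can be sketched as a computation over a confusion matrix, with Po taken from the diagonal and Pe from the row and column marginals. The confusion-matrix counts below are invented for illustration:

```python
def cohens_kappa(confusion):
    # confusion[i][j] = number of samples of true class i predicted as class j.
    n = sum(sum(row) for row in confusion)
    # Po: observed agreement (identical to accuracy) -- the diagonal fraction.
    po = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Pe: expected chance agreement from the row/column marginal probabilities.
    pe = sum(
        (sum(confusion[i]) / n) * (sum(row[i] for row in confusion) / n)
        for i in range(len(confusion))
    )
    # Equation 16: k = (Po - Pe) / (1 - Pe)
    return (po - pe) / (1 - pe)

# Illustrative 2x2 confusion matrix (invented counts)
print(round(cohens_kappa([[20, 5], [10, 15]]), 4))  # -> 0.4
```

With this matrix, Po = 35/50 = 0.7 and Pe = 0.5, giving k = 0.4, which indicates only moderate agreement beyond chance.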
2.5 Summary
This chapter reviewed how breast cancer is diagnosed and the existing computer based
systems around the world. We then discovered that there is no existing system in Zambia. We also
talk about the proposed system, the Machine Learning Models to be trained and tested, and how
we will determine the accuracy for the Machine Learning Models.
CHAPTER 3 - RESEARCH METHODOLOGY
3.1 Introduction
There are many software process models for developing and implementing software.
Each has stages or phases that serve as a guide to developing the software. Any of these
software process models can be used to develop software, but the appropriate model to use
depends on the type of software being developed.
1. Specification.
2. Design.
3. Validation.
4. Evolution.
I. Spiral Model
The spiral model is similar to the incremental model, with more emphasis placed
on risk analysis. The spiral model has four phases: Planning, Risk Analysis, Engineering,
and Evaluation. A software project repeatedly passes through these phases in iterations
(called spirals in this model). In the baseline spiral, starting in the planning phase,
requirements are gathered and risk is assessed. Each subsequent spiral builds on the
baseline spiral.
is an iterative, trial-and-error process that takes place between the developer and the
user(s).
V. Rapid application development model (RAD)
This model is based on prototyping and iterative development with no specific
planning involved. The process of writing the software itself involves the planning required
for developing the product.
Rapid application development focuses on gathering the requirements through
workshops or focus groups, early testing of the prototypes by the customers using the
iterative concept and rapid delivery.
The waterfall model is the classical model of software engineering. This model is one of
the oldest models and is widely used in government projects and in many major companies. As
this model emphasizes planning in the early stages, it helps catch design flaws before they
develop. In addition, its intensive documentation and planning make it work well for projects in
which quality control is a major concern.
Figure 3 waterfall lifecycle consists of several non-overlapping stages (Munassar & Govardhan
A, 2010)
Figure 4 waterfall lifecycle consists of several non-overlapping stages (Munassar & Govardhan
A, 2010)
The following list details the steps of the waterfall model:
1. System requirements: Establishes the components for building the system, including the
hardware requirements, software tools, and other necessary components. Examples include
decisions on hardware, such as plug-in boards (number of channels, acquisition speed, and
so on), and decisions on external pieces of software, such as databases or libraries.
2. Software requirements: Establishes the expectations for software functionality and
identifies which system requirements the software affects. Requirements analysis includes
determining interaction needed with other applications and databases, performance
requirements, user interface requirements, and so on.
3. Architectural design: Determines the software framework of a system to meet the specific
requirements. This design defines the major components and the interaction of those
components, but it does not define the structure of each component. The external interfaces
and tools used in the project can be determined by the designer.
4. Detailed design: Examines the software components defined in the architectural design
stage and produces a specification for how each component is implemented.
5. Coding: Implements the detailed design specification.
6. Testing: Determines whether the software meets the specified requirements and finds any
errors present in the code.
7. Maintenance: Addresses problems and enhancement requests after the software releases.
Document Analysis
This requires evaluating the documentation of a present or existing system. It can
assist when creating AS-IS process documentation and when driving the gap analysis for the
scoping of the project. In this way, we can also establish the requirements that drove the
creation of the existing system, which can be the starting point for documenting all current
requirements.
Interface Analysis
Integration with external devices and systems is another kind of interface. User-centric
design approaches are quite effective for ensuring that the software you make is usable.
Interface analysis is vital to ensure that there are no overlooked requirements that are not
immediately visible to users.
In this phase, the complex activity of system development is divided into several smaller
sub-activities, which coordinate with each other to achieve the main objective of system
development, as shown in the figure below.
Identify Design Goals
System Decomposition
Identification of Concurrency
Hardware Allocation
System Design
Data Management
Boundary Condition
Create a contingency, training, maintenance, and operation plan, and review the proposed
design. Ensure that the final design meets the requirements stated in the SRS document. Finally,
prepare a design document which will be used during the next phases.
With inputs from the system design, the system will first be developed in small programs called
units, which will be integrated in the next phase. Each unit will be developed and tested for its
functionality, which is referred to as Unit Testing.
The design will be implemented into source code through coding. All the modules are then
combined in a testing environment that detects errors and defects. A test report containing the
errors is prepared following a test plan that includes test-related tasks such as test case
generation, testing criteria, and resource allocation for testing.
Thus, maintenance changes the existing system, enhancement adds features to the existing
system, and development replaces the existing system. It is an important part of system
development that includes the activities which corrects errors in system design and
implementation, updates the documents, and tests the data.
The waterfall model has several advantages. It is easy to manage due to the rigidity of
the model: each phase has specific deliverables and a review process, and the phases are
processed and completed one at a time. It allows for departmentalization and control: a
schedule can be set with deadlines for each stage of development, and the product can proceed
through the development process model phases one by one. It also works well for projects
where the requirements are very well understood.
Analyze data
Develop algorithms
The language, apps, and built-in math functions enable you to quickly explore multiple
approaches to arrive at a solution. MATLAB lets you take your ideas from research to production
by deploying to enterprise applications and embedded devices, as well as integrating with
Simulink and Model-Based Design.
MATLAB does not require a compiler to execute, unlike C and C++; it executes each statement as it is written in the code. This increases productivity and coding efficiency, and it is a higher-level language.
Using MATLAB Coder, code written in MATLAB can be converted to C++, Java, Python, .NET, etc. This makes the language more versatile, so scientific theories can also be implemented in other languages, and the resulting library files, or dynamic-link libraries (DLLs), can be used directly from other languages.
MATLAB has a rich built-in library covering neural networks, fuzzy logic, Simulink, power systems, hydraulics, electrical systems, communications, electromagnetics, etc.
Thus, developing any scientific simulation is easy using such a rich library.
3.7 Summary
There are many software process models for developing and implementing software. Software process models have stages or phases that are used as a guide in developing the software. The fundamental activities common to all software processes are:
1. Specification.
2. Design.
3. Validation.
4. Evolution.
The common software process models include:
1. Waterfall model
2. Prototype model.
3. Rapid application development model (RAD).
4. Evolutionary development: Specification, development and validation are interleaved.
5. Incremental model.
6. Iterative model.
7. Spiral model.
8. Component-based software engineering: The system is assembled from existing
components.
The waterfall model, which will be adopted for the project, is the classical model of software engineering. It is one of the oldest models and is widely used in government projects and in many major companies. Because this model emphasizes planning in the early stages, it helps catch design flaws before they develop.
The waterfall model is easy to manage due to its rigidity: each phase has specific deliverables and a review process. Also, the model's phases are processed and completed one at a time.
The framework to be used for the project is MATLAB which is a programming platform
designed specifically for engineers and scientists. The programming platform is easy to use for
research and app development because of the built-in models and the integration with other
programming languages.
CHAPTER 4 – PROJECT MANAGEMENT
4.1 Introduction
Risk in project management refers to a range of probabilities that an adverse event will occur, together with the results that follow from the event. Risks in project management can be identified, estimated, assessed, and controlled through the project's risk management activities. Project risk management can be described as a complex process of planning, identification, analysis, evaluation, and control of project risks.
The risk management life cycle consists of the steps or phases taken to manage risk in a project. The risk management life cycle includes:
1. Identifying risks
2. Evaluating risks and their impact
3. Identifying responses
4. Selecting responses
Table 2: The risk register
Function point
The function point for the project is calculated as shown in the figures below. The calculations were done using the basic COCOMO model.
Figure 7 Information Domain Value(umd, 2018)
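The information-domain computation behind the function point figure can be sketched as follows (in Python for illustration, since the project itself uses MATLAB). The weights are the standard average-complexity function point weights; any counts supplied to this function are placeholders, not the project's actual values:

```python
# Standard average-complexity weights for the five information-domain values.
WEIGHTS = {"inputs": 4, "outputs": 5, "inquiries": 4, "files": 10, "interfaces": 7}

def function_points(counts, value_adjustment_factors):
    """Compute adjusted function points.

    counts: dict with the same keys as WEIGHTS (raw information-domain counts).
    value_adjustment_factors: the 14 complexity ratings, each rated 0-5.
    """
    ufp = sum(WEIGHTS[k] * counts[k] for k in WEIGHTS)    # unadjusted FP
    vaf = 0.65 + 0.01 * sum(value_adjustment_factors)     # value adjustment factor
    return ufp * vaf
```

For example, `function_points({"inputs": 3, "outputs": 4, "inquiries": 2, "files": 2, "interfaces": 1}, [3] * 14)` gives 67 unadjusted function points scaled by an adjustment factor of 1.07.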
The figure below shows the Lines of Code for this project
Figure 9 Lines of Code(umd, 2018)
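Basic COCOMO turns the lines-of-code estimate above into effort and duration figures. A minimal sketch, in Python for illustration, using the standard published organic-mode coefficients (the KLOC value in the usage line is a placeholder, not the project's actual count):

```python
def basic_cocomo(kloc, a=2.4, b=1.05, c=2.5, d=0.38):
    """Basic COCOMO estimate for an organic-mode project.

    kloc: estimated size in thousands of lines of code.
    Returns (effort in person-months, duration in months, average staffing).
    """
    effort = a * kloc ** b        # person-months
    duration = c * effort ** d    # calendar months
    staff = effort / duration     # average number of people needed
    return effort, duration, staff

# Placeholder size of 2 KLOC, purely for illustration.
effort, duration, staff = basic_cocomo(2.0)
```

COCOMO II refines this basic form with scale factors and cost drivers, which is what the calculator cited as (csse, 2018) applies.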
Figure 11 effort costing using COCOMO II calculation (csse, 2018)
The results from the COCOMO II calculation shown in the figure below are an estimate of how long the project will take and how much it will cost to finish.
Breakdown of Costs
Figure 12 result from the COCOMO calculation (csse, 2018)
This chart shows the stages taken to come up with the proposal for this project. Each task was started only after the task before it was completed.
Figure 13 proposal Gantt Chart
This chart shows the phases and the duration of each phase. The phases have subtasks, and each task has a duration. A task can only start once the previous task has been completed, and once all tasks in a phase are completed, the next phase can start.
Figure 14 project Gantt Chart
4.7 Summary
In this chapter we determined the risks, their impact on the project, and how to manage or handle them. We also calculated the effort cost for this project using basic COCOMO, where we obtained the function point, the lines of code for the project, and the duration of the project. We then used COCOMO II to get an estimate of how much the project will cost. We came up with a Gantt chart for the project which shows when each phase in developing the system will start and how long each phase will take to finish.
CHAPTER 5 – CONCLUSION
The techniques for diagnosing breast cancer in Zambia are still manual. From the literature, we discovered that there is no known existing computer-based system for breast cancer diagnosis in Zambia. This project is a breast cancer diagnosis system whose core is a machine learning (ML) model, which will be determined through research into which ML model is best suited for the project. The research will involve training and testing the candidate ML models: neural networks, Naïve Bayes, and k-nearest neighbor. Training and testing will be done using the dataset obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg, which has ten (10) attributes and two (2) classes: benign and malignant. The ML model will be selected based on how accurate it is during testing. Accuracy will be assessed using the following metrics: accuracy rate, precision and recall, and Cohen's kappa statistic. Effort costs and duration for the project have been estimated using COCOMO II and basic COCOMO. The project will later be developed using the waterfall model.
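The evaluation metrics named above can all be computed from a two-class (benign/malignant) confusion matrix. The following sketch, in Python for illustration since the project itself targets MATLAB, uses placeholder counts rather than results from the Wisconsin dataset:

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, recall, and Cohen's kappa for a binary classifier.

    tp/fp/fn/tn: confusion-matrix counts, with 'malignant' as the positive class
    (tp = predicted malignant and actually malignant, etc.).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_o = accuracy
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)   # chance agreement on malignant
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)   # chance agreement on benign
    p_e = p_pos + p_neg
    kappa = (p_o - p_e) / (1 - p_e)
    return accuracy, precision, recall, kappa

# Placeholder counts for illustration only.
acc, prec, rec, kappa = evaluate(tp=40, fp=10, fn=5, tn=45)
```

These are the quantities that will be compared across the neural network, Naïve Bayes, and k-nearest neighbor candidates.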
References
Carla Chibwesha. (2015). A Comprehensive Assessment of Breast and Cervical Cancer Control in
Zambia.
Chen, H.-L. (2011). A support vector machine classifier with rough set-based feature selection for
breast cancer diagnosis. Expert Systems with Applications, 9014-9022.
csse. (2018, November 29). COCOMO II - Constructive Cost Model. Retrieved from COCOMO II: http://csse.usc.edu/tools/COCOMOII.php
Danaei G et al. (2005). Comparative risk assessment of nine behavioural and environmental risk
factors. Causes of cancer in the world, 1784-1793.
David Gil, J. L.-T. (2012). Predicting seminal quality with artificial intelligence methods. Expert
Systems with Applications.
IARC. (2008). World cancer report. Lyon: International Agency for Research on Cancer.
KNUTSON, D., & STEINER, E. (2007). Screening for Breast Cancer. Current Recommendations and
Future Directions, 1-7.
Lacey JV Jr et al. (2009). Breast cancer epidemiology according to recognized breast cancer risk factors in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial Cohort. BMC Cancer.
Munassar, N. M., & Govardhan, A. (2010). A Comparison Between Five Models Of Software Engineering. In IJCSI International Journal of Computer Science Issues (pp. 94-101). www.IJCSI.org.
Peto J. (2001). Cancer epidemiology in the last century and the next decade. 390-395.
umd. (2018, November 29). Basic COCOMO model. Retrieved from umd.umich.edu: http://groups.umd.umich.edu/cis/course.des/cis525/js/f00/gamel/cocomo.html
U.S. Preventive Services Task Force. (2009). Screening for Breast Cancer: U.S. Preventive Services Task Force Recommendation Statement, 716-726.
Venkataraman, R. R., & Pinto, J. (2018). Cost and value management in projects. New Jersey: John Wiley & Sons Inc.
Wilson, B. (2012, June 25). The Machine Learning Dictionary. Retrieved from
http://www.cse.unsw.edu.au/~billw/mldict.html#activnfn