
International Journal of Scientific & Engineering Research, Volume 1, Issue 3, December-2010 1

ISSN 2229-5518

Computer Aided Screening for Early Detection of Breast Cancer using BRCA Gene as an Indicator
S. Ranichandra, T. K. P. Rajagopal

ABSTRACT: A cancerous breast tumor is a mass of breast tissue that grows in an abnormal, uncontrolled way.
Early detection of breast cancer is key to survival because it widens the available treatment options.
Mammography screening and MRI are among the existing breast cancer detection methods. MRI tends to
produce a larger number of false positives. Mammography is expensive and painful, gives false positives for
patients with dense breast tissue, and detects a tumor only when it is larger than 5 mm. Hence there is a need for a
more convenient and accurate method. In the proposed approach, we analyze gene expression patterns in blood cells
to detect breast cancer at an early stage. BRCA is a tumor suppressor gene that all people have. The BRCA
DNA sequences from patients are generated by the PCR method and used as input to a local sequence alignment
program that implements the Smith-Waterman algorithm. The program compares the patient's gene sequence with the
reference BRCA gene sequence to determine the cancer risk at a very early stage.

KEYWORDS: Breast cancer, early detection, tumor suppressor genes, BRCA, blood sample, PCR method, DNA
sequencing, gene sequence, local sequence alignment algorithm, Smith-Waterman.


1. INTRODUCTION

Breast cancer is a malignant tumor that starts from cells of the breast. A malignant tumor
is a group of cancer cells that may grow into (invade) surrounding tissues or spread
(metastasize) to distant areas of the body. The disease occurs almost entirely in
women, but men can get it, too [3]. It is the second most common cause of cancer
death in white, black, Asian/Pacific Islander, and American Indian/Alaska Native
women. According to the 2009-2010 report, an estimated 192,370 new cases of
invasive breast cancer were diagnosed among women in 2009, and approximately
40,170 women were expected to die from breast cancer [2]. Staging is a method that
has been developed to describe the extent of cancer growth. In general, the lower the
stage, the better the person's prognosis [5]. Early detection of breast cancer can improve
the chances of successful treatment and recovery. Mammography screening is the most
reliable method to detect breast cancer in asymptomatic patients. It is extremely
important to catch breast cancer at an early stage. A few of the main symptoms of
breast cancer are:

Change in the size or shape of a breast
Dimpling of the breast skin
The nipple becoming inverted
Swelling or a lump in the armpit

2. PROBLEM DESCRIPTION

Although highly effective, mammography has significant limitations: in the absence
of microcalcification, it often fails to detect tumors that are less than 5 mm in size,
and mammograms of women with dense breast tissue are difficult to interpret. For
example, in a study of over 11,000 women with no clinical symptoms of breast cancer,
the sensitivity of mammography was only 48% for the subset of women with extremely dense

IJSER © 2010

breasts, compared with 78% sensitivity for the entire sample of women in the study. So
there is a need to develop a more accurate, convenient and objective detection method
[12]. The proposed approach compares the patient's BRCA gene with the original
(reference) gene. A gene is a stretch of DNA, so DNA sequences are compared to
identify cancer risk. The sequence comparison is carried out using the dynamic
programming algorithm for local alignment between two DNA sequences proposed by
Smith and Waterman. This Smith-Waterman algorithm is very well known and
versatile [16].

3. EXPERIMENTAL STUDY OF BRCA GENES

The official names of the BRCA1 and BRCA2 genes are breast cancer susceptibility
gene 1 and breast cancer susceptibility gene 2, respectively. The BRCA genes belong to a
class of genes known as tumor suppressor genes [10]. Like many other tumor
suppressors, the protein produced from the BRCA genes helps prevent cells from
growing and dividing too rapidly or in an uncontrolled way. There is no strong
homology between BRCA1 and BRCA2, although both genes have a large exon 11
which seems to be crucial for function. However, the function of the two genes
seems to be similar [14, 20]. The BRCA genes provide instructions for making a
protein that is directly involved in repairing damaged DNA. By helping repair DNA,
BRCA plays a role in maintaining the stability of a cell's genetic information [13].

More than 1,000 mutations in the BRCA1 gene and 800 mutations in the BRCA2 gene
have been identified, many of which are associated with an increased risk of breast
cancer. Most of these mutations lead to the production of an abnormally short version
of the BRCA1 protein, or prevent any protein from being made from one copy of
the gene. Other BRCA1 mutations change single protein building blocks (amino
acids) in the protein or delete large segments of DNA from the BRCA1 gene.
Many BRCA2 mutations insert or delete a small number of DNA building blocks
(nucleotides) in the gene. Researchers believe that a defective or missing BRCA1
protein is unable to help repair damaged DNA or fix mutations that occur in other
genes. As these defects accumulate, they can allow cells to grow and divide
uncontrollably and form a tumor [8, 9].

4. SEQUENCE COMPARISON

Sequence comparison can be defined as the problem of finding which parts of the
sequences are similar and which parts are different. Generally, a measure of how
similar they are is also desirable. A typical approach to solving this problem is to find a
good and plausible alignment between the two sequences. Then, given an appropriate
scoring scheme, their similarity can be computed. Generally, sequence comparisons
involve aligning sections of the two sequences in a way that exposes the
similarities between them [7]. The idea of aligning two sequences (of possibly
different sizes) is to write one on top of the other, and break them into smaller pieces by
inserting spaces in one or the other so that identical subsequences are eventually
aligned in a one-to-one correspondence; naturally, spaces are not inserted in both
sequences at the same position. The objective of sequence alignment is to
match identical subsequences as far as possible. However, if the sequences are not
identical, mismatches are likely to occur as different letters are aligned together. The
insertion of spaces produces gaps in the sequences. They are important to allow a
good alignment between the characters of the sequences. A gap in the first sequence is
considered an insertion of a character from the second sequence into the first one,
whereas a gap in the second sequence is considered a deletion of a character of the
first sequence.

Once the alignment is produced, a score can be assigned to each pair of aligned letters,
called an aligned pair, according to a chosen scoring scheme. The similarity of two
sequences can be defined as the best score among all possible alignments between
them. Sequence comparison is actually a well-known problem in computer science.
Computational approaches to sequence alignment generally fall into two
categories: global alignments and local alignments. Pairwise sequence alignment
methods are used to find the best-matching piecewise (local) or global
alignments of two query sequences. Pairwise alignments can only be used between
two sequences at a time, but they are efficient to calculate and are often used for
methods that do not require extreme precision (such as searching a database for
sequences with high similarity to a query). The three primary methods of producing
pairwise alignments are dot-matrix methods, dynamic programming, and
word methods.

Global alignment is achieved using the Needleman-Wunsch algorithm. The
algorithm tries to take all of one sequence and align it with all of a second
sequence. Short and highly similar subsequences may be missed in the
alignment because they are outweighed by the rest of the sequence. Hence, one would
like to create a locally optimal alignment [18]. Local alignments are more useful for
dissimilar sequences that are suspected to contain regions of similarity or similar
sequence motifs within their larger sequence context. The Smith-Waterman
algorithm is a general local alignment method also based on dynamic
programming. The dynamic programming approach to pairwise
sequence alignment is guaranteed to provide the optimal global or local
pairwise alignment and score given a particular scoring scheme [1]. In the
Smith-Waterman algorithm,
1. All symbols (residues) in the two sequences have to be in the
   alignment, and in the same order they appear in the sequences
2. We can align one symbol from one sequence with one from another
3. A symbol can be aligned with a blank ('-')
4. Two blanks cannot be aligned [6, 15, 17]
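The scoring of aligned pairs and the four alignment rules of this section can be illustrated with a minimal Java sketch. It is not the authors' program: the class name AlignmentScoreSketch, the method names, and the scoring values (+2 for a match, -1 for a mismatch, -1 for a symbol aligned with a blank) are assumptions chosen only for illustration.

```java
/** Illustrative sketch only: checks the four alignment rules and scores an alignment. */
final class AlignmentScoreSketch {
    // Assumed scoring scheme; these values are hypothetical, not taken from the paper.
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -1;
    static final char BLANK = '-';

    /** Returns true when rowA/rowB form a legal alignment of sequences a and b. */
    static boolean isValid(String rowA, String rowB, String a, String b) {
        // Rules 2 and 3: every column pairs one entry of each row, so the rows have equal length.
        if (rowA.length() != rowB.length()) {
            return false;
        }
        // Rule 4: two blanks may never share a column.
        for (int k = 0; k < rowA.length(); k++) {
            if (rowA.charAt(k) == BLANK && rowB.charAt(k) == BLANK) {
                return false;
            }
        }
        // Rule 1: dropping the blanks must give back each sequence, symbols in order.
        return rowA.replace("-", "").equals(a) && rowB.replace("-", "").equals(b);
    }

    /** Sums the score of every aligned pair (column) under the assumed scheme. */
    static int score(String rowA, String rowB) {
        if (rowA.length() != rowB.length()) {
            throw new IllegalArgumentException("rows must have equal length");
        }
        int total = 0;
        for (int k = 0; k < rowA.length(); k++) {
            char x = rowA.charAt(k);
            char y = rowB.charAt(k);
            if (x == BLANK || y == BLANK) {
                total += GAP;        // symbol aligned with a blank
            } else if (x == y) {
                total += MATCH;      // identical symbols
            } else {
                total += MISMATCH;   // different symbols
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String a = "ACGTA";
        String b = "ACTTA";
        // Two candidate alignments of the same pair of sequences.
        System.out.println(isValid("ACG-TA", "AC-TTA", a, b) + " " + score("ACG-TA", "AC-TTA"));
        System.out.println(isValid("ACGTA", "ACTTA", a, b) + " " + score("ACGTA", "ACTTA"));
    }
}
```

Under these assumed values the gapped alignment scores 6 and the ungapped one scores 7, which shows why similarity is defined as the best score over all possible alignments rather than the score of any single one.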

DNA sequencing is the first step of sequence analysis. DNA sequencing refers
to methods for determining the order of the nucleotide bases (adenine,
guanine, cytosine and thymine) in a molecule of DNA. DNA sequencing can be
performed using PCR. The steps in DNA sequencing are:

1. Extract genomic DNA
2. Amplify known gene region
3. Verify successful PCR amplification
4. Clean PCR products
5. Quantify DNA concentration
6. Cycle sequence

The cycle-sequenced products are then precipitated and submitted. The obtained
DNA sequence is subsequently used as input to the local sequence alignment program,
Smith-Waterman [11]. The system flow diagram is shown in Fig 1.

Fig 1 System Flow Diagram

5.1. Smith-Waterman Algorithm

A matrix H is built as follows:

\[ H(i,0) = 0, \quad 0 \le i \le m \]
\[ H(0,j) = 0, \quad 0 \le j \le n \]
\[
H(i,j) = \max \left\{
\begin{array}{ll}
0 & \\
H(i-1,j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\
H(i-1,j) + w(a_i, -) & \text{Deletion} \\
H(i,j-1) + w(-, b_j) & \text{Insertion}
\end{array}
\right\}, \quad 1 \le i \le m,\ 1 \le j \le n
\]

Where:
a, b = strings over the alphabet Σ
m = length(a)
n = length(b)
H(i,j) is the maximum similarity score between a suffix of a[1...i] and
a suffix of b[1...j]
w(c,d), c, d ∈ Σ ∪ {'-'}, is the gap-scoring scheme, where '-' denotes a gap.

5.2. Working principle of the algorithm
1. Assigns a score to each pair of bases
   a. Uses similarity scores only
   b. Uses positive scores for related residues
   c. Uses negative scores for substitutions and gaps
2. Initializes edges of the matrix with zeros
3. As the scores are summed in the
matrix, any score below 0 is recorded as 0
4. Begins the traceback at the maximum value found anywhere in the matrix
5. Continues until the score falls to 0.

This algorithm gives the position of any mismatch, and with those results the
presence or absence of cancer can be confirmed [4, 19].

6. IMPLEMENTATION AND RESULT ANALYSIS

The program is implemented in Java. The DNA sequence generated by PCR is
used as input for the proposed detection method. The implemented
program reads the content of the file and compares the input DNA sequence
with the reference gene sequence file. If there is no mismatch, the sequences are
the same and therefore the patient is healthy, which is evident from Fig 2.
Otherwise, there is a mismatch, so the patient is at risk of cancer and has to be
recommended for treatment, as shown in Fig 3. The graph (Fig 4) relates
early detection of cancer to survival.
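The matrix recurrence of Section 5.1, the traceback of Section 5.2 and the comparison step described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class SmithWatermanSketch, its methods, and the scores (+2 match, -1 mismatch, -1 gap) are hypothetical, and the sequences are passed as strings instead of being read from the PCR output and reference files.

```java
/** Illustrative Smith-Waterman sketch; scoring values are assumptions, not the paper's. */
final class SmithWatermanSketch {
    static final int MATCH = 2;      // assumed w(a_i, b_j) for identical bases
    static final int MISMATCH = -1;  // assumed w(a_i, b_j) for different bases
    static final int GAP = -1;       // assumed w(a_i, -) and w(-, b_j)

    /** Best local score plus the two aligned rows ('-' marks a gap). */
    static final class Result {
        final int score;
        final String rowA;
        final String rowB;

        Result(int score, String rowA, String rowB) {
            this.score = score;
            this.rowA = rowA;
            this.rowB = rowB;
        }
    }

    static int w(char c, char d) {
        return c == d ? MATCH : MISMATCH;
    }

    /** Fills H with the recurrence of Section 5.1, then traces back from the maximum cell. */
    static Result align(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] h = new int[m + 1][n + 1]; // H(i,0) and H(0,j) stay 0
        int best = 0;
        int bestI = 0;
        int bestJ = 0;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int diag = h[i - 1][j - 1] + w(a.charAt(i - 1), b.charAt(j - 1));
                int del = h[i - 1][j] + GAP;
                int ins = h[i][j - 1] + GAP;
                h[i][j] = Math.max(0, Math.max(diag, Math.max(del, ins)));
                if (h[i][j] > best) {
                    best = h[i][j];
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        // Traceback: start at the maximum value and stop when the score falls to 0.
        StringBuilder rowA = new StringBuilder();
        StringBuilder rowB = new StringBuilder();
        int i = bestI;
        int j = bestJ;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            if (h[i][j] == h[i - 1][j - 1] + w(a.charAt(i - 1), b.charAt(j - 1))) {
                rowA.append(a.charAt(i - 1));
                rowB.append(b.charAt(j - 1));
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                rowA.append(a.charAt(i - 1));
                rowB.append('-');
                i--;
            } else {
                rowA.append('-');
                rowB.append(b.charAt(j - 1));
                j--;
            }
        }
        return new Result(best, rowA.reverse().toString(), rowB.reverse().toString());
    }

    /** True when the whole reference is found in the patient sequence without mismatch or gap. */
    static boolean matchesReference(String patient, String reference) {
        Result r = align(patient, reference);
        return r.rowA.equals(reference) && r.rowB.equals(reference);
    }

    /** Column of the first difference inside the local alignment, or -1 if there is none. */
    static int firstMismatch(Result r) {
        for (int k = 0; k < r.rowA.length(); k++) {
            if (r.rowA.charAt(k) != r.rowB.charAt(k)) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String reference = "ACGTACGT";   // made-up sequences, not BRCA data
        String patient = "ACGTTCGT";
        Result r = align(patient, reference);
        System.out.println(r.rowA + "\n" + r.rowB + "\nscore=" + r.score
                + " match=" + matchesReference(patient, reference));
    }
}
```

One design point worth noting: a local alignment can stop before a difference that lies near either end of the sequences, so this sketch decides "no mismatch" by checking that the aligned region covers the whole reference, rather than by looking only for differing columns inside the alignment.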

Fig. 3 shows that the patient sequence does not match the original sequence and
hence the patient has cancer risk

Fig 4 Survival rate of patients (5-year relative) in early detection of breast cancer
(Source: American Cancer Society)

The proposed approach for early detection of breast cancer using the local
sequence alignment technique identifies whether a patient is affected by cancer or not.
Furthermore, the risk level of cancer in affected patients is also determined.
Consequently, this early detection method using DNA sequencing is significantly
more advantageous than other methods, since cancer risks can be identified at an
early stage, even before the symptoms are clearly observable. Moreover, this method
is beneficial because of its ease of use, its economy with respect to laboratory
usage, and its reliability, as genes are used for detection. The proposed approach has
95% efficiency in detecting breast cancer at an early stage. This project provides a more
effective and accurate method aimed at breast cancer detection at an early stage.

8. FUTURE SCOPE

The current evaluation system has potential in observing the cancer risk of a
patient. The Smith-Waterman algorithm is effective for text string matching, but an
assessment is required to determine the proportional benefits of the algorithm
compared with the traditional techniques and other sequencing algorithms. Though the
Smith-Waterman algorithm is very sensitive and accurate, it has a higher time
complexity and needs a large memory space. As biological sequencing data are
rapidly expanding, the memory requirement has become a critical problem in the
existing Smith-Waterman algorithm. Future work can target the use of an upgraded
Smith-Waterman algorithm that has reduced computational complexity of (N*(M+1)/2)
and lower space complexity. Moreover, the risk level of cancer can also be
identified in further computational

ABBREVIATIONS

BRCA1  Breast cancer 1, early onset
BRCA2  Breast cancer 2, early onset
PCR    Polymerase Chain Reaction
DNA    Deoxyribonucleic acid

REFERENCES
[1]. E. C. Rouchka, "Aligning DNA sequencing using Dynamic Programming," ACM, 2006.
[2]. American Cancer Society, "Breast Cancer Facts and Figures 2009-2010,"
American Cancer Society, 2009.
[3]. American Cancer Society, "What is Breast Cancer," American Cancer Society,
Sep. 18, 2009.
[4]. Baylor College of Medicine HGSC, "Smith Waterman algorithm," Baylor College
of Medicine HGSC, Aug. 01, 2002.
[5]. Breast Cancer, "Stages of Breast Cancer," Breast Cancer, Jan. 21, 2010.
[6]. David W. Mount, Bioinformatics: Sequence and Genome Analysis, 2nd ed., NY:
Cold Spring Harbor Laboratory Press, 2000.
[7]. Eugene W. Myers, "An Overview of Comparison Algorithms in Molecular Biology,"
Department of Computer Science, The University of Arizona, Arizona, Tech. Rep.
91-29, December 20, 1991.
[8]. Genetics Home Reference, "BRCA1," Genetics Home Reference, Aug. 2007.
[9]. Genetics Home Reference, "BRCA2," Genetics Home Reference, Aug. 2007.
[10]. National Cancer Institute, "BRCA1 and BRCA2: Cancer Risk and Genetic Testing,"
National Cancer Institute, May 29, 2009.
[11]. "Overview of steps in DNA Sequencing." [Online]. Apr. 6, 2010.
[12]. P. Sharma et al., "Early detection of breast cancer based on gene-expression
patterns in peripheral blood cells," Breast Cancer Research, p. 634+, Jun. 2005.
[13]. Ralph Scully, "Role of BRCA gene dysfunction in breast and ovarian cancer
predisposition," Breast Cancer Research, July 2000.
[15]. S. A. de Carvalho Junior, "Sequence Alignment Algorithms," M.S. thesis, King's
College London, University of London, London, September 2003.
[16]. S. Das and D. Dey, "A new algorithm for local alignment in DNA sequencing," in IEEE
India National Conference, 2004, pp. 410-413.
[17]. "Smith Waterman algorithm," Oct. 4, 2007.
[18]. Wikipedia, "Smith Waterman algorithm," Wikipedia, April 2010.
[20]. Wisegeek, "What is a tumor suppressor gene," Wisegeek, 2009.

AUTHORS PROFILE

S. Ranichandra, M.Sc., M.Phil., M.B.A., is serving as a Sr. Lecturer in the
Department of Computer Science at K.S.Rangasamy College of Arts and Science,
Tiruchengode, with 10 years of teaching experience. Her areas of specialization
include Data Structures and Algorithms, Principles of Compiler Design, Digital
Image Processing and Networking. She has published 2 books and 2 papers in
international journals, presented 12 papers in national-level conferences and 3 papers
in international-level conferences, and has attended 10 workshops. She is a life
member of CSTA, New York; ACS, Bangalore; IACSIT (International Association of
Computer Science and Information Technology), Singapore; and IAENG
(International Association of Engineers), Hong Kong.

T.K.P. Rajagopal, M.C.A., M.Phil., M.A., M.Phil., M.E.(CSE), M.Tech., is working as an
Assistant Professor in the Department of Computer Science and Engineering at Kathir
College of Engineering, Coimbatore, India. He has 11 years of teaching experience. His
research areas are Network Security, Data Mining and Digital Image Processing. He
has published 3 books and 2 research papers in international journals, presented 2
papers in international conferences and 13 papers in national-level conferences, and
has attended 12 workshops. He is a lifetime member of various professional
societies like ISTE, DOEACC, CSTA,
