
Received: 9 June 2017

DOI: 10.1002/mma.4764

SPECIAL ISSUE PAPER

Multiset topology via DNA and RNA mutation

M. M. El-Sharkasy¹, Wafaa M. Fouda², Mohamed S. Badr³

¹ Department of Mathematics, Faculty of Science, Tanta University, Tanta, Egypt
² Faculty of Pharmacy, South Valley University, Qena, Egypt
³ Department of Mathematics, Faculty of Science, Assiut University, New Valley, Egypt

Correspondence
Mohamed S. Badr, Department of Mathematics, Faculty of Science, Assiut University, New Valley, Egypt.
Email: m_shaban97@yahoo.com

Communicated by: H. Bakouch

MSC Classification: 54A10; 54C50; 54C10

Topology is one of the most important branches of modern mathematics and plays an important role in applications. In this paper, we use the concept of topology, based on the concept of multiset, to address an important problem in life (DNA and RNA mutation), with the aim of detecting diseases and helping biologists in their treatment. We also introduce a new theory that explains whether or not a mutation exists, and we give a set of metric functions through which we examine the congruence, similarity, and dissimilarity between “types,” which may be strings of bits, vectors, DNA or RNA sequences, etc. Finally, we introduce a theory that can be used to determine the existence and location of a mutation.

KEYWORDS
DNA and RNA, metric space, multiset, mutation, topology

1 INTRODUCTION

The notion of a topological space and its generalizations is one of the most influential concepts in many branches of science, including information systems and physics.1-5 “The biological application of topology to the study of DNA structure and interactions that involve alterations of DNA topology is an essential aspect of the existence of every living cell.”
In biology, a mutation is a permanent change in the DNA sequence that makes up a gene. Mutations range in size from a single DNA base to a whole chromosome; mutations that occur within the protein-coding region of a gene can be classified into several kinds, such as silent, missense, and nonsense mutations.
A multiset (mset) is a collection of elements in which elements may be repeated. The multiplicity of an element is the number of times it occurs in the mset.6,7 We can write an mset as a set of pairs {number/element}. For example, the collection {l, l, l, m, m, n} has the equivalent mset representation {3∕l, 2∕m, 1∕n}.
The concept of msets can be used to solve many real-life problems, such as detecting diseases and helping biologists in their treatment.
This paper begins with a brief survey of msets and mutation. We then link the two concepts through several examples and present a theory that connects topology and mutation. We give a set of distance functions through which we examine congruence, similarity, and dissimilarity. Finally, we introduce a new method that can be used to determine the existence and location of a mutation.
One of our essential motives for this work is to find mathematical solutions that identify a mutation and to prove the validity of these biological solutions. A computer program can also be used to find these solutions.

2 PRELIMINARIES AND BASIC DEFINITIONS

2.1 Multiset and its mset topology


In this section, a brief survey of results and notations as introduced in literature7-15 is presented.

Math Meth Appl Sci. 2018;1–13. wileyonlinelibrary.com/journal/mma Copyright © 2018 John Wiley & Sons, Ltd. 1
2 EL-SHARKASY ET AL.

Definition 2.1. An mset M drawn from the set X is represented by a count function $C_M : X \to N$, where N is the set of non-negative integers. $C_M(x)$ is the number of occurrences of the element x in the mset M. We write the mset M drawn from the set X = {x1, x2, …, xn} as M = {m1∕x1, m2∕x2, …, mn∕xn}, where mi is the number of occurrences of the element xi, i = 1, 2, …, n, in the mset M. Clearly, a set is a special case of an mset.
Let M and N be 2 msets drawn from a set X. Then, the following are defined.6,13,16-18

(i) M = N if $C_M(x) = C_N(x)$ for all $x \in X$.

(ii) M ⊆ N if $C_M(x) \le C_N(x)$ for all $x \in X$.

The cardinality of an mset M is denoted by Card(M) or |M| and is given by $\mathrm{Card}(M) = \sum_{x \in X} C_M(x)$.
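As an illustration only, the count function of Definition 2.1 maps naturally onto Python's `collections.Counter`. The sketch below assumes that representation; the helper names `is_submset` and `card` are ours and do not come from the cited literature.

```python
from collections import Counter

def is_submset(m, n):
    """M ⊆ N when C_M(x) <= C_N(x) for every x (missing keys count as 0)."""
    return all(count <= n[x] for x, count in m.items())

def card(m):
    """Card(M): the sum of the multiplicities of all elements."""
    return sum(m.values())

# The collection {l, l, l, m, m, n} written as the mset {3/l, 2/m, 1/n}
M = Counter("lllmmn")
N = Counter({"l": 3, "m": 2, "n": 1})
```

Two `Counter` objects compare equal exactly when every element has the same count, which mirrors condition (i).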

Definition 2.2. (Girish and John12 and Jena et al19; whole submsets)
A submset N of M is a whole submset of M if each element in N has full multiplicity as in M, ie, CN (x) = CM (x) for every x in N.

Definition 2.3. (Girish and John12 and Jena et al19; partial whole submsets)
A submset N of M is a partial whole submset of M if at least one element in N has full multiplicity as in M, ie, CN (x) = CM (x) for some x in N.

Definition 2.4. (Girish and John12 and Jena et al19; full submsets)
A submset N of M is a full submset of M if M and N have the same support set, ie, M∗ = N∗ with CN (x) ≤ CM (x) for every x in N.

Example 2.5. (Girish and John12)
Let M = {3∕a, 4∕b, 6∕c} be an mset. The following are some submsets of M that are whole submsets, partial whole submsets, and full submsets:

(a) A submset {3∕a, 4∕b} is a whole submset and partial whole submset of M.
(b) A submset {2∕a, 4∕b, 3∕c} is a partial whole submset and full submset of M.
(c) A submset {2∕a, 4∕b} is a partial whole submset of M.
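The three kinds of submset in Definitions 2.2 to 2.4 can be checked mechanically. The following is a minimal sketch, assuming msets are stored as `Counter` objects with strictly positive counts; the function names are illustrative choices of ours.

```python
from collections import Counter

def is_submset(n, m):
    return all(c <= m[x] for x, c in n.items())

def is_whole(n, m):
    """Every element of N has its full multiplicity from M."""
    return is_submset(n, m) and all(c == m[x] for x, c in n.items())

def is_partial_whole(n, m):
    """At least one element of N has its full multiplicity from M."""
    return is_submset(n, m) and any(c == m[x] for x, c in n.items())

def is_full(n, m):
    """N is a submset of M with the same support set as M."""
    return is_submset(n, m) and set(n) == set(m)

M = Counter({"a": 3, "b": 4, "c": 6})
sub_a = Counter({"a": 3, "b": 4})          # case (a)
sub_b = Counter({"a": 2, "b": 4, "c": 3})  # case (b)
sub_c = Counter({"a": 2, "b": 4})          # case (c)
```

Applied to the three submsets of Example 2.5, these predicates reproduce the classifications stated in (a), (b), and (c).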

Definition 2.6. (Previous studies9-12)
Let M1 and M2 be 2 msets drawn from a set Y; then the Cartesian product of M1 and M2 is defined as
$M_1 \times M_2 = \{(m/x, n/y)/mn : x \in^{m} M_1,\ y \in^{n} M_2\}$.

Definition 2.7. (Girish and John9 and Girish and Sunil10)
A submset R of M × M is called an mset relation on M if every member (m∕x, n∕y) of R has a count, the product of C1 (x, y) and C2 (x, y). We denote m∕x related to n∕y by m∕x R n∕y.

Definition 2.8. (Girish and John9 and Girish and Sunil10 )


(i) An mset relation R on M is reflexive if n∕x R n∕x for all n∕x in M, and irreflexive if m∕x R m∕x never holds.
(ii) An mset relation R on M is symmetric if n∕x R m∕y implies m∕y R n∕x.
(iii) An mset relation R on M is transitive if n∕x R m∕y and m∕y R k∕z imply n∕x R k∕z.

Example 2.9. Let X = {2∕a, 3∕b, 5∕c, 8∕d} be an mset. Then R = {(2∕a, 2∕a)∕4, (5∕c, 5∕c)∕25, (2∕a, 8∕d)∕16, (8∕d, 2∕a)∕16, (3∕b, 3∕b)∕9, (8∕d, 8∕d)∕64, (2∕a, 5∕c)∕10, (5∕c, 2∕a)∕10, (5∕c, 8∕d)∕40, (8∕d, 5∕c)∕40} is an equivalence mset relation.
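The three properties of Definition 2.8 can be verified for Example 2.9 by brute force. The sketch below is illustrative: it stores each member of R as a pair of (count, element) tuples and leaves out the product count (such as ∕4), which plays no role in the three tests. The function name `is_equivalence` is an assumption of ours.

```python
def is_equivalence(mset, relation):
    """mset: dict element -> count; relation: set of ((m, x), (n, y)) pairs."""
    reflexive = all(((n, x), (n, x)) in relation for x, n in mset.items())
    symmetric = all((q, p) in relation for (p, q) in relation)
    transitive = all(
        (p, s) in relation
        for (p, q) in relation
        for (r, s) in relation
        if q == r
    )
    return reflexive and symmetric and transitive

X = {"a": 2, "b": 3, "c": 5, "d": 8}
R = {
    ((2, "a"), (2, "a")), ((5, "c"), (5, "c")), ((2, "a"), (8, "d")),
    ((8, "d"), (2, "a")), ((3, "b"), (3, "b")), ((8, "d"), (8, "d")),
    ((2, "a"), (5, "c")), ((5, "c"), (2, "a")), ((5, "c"), (8, "d")),
    ((8, "d"), (5, "c")),
}
```

Removing any one of the symmetric pairs from `R` makes the check fail, which is a quick way to see that each listed pair is needed.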

Definition 2.10. (Girish and John9 and Girish and Sunil10 )


Let M ∈ [X]m and 𝜏 ⊆ P∗ (M). Then 𝜏 is called an mset topology if 𝜏 satisfies the following:

1. ∅ and M are in 𝜏.
2. The union of the elements of any subcollection of 𝜏 is in 𝜏.
3. The intersection of the elements of any finite subcollection of 𝜏 is in 𝜏.

Definition 2.11. (Girish and John11,12 ; M-basis)


If M is an mset, an M-basis for an M-topology on M is a collection B of partial whole submsets of M such that

(i) For each x ∈m M, for some m > 0, there is at least one M-basis element B ∈ B containing m∕x.
(ii) If m∕x belongs to the intersection of 2 M-basis elements M1 and M2 , then there is an M-basis element M3
containing m∕x such that M3 ⊆ M1 ∩ M2 .

Definition 2.12. (Girish and John11,12 )


Let R be an mset relation on M. The post-mset of x ∈m M is defined as m∕xR = {n∕y ∶ ∃ some k with k∕xRn∕y}, the
post-class P+ = {m∕xR ∶ x ∈m M}, the pre-mset of x ∈r M is defined as Rr∕x = {p∕y ∶ ∃ some q with p∕yRq∕x} and
pre-class P− = {Rm∕x ∶ x ∈m M}.

Example 2.13. (Girish and Jacob14 )


Let M = {3∕a, 4∕b, 2∕c, 5∕d} and let R = {(3∕a, 3∕a)∕9, (2∕a, 3∕b)∕6, (2∕b, 4∕d)∕8, (2∕c, 3∕d)∕6, (2∕d, 2∕a)∕4} be an mset relation on M.
Then the post-msets are
(3∕a)R = (2∕a)R = {3∕a, 3∕b}; (2∕b)R = {4∕d}; (2∕c)R = {3∕d}; (2∕d)R = {2∕a}.
The pre-msets are
R(3∕a) = R(2∕a) = {3∕a, 2∕d},
R(3∕b) = {2∕a},
R(2∕c) = ∅,
R(4∕d) = R(3∕d) = {2∕b, 2∕c}.
The post-class is P+ = {{3∕a, 3∕b}, {4∕d}, {3∕d}, {2∕a}}.
The pre-class is P− = {{3∕a, 2∕d}, {2∕a}, {2∕b, 2∕c}}.
The M-basis 𝛽+ obtained from the post-class P+ is
𝛽+ = {{3∕a, 3∕b}, {4∕d}, {3∕d}, {2∕a}, 𝜙, M}.
The M-basis 𝛽− obtained from the pre-class P− is 𝛽− = {{3∕a, 2∕d}, {2∕a}, {2∕b, 2∕c}, 𝜙, M}.
Hence, the M-topologies 𝜏1 and 𝜏2 generated by the post-class and the pre-class are
𝜏1 = {𝜙, M, {3∕a, 3∕b}, {4∕d}, {2∕a}, {2∕a, 4∕d}, {3∕a, 3∕b, 4∕d}}
𝜏2 = {𝜙, M, {3∕a, 2∕d}, {2∕a}, {2∕b, 2∕c}, {2∕a, 2∕b, 2∕c}, {3∕a, 2∕d, 2∕b, 2∕c}}.
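Reading Definition 2.12 literally, the post-msets and pre-msets of Example 2.13 can be collected in one pass over the relation. This is a sketch under our own representation (pairs of (count, element) tuples, product counts omitted); `post_msets` and `pre_msets` are hypothetical helper names.

```python
from collections import defaultdict

def post_msets(relation):
    """Map each element x to {n/y : k/x R n/y for some k}."""
    post = defaultdict(set)
    for (_, x), (n, y) in relation:
        post[x].add((n, y))
    return dict(post)

def pre_msets(relation):
    """Map each element x to {p/y : p/y R q/x for some q}."""
    pre = defaultdict(set)
    for (p, y), (_, x) in relation:
        pre[x].add((p, y))
    return dict(pre)

R = [((3, "a"), (3, "a")), ((2, "a"), (3, "b")), ((2, "b"), (4, "d")),
     ((2, "c"), (3, "d")), ((2, "d"), (2, "a"))]
```

An element with no entry in the result (here `c` in the pre-msets) has an empty pre-mset, matching R(2∕c) = ∅.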

2.2 Gene mutations


Mutations are random changes to the genetic material of an organism based on biologist conceptions.17,20-22
Structurally, mutations can be classified as
• Small-scale mutations (affecting a gene in one or a few nucleotides)
• Large-scale mutations (affecting chromosomal structure)

Small scale: a single gene is affected by a change to its DNA sequence. Nucleotides/bases may be added, missing, or changed.

1. Point mutation
2. Insertions
3. Deletions

Point mutation: mutations that occur within the protein-coding region of a gene can be classified into several kinds, such as

• Silent mutation, in which the altered codon codes for the same amino acid.
• Missense mutation, in which the altered codon codes for a different amino acid.
• Nonsense mutation, in which the altered codon codes for a stop and can truncate the protein.
• Neutral mutation, in which there is no detectable change in the function of the protein.

Frameshift mutation
• They are usually caused by errors during replication of repeating elements.
• Insertions in the coding region of a gene cause a shift in the reading frame (frameshift).
• One or more bases (A, T, C, or G) are added or deleted.

2.2.1 Methods preparation of standards for mutation analysis


Each of these methods is designed to detect the presence of a variation in sequence.23 DNA sequencing is performed to
identify the specific change.
• Enzymatic mutation detection
• Two-dimensional gene scanning
• Protein truncation test
• Denaturing high-performance liquid chromatography
• Single-strand conformation polymorphism analysis
In this paper, we determine whether there is a mutation or not by using mathematical methods. We determine the
location of the mutation in Theorem 5.13 and Appendix A.

2.2.2 Genetic code


RNA coding occurs in messenger RNA (mRNA) and is the coding that is actually read during the synthesis of polypeptides.18,24 However, each mRNA molecule acquires its sequence of nucleotides by transcription from the corresponding gene. Because DNA sequencing has become so rapid, most genes are now discovered at the level of DNA before they are discovered as mRNA or as a protein product. The central dogma of molecular biology is an explanation of the flow of genetic information within a biological system. It is often stated as DNA → RNA → protein, as in Figure 1.

FIGURE 1 Central dogma of biology [Colour figure can be viewed at wileyonlinelibrary.com]

FIGURE 2 A, Simple population of 8 individuals. B, MultiPopulation of 4 MultiIndividuals



2.2.3 Multiset genetic algorithm


MuGA (Algorithm 1) is a genetic algorithm in which populations are represented by msets, called MultiPopulations, and individuals are represented by pairs {copies, genotype}, called MultiIndividuals. Figure 2 shows a simple population of 8 individuals of the One Max Problem and the equivalent MultiPopulation of 4 MultiIndividuals.25
The One Max Problem26 (or BitCounting) is a simple problem that consists of maximizing the number of ones in a bit string. Formally, the problem is to find a string Y = {y1, y2, …, yn}, where yi ∈ {0, 1}, that maximizes
$F(y) = \sum_{i=1}^{n} y_i$.
From Definition 2.6, we can define the genetic Cartesian product as M1 R M2 = {(m∕x, n∕y) : x ∈m M1, y ∈n M2}.
In what follows, we use msets and mset relations to determine whether there is a mutation or not. We denote the sense strand of DNA by $M_{5'3'}$ and the antisense strand of DNA by $M_{3'5'}$, and we suppose that $M_1 = M_{5'3'}$ and $M_2 = M_{3'5'}$.
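To make the MultiPopulation idea and the One Max fitness concrete, here is a small sketch. The eight genotypes are hypothetical values chosen for illustration (they are not the ones drawn in Figure 2), and the use of `Counter` as the mset container is our assumption rather than part of MuGA.

```python
from collections import Counter

def one_max(bits):
    """Fitness F(y): the number of ones in the bit string."""
    return sum(int(b) for b in bits)

# Hypothetical simple population of 8 individuals (illustrative only).
population = ["1100", "1100", "1100", "0110", "0110", "1111", "0001", "0001"]

# MultiPopulation: each distinct genotype is stored once with its copy count,
# i.e. as a MultiIndividual {copies, genotype}.
multi_population = Counter(population)

# Fitness only needs to be evaluated once per distinct genotype.
best = max(multi_population, key=one_max)
```

Here 8 individuals collapse to 4 MultiIndividuals, the same reduction that Figure 2 describes.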

3 MULTISET AND MUTATION FOR WILD TYPE OF DNA OR RNA (AMINO ACID) AND GENES, SEQUENCES, AND STRINGS OF DNA OR RNA (NUCLEOTIDE)

The genetic code can be expressed as either RNA coding or DNA coding. A mutation occurs as a result of one or more of the following changes: substitution, addition, or deletion of one or more nucleotide pairs in the gene sequence or in strings of DNA or RNA, with a resulting change in the coding for the amino acid (Figure 3). This will be seen in the next examples.
Example 3.1. In this example, we study mutation using msets and mset relations.
(a)
CTGCAG
GACGTC
(b)
ACTAG
CTAGA
We suppose that $M_1 = M_{5'3'}$ is the first tape in the wild type and $M_2 = M_{3'5'}$ is the second tape in the wild type. For (a), M1 = {2∕C, 1∕T, 2∕G, 1∕A} and M2 = {2∕G, 1∕A, 2∕C, 1∕T}; for (b), M1 = {2∕A, 1∕C, 1∕T, 1∕G} and M2 = {1∕C, 1∕T, 2∕A, 1∕G}.
In both (a) and (b) we have M1 = M2 as msets. However, without a mutation we expect M1 = {m∕x} and M2 = {m∕y}, where y is the complement of x (if x = T ≡ U then y = A, if x = C then y = G, and conversely), taken in the same order. Under this condition, there is a mutation in (b) and no mutation in (a).
In general, msets are not ordered, as in (b), so we use an mset relation.
For (a), R = {(2∕C, 2∕G), (1∕T, 1∕A), (2∕G, 2∕C), (1∕A, 1∕T)}. Since A ⇔ T and G ⇔ C in every pair, the strands are complementary, but this relation does not record the order of the sequence. If we take the order into account, the relation becomes R = {(1∕C, 1∕G), (1∕T, 1∕A), (1∕G, 1∕C), (1∕A, 1∕T), (1∕G, 1∕C)}, but this takes longer for a complete gene.
For (b), R = {(1/A, 1/C), (1/C, 1/T), (1/T, 1/A), (1/A, 1/G), (1/G, 1/A)}. Several of these pairs are not complementary (for example, A is paired with C); therefore, (b) contains a mutation.
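The comparison in Example 3.1 can be sketched in a few lines: the msets of the two tapes are equal in both cases, and only the ordered pairing exposes the mutation in (b). The helper names and the 0-based position convention below are our own choices.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def ordered_relation(strand1, strand2):
    """Pair the two tapes position by position (each pair has count 1)."""
    return list(zip(strand1, strand2))

def non_complementary_positions(strand1, strand2):
    """0-based positions whose pair is neither A-T nor G-C."""
    return [i for i, (x, y) in enumerate(zip(strand1, strand2))
            if COMPLEMENT[x] != y]

case_a = ("CTGCAG", "GACGTC")
case_b = ("ACTAG", "CTAGA")
```

For (b), the pair (T, A) at position 2 happens to be complementary, while the other four pairs are not, so the mset equality M1 = M2 alone would have hidden the mutation.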

FIGURE 3 Types of mutation in a gene coding sequence [Colour figure can be viewed at wileyonlinelibrary.com]

Example 3.2. Consider the types of mutation in a gene coding sequence as in Figure 3.
• Nonsense mutation
First tape: GUC AUG UUU AGC UCA AUC AGG AAG UGU
Val Met Phe Ser Ser Ile Arg Lys Cys
M1 = {1∕Val, 1∕Met, 1∕Phe, 2∕Ser, 1∕Ile, 1∕Arg, 1∕Lys, 1∕Cys}
M2 = {7∕G, 9∕U, 4∕C, 7∕A}
Second tape: GUC AUG UUU AGC UAA AUC AGG AAG UGU
Val Met Phe Ser Stop Ile Arg Lys Cys
M3 = {1∕Val, 1∕Met, 1∕Phe, 1∕Ser, 1∕Stop, 1∕Ile, 1∕Arg, 1∕Lys, 1∕Cys}
M4 = {7∕G, 9∕U, 3∕C, 8∕A}
Hence, M1 ≠ M3, because the coding for the amino acid serine changes to a stop codon, and M2 ≠ M4, because of the nucleotide change from C to A.
• Frameshift mutation
GUC AUG UUU AGC UCA AUC AGG AAG UGU
Val Met Phe Ser Ser Ile Arg Lys Cys
M1 = {1∕Val, 1∕Met, 1∕Phe, 2∕Ser, 1∕Ile, 1∕Arg, 1∕Lys, 1∕Cys}
M2 = {7∕G, 9∕U, 4∕C, 7∕A}
GUC AUG UUU AAG CUC AAU CAG GAA UGU
Val Met Phe Lys Leu Asn Gln Glu Val
M3 = {2∕Val, 1∕Met, 1∕Phe, 1∕Lys, 1∕Leu, 1∕Asn, 1∕Gln, 1∕Glu}
M4 = {6∕G, 9∕U, 4∕C, 8∕A}
Hence, M1 ≠ M3, because the insertion changes every codon that follows it in the mRNA strand, and M2 ≠ M4, because of the addition of the nucleotide A.
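The msets of the nonsense-mutation case can be rebuilt from the two mRNA strings. The sketch below uses a deliberately partial codon table that contains only the codons occurring in that case (all taken from the standard genetic code); `codon_mset` is a hypothetical helper name, and, as in the example, translation is not halted at the stop codon.

```python
from collections import Counter

# Partial table: only the codons that occur in the nonsense-mutation case.
CODONS = {"GUC": "Val", "AUG": "Met", "UUU": "Phe", "AGC": "Ser",
          "UCA": "Ser", "AUC": "Ile", "AGG": "Arg", "AAG": "Lys",
          "UGU": "Cys", "UAA": "Stop"}

def codon_mset(mrna):
    """Split the mRNA into triplets and count the coded amino acids."""
    triplets = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    return Counter(CODONS[t] for t in triplets)

wild = "GUCAUGUUUAGCUCAAUCAGGAAGUGU"
nonsense = "GUCAUGUUUAGCUAAAUCAGGAAGUGU"

M1, M2 = codon_mset(wild), Counter(wild)          # amino acids, nucleotides
M3, M4 = codon_mset(nonsense), Counter(nonsense)
```

Comparing `M1` with `M3` and `M2` with `M4` shows the mutation at both the amino acid level and the nucleotide level.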

Remark 3.3.
A mutation has occurred if one or more of the following conditions holds:
1. |M1| ≠ |M2|.
2. M1 = {m∕x} ≠ M2 = {m∕y : y = x^c}, where x^c is the complement of x, with x ∈ {T ≡ U, G} and y ∈ {A, C}, or conversely, taken in order.
3. n(M1) ≠ n(M2), where n(M1) and n(M2) are the numbers of distinct elements in the msets M1 and M2.
From RNA coding, we have M = {Leu, Asn, Gln, Glu, Lys, Phe, Met, Ser, Arg, Tyr, Ile, Cys, Thr, Trp, Asp, Pro, Gly, His, Val, Ala}.
The mset relation of amino acids is R = {(2∕Phe, 2∕Lys), (2∕Leu, 2∕Asn), (2∕Ser, 2∕Arg), (2∕Ser, 2∕Ser), (1∕Tyr, 1∕Ile), (1∕Tyr, 1∕Met), (2∕Cys, 2∕Thr), (2∕Leu, 2∕Glu), (2∕Leu, 2∕Asp), (4∕Pro, 4∕Gly), (2∕His, 2∕Val), (2∕Gln, 2∕Val), (4∕Arg, 4∕Ala), (2∕Lys, 2∕Phe), (2∕Asn, 2∕Leu), (2∕Arg, 2∕Ser), (2∕Ser, 2∕Ser), (1∕Ile, 1∕Tyr), (1∕Met, 1∕Tyr), (2∕Thr, 2∕Cys), (2∕Glu, 2∕Leu), (2∕Asp, 2∕Leu), (4∕Gly, 4∕Pro), (2∕Val, 2∕His), (2∕Val, 2∕Gln), (4∕Ala, 4∕Arg)}.
If we neglect the symmetry in the relation, we obtain a new relation R = {(2∕Phe, 2∕Lys), (2∕Leu, 2∕Asn), (2∕Ser, 2∕Arg), (2∕Ser, 2∕Ser), (1∕Tyr, 1∕Ile), (1∕Tyr, 1∕Met), (2∕Cys, 2∕Thr), (2∕Leu, 2∕Glu), (2∕Leu, 2∕Asp), (4∕Pro, 4∕Gly), (2∕His, 2∕Val), (2∕Gln, 2∕Val), (4∕Arg, 4∕Ala)}.
From these relations, we can identify amino acids that can be linked together and determine whether there is a mutation or not.

4 MUTATION AND TOPOLOGY

In this section, we use the concept of topology to determine whether or not a mutation has occurred.
Theorem 4.1. (Girish and John13)
If R is an mset relation on M, then the post-class P+ and the pre-class P− each form a sub M-basis.

Proof. Let 𝛽+ be the collection of all finite intersections of members of P+. Then M ∈ 𝛽+, and condition (i) of an M-basis holds. To establish condition (ii), let A, B ∈ 𝛽+ and let p ∈k A ∩ B. Each of A and B is a finite intersection of members of P+, and hence A ∩ B is a finite intersection of members of P+. It follows that p ∈k K = A ∩ B and K ∈ 𝛽+, so that condition (ii) of an M-basis holds. Thus, the post-class P+ forms a sub M-basis.
Similarly, we can prove that the pre-class P− forms a sub M-basis.

Example 4.2. (Silent mutation)
GUC AUG UUU AGC UCA AUC AGG AAG UGU
GUC AUG UUC AGC UCA AUC AGG AAG UGU
where M1 represents the first strand of the type and M2 the second strand:
M1 = {7∕G, 9∕U, 4∕C, 7∕A}
M2 = {7∕G, 8∕U, 5∕C, 7∕A}
Then M = {7∕G, 9∕U, 5∕C, 7∕A} and
R = {(7∕G, 7∕G)∕49, (8∕U, 8∕U)∕64, (1∕U, 1∕C)∕1, (4∕C, 4∕C)∕16, (7∕A, 7∕A)∕49}.
Then the post-msets and pre-msets are
(7∕G)R = {7∕G}, (8∕U)R = (1∕U)R = {8∕U, 1∕C},
(4∕C)R = {4∕C}, (7∕A)R = {7∕A},
R(7∕G) = {7∕G}, R(7∕A) = {7∕A}, R(8∕U) = {8∕U}, R(4∕C) = R(1∕C) = {1∕U, 4∕C}.
The post-class is P+ = {{7∕G}, {8∕U, 1∕C}, {4∕C}, {7∕A}}.
The pre-class is P− = {{7∕G}, {8∕U}, {1∕U, 4∕C}, {7∕A}}.
The M-basis for P+ is
𝛽+ = {{7∕G}, {8∕U, 1∕C}, {4∕C}, {7∕A}, {1∕C}, 𝜙, M}.
The M-basis for P− is
𝛽− = {{7∕G}, {8∕U}, {1∕U, 4∕C}, {7∕A}, {1∕U}, 𝜙, M}.
The M-topology for P− is
𝜏1 = {𝜙, M, {7∕G}, {8∕U}, {1∕U, 4∕C}, {1∕U}, {7∕G, 1∕U, 4∕C}, {7∕G, 8∕U}, {7∕G, 7∕A}, {7∕G, 1∕U}, {8∕U, 4∕C}, {7∕A, 8∕U}}.
The M-topology for P+ is
𝜏2 = {𝜙, M, {7∕G}, {8∕U, 1∕C}, {4∕C}, {7∕A}, {1∕C}, {7∕G, 8∕U, 1∕C}, {7∕G, 4∕C}, {7∕G, 7∕A}, {7∕G, 1∕C}, {8∕U, 4∕C}}.

Example 4.3. (a)


GGATCC
CCTAGG
M1 = {2∕G, 1∕A, 1∕T, 2∕C}
M2 = {2∕G, 1∕T, 1∕A, 2∕C}
M = {2∕G, 1∕A, 1∕T, 2∕C}
R = {(2∕G, 2∕C)∕4, (1∕A, 1∕T)∕1, (2∕C, 2∕G)∕4, (1∕T, 1∕A)∕1}
Then the post-msets and pre-msets are
(2∕G)R = {2∕C}, (1∕A)R = {1∕T}, (2∕C)R = {2∕G},
(1∕T)R = {1∕A} , R(2∕G) = {2∕C}, R(1∕A) = {1∕T},
R(2∕C) = {2∕G}, R(1∕T) = {1∕A}.
The post-class and pre-class are P+ = {{2∕C}, {1∕T}, {2∕G}, {1∕A}} = P−.
The M-bases for P+ and P− are 𝛽+ = {{2∕C}, {1∕T}, {2∕G}, {1∕A}, 𝜙, M} = 𝛽−.
The M-topologies for P+ and P− are
𝜏1 = {𝜙, M, {2∕C}, {1∕T}, {2∕G}, {1∕A}, {2∕G, 1∕T}, {2∕G, 1∕A}, {2∕G, 2∕C}, {1∕T, 1∕A}, {1∕T, 2∕C}, {1∕A, 2∕C}, {2∕G, 1∕T, 1∕A}, {2∕G, 1∕T, 2∕C}, {2∕G, 1∕A, 2∕C}, {1∕T, 1∕A, 2∕C}} = 𝜏2 = D ≡ discrete topology.
(b)
GATC
AGCT
M1 = {1∕G, 1∕A, 1∕T, 1∕C} , M2 = {1∕A, 1∕G, 1∕C, 1∕T}
M = {1∕G, 1∕A, 1∕T, 1∕C}
R = {(1∕G, 1∕A)∕1, (1∕A, 1∕G)∕1, (1∕T, 1∕C)∕1, (1∕C, 1∕T)∕1}

Then the post-msets and pre-msets are

(1∕G)R = {1∕A}, (1∕A)R = {1∕G}, (1∕C)R = {1∕T}, (1∕T)R = {1∕C},
R(1∕G) = {1∕A}, R(1∕A) = {1∕G}, R(1∕C) = {1∕T}, R(1∕T) = {1∕C}.

The post-class and pre-class are P+ = {{1∕C}, {1∕T}, {1∕G}, {1∕A}} = P−.
The M-bases for P+ and P− are 𝛽+ = {{1∕C}, {1∕T}, {1∕G}, {1∕A}, 𝜙, M} = 𝛽−.
The M-topologies for P+ and P− are 𝜏1 = 𝜏2 = D ≡ discrete topology.
Clearly, the pair GATC/AGCT contains a mutation, yet it yields the discrete topology. Such a case is unlikely to occur in practice, because the type is very short (length 4). For this reason, the following theorem is restricted to types longer than four nucleotides.
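For Example 4.3(a), the listed M-topology can be enumerated as the set of all unions of M-basis elements. The sketch below assumes that reading (a topology generated by a basis is the collection of all unions of basis elements) and uses `Counter` union, which keeps the larger count, as the mset union; `generated_topology` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

def generated_topology(basis):
    """All unions of basis elements; the empty union gives the empty mset."""
    opens = set()
    for r in range(len(basis) + 1):
        for combo in combinations(basis, r):
            union = Counter()
            for element in combo:
                union |= element  # mset union keeps the larger count
            opens.add(frozenset(union.items()))  # hashable form of the mset
    return opens

# Non-trivial basis elements of Example 4.3(a)
basis = [Counter(C=2), Counter(T=1), Counter(G=2), Counter(A=1)]
topology = generated_topology(basis)
```

With four pairwise disjoint basis elements this yields 2^4 = 16 open msets: 𝜙, M, 4 singletons, 6 pairs, and 4 triples, the same members that the example lists for the discrete topology.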
Theorem 4.4. If the length of the gene is more than four nucleotides and the gene does not contain any mutations, then the resulting topology is a discrete topology.

Proof. Let M = {n1∕G, n2∕A, n2∕T, n1∕C} and

R = {(n1∕G, n1∕C)∕n1n1, (n2∕A, n2∕T)∕n2n2, (n2∕T, n2∕A)∕n2n2, (n1∕C, n1∕G)∕n1n1},

since the type does not have a mutation.
Then the post-msets and pre-msets are

(n1∕G)R = {n1∕C}, (n2∕A)R = {n2∕T}, (n2∕T)R = {n2∕A}, (n1∕C)R = {n1∕G},
R(n1∕G) = {n1∕C}, R(n2∕A) = {n2∕T}, R(n2∕T) = {n2∕A}, R(n1∕C) = {n1∕G}.

The post-class and pre-class obtained from the post-msets and pre-msets are P+ = {{n1∕C}, {n2∕T}, {n1∕G}, {n2∕A}} = P−.

The M-bases obtained from P+ and P− are 𝛽+ = {{n1∕C}, {n2∕T}, {n1∕G}, {n2∕A}, 𝜙, M} = 𝛽−.

The M-topologies generated by P+ and P− are 𝜏1 = 𝜏2 = D ≡ discrete topology.

Remark 4.5. This theory can be applied to RNA by replacing the nucleotide T with the nucleotide U.

5 GENETIC DISTANCE

El-Sharkasy and Badr27,28 study the percentage of change between types and the similarities and differences between them as follows:
Theorem 5.1. (El-Sharkasy and Badr27)
Let M1, M2 be nonempty msets of “types,” and let the distance between the types be
$d(M_1, M_2) = 1 - \dfrac{|M_1 \cap M_2|}{\max\{|M_1|, |M_2|\}}$.
Then d defines a metric space.
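The distance of Theorem 5.1 is straightforward to compute once the msets are available. The following sketch assumes `Counter` objects, whose `&` operator keeps the smaller count of each element and therefore acts as mset intersection; the nucleotide counts are the ones given for M1 and M2 in Example 5.12, and `mset_distance` is a name of our choosing.

```python
from collections import Counter

def mset_distance(m1, m2):
    """d(M1, M2) = 1 - |M1 ∩ M2| / max(|M1|, |M2|)."""
    common = sum((m1 & m2).values())  # intersection keeps the smaller count
    return 1 - common / max(sum(m1.values()), sum(m2.values()))

M1 = Counter(G=13, C=11, A=6, U=10)
M2 = Counter(C=16, G=8, U=9, A=7)
```

For these counts the intersection has cardinality 34 out of 40, so the distance is 1 − 34∕40 = 0.15.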

Theorem 5.2. (El-Sharkasy and Badr28)
Let X be a nonempty set of “types,” which may be strings of bits, vectors, DNA or RNA sequences, etc, and let A, B be nonempty subsets of X. The distance between A and B is

$\mu(A, B) = 1 - \dfrac{|\mathrm{cl}(A) \cap \mathrm{cl}(B)|}{\max\{|\mathrm{cl}(A)|, |\mathrm{cl}(B)|\}}$,

which defines a metric space.


To obtain better results when comparing similarities and to avoid minor errors, we define some functions that help us define the correlation coefficient. Let

$C_{AB} = \dfrac{1}{|A||B|} \sum_{x_i \in A,\ y_i \in B} C(x_i) C(y_i)$.

We can prove that it is a metric function as follows:

Proof.

1. Since $A \neq \phi$ implies $|A| \neq 0$, $B \neq \phi$ implies $|B| \neq 0$, and $|A| \ge C_A(x_i)$, $|B| \ge C_B(y_i)$, we have $|A||B| \ge C_A(x_i) C_B(y_i)$; therefore, $0 \le C_{AB} \le 1$.
2. $C_{AB} = \dfrac{1}{|A||B|} \sum_{x_i \in A,\ y_i \in B} C(x_i) C(y_i) = \dfrac{1}{|B||A|} \sum_{y_i \in B,\ x_i \in A} C(y_i) C(x_i) = C_{BA}$.
3. We have $|A| \ge C_A(x_i)$, $|B| \ge C_B(y_i)$, $|C| \ge C_C(z_i)$ and
$C_{AB} = \dfrac{1}{|A||B|} \sum_{x_i \in A,\ y_i \in B} C(x_i) C(y_i)$, $C_{AC} = \dfrac{1}{|A||C|} \sum_{x_i \in A,\ z_i \in C} C(x_i) C(z_i)$, $C_{BC} = \dfrac{1}{|C||B|} \sum_{z_i \in C,\ y_i \in B} C(z_i) C(y_i)$,
$C_{BC} + C_{AC} = \dfrac{1}{|A||C|} \sum_{x_i \in A,\ z_i \in C} C(x_i) C(z_i) + \dfrac{1}{|C||B|} \sum_{z_i \in C,\ y_i \in B} C(z_i) C(y_i) \ge 2 \ge \dfrac{1}{|A||B|} \sum_{x_i \in A,\ y_i \in B} C(x_i) C(y_i)$.
Hence, $C_{BC} + C_{AC} \ge C_{AB}$. Then, $C_{AB}$ is a semimetric space.

We can define $C_{AA} = \dfrac{1}{|A|^2} \sum_{x_i \in A} C_A(x_i)^2$ and $C_{BB} = \dfrac{1}{|B|^2} \sum_{y_i \in B} C_B(y_i)^2$, and then the correlation coefficient can be defined as

$\rho(A, B) = \dfrac{C_{AB}}{\sqrt{C_{BB} \cdot C_{AA}}}$.

Remark 5.3. n(A) is the number of distinct elements in the mset A; if the mset consists of the “types” of genes, then n(A) = 4 at most.
The previous function $d(M_1, M_2) = 1 - \dfrac{|M_1 \cap M_2|}{\max\{|M_1|, |M_2|\}}$ is useful, but a function that gives better results is needed; therefore, we define such a function in the following theorem.
Theorem 5.4. Let A, B be 2 msets drawn from a set X of “types,” with n(A) = n(B). The distance function (correlation coefficient) between A and B can be defined as

$\rho(A, B) = \dfrac{C_{AB}}{\sqrt{C_{BB} \cdot C_{AA}}} = \dfrac{\sum_{x_i \in A,\ y_i \in B} C(x_i) C(y_i)}{\sqrt{\sum_{x_i \in A} (C_A(x_i))^2 \cdot \sum_{y_i \in B} (C_B(y_i))^2}}$,

which is a semimetric space.

Proof. Obvious.

Remark 5.5. We used the Cauchy-Schwarz inequality.

Remark 5.6.
1. We write A = {n1∕G, n2∕A, n3∕T, n4∕C} and B = {m1∕C, m2∕T, m3∕A, m4∕G}.
2. 𝜌(A, B) = 1 indicates no mutation; any other value indicates a mutation.
3. We designed a computer program (code 1) to calculate the function 𝜌(A, B) and to determine the location of the mutation, if one is found.
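The coefficient 𝜌(A, B) of Theorem 5.4, with the pairing of Remark 5.6 (each nucleotide count in A is multiplied by the count of its complement in B), can be sketched as follows. This is not the paper's code 1; it is a minimal illustration under our own naming, using the counts reported in Examples 5.7 and 5.8.

```python
from collections import Counter
from math import sqrt

COMPLEMENT = {"G": "C", "C": "G", "A": "T", "T": "A"}

def rho(a, b, complement=COMPLEMENT):
    """Correlation coefficient; x in A is paired with its complement in B."""
    cross = sum(count * b[complement[x]] for x, count in a.items())
    norm_a = sum(count ** 2 for count in a.values())
    norm_b = sum(count ** 2 for count in b.values())
    return cross / sqrt(norm_a * norm_b)

# Counts from Example 5.7 (no mutation) and Example 5.8 (mutated strand)
A = Counter(G=1019, A=1543, T=1859, C=856)
B = Counter(C=1019, T=1543, A=1859, G=856)
A_mut = Counter(G=1014, A=1539, T=1859, C=860)
B_mut = Counter(C=1014, T=1543, A=1859, G=856)
```

For RNA, the same function can be called with a complement table in which U replaces T, in line with Remark 4.5. The normalising factors |A| and |B| cancel between numerator and denominator, so only the raw counts are needed.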

Example 5.7. CAD2 (Arabidopsis thaliana gamma-glutamylcysteine synthetase gene)


Name: CAD2 (A thaliana gamma-glutamylcysteine synthetase gene29 )
Tair Accession: Sequence: 1005028114
GenBank Accession: AF068299
Sequence Length (bp): 5272
5′ ATCGATATGTAACACAAT … TGTATGTTTTT3′
3′ TAGCTATACATTGTGTTA … ACATACAAAAA5′
A = {1019∕G, 1543∕A, 1859∕T, 856∕C}, |A| = 5272
B = {1019∕C, 1543∕T, 1859∕A, 856∕G}, |B| = 5272
$C_{AB} = \dfrac{1019 \times 1019 + 1543 \times 1543 + 1859 \times 1859 + 856 \times 856}{5272 \times 5272} = \dfrac{7607827}{5272 \times 5272} = 0.2737$

CAA = 0.2737, CBB = 0.2737

𝜌(A, B) = 1.00, 1 − 𝜌(A, B) = 1 − 1.00 = 0.00.
The same gene was processed using code 1 (Appendix A), and the result was 𝜌(A, B) = 1.00.

Example 5.8. If we make a mutation in A thaliana gamma-glutamylcysteine synthetase gene,


A = {1014∕G, 1539∕A, 1859∕T, 860∕C}, |A| = 5272
B = {1014∕C, 1543∕T, 1859∕A, 856∕G}, |B| = 5272

𝜌(A, B) = 0.999
The positions of the mutations are
[2568, 2578, 2595, 2609, 2639, 5076]
[C, T, C, C, G, C]
[T, T, C, T, A, T]
If n(A) ≠ n(B), then there exists a mutation, and in this case the function 𝜌(A, B) and the statistical t test fail to compare A and B. The problem can be resolved by using the function
$d(M_1, M_2) = 1 - \dfrac{|M_1 \cap M_2|}{\max\{|M_1|, |M_2|\}}$,
or by writing the smaller mset as a partial whole submset and using 𝜌∗(A, B), or by writing CAB as
$C_{AB} = \dfrac{1}{|A| + |B|} \sum_{1 \le i \le n} C_A(x_i) \sum_{1 \le j \le m} C_B(y_j)$, $C_{AA} = \dfrac{1}{2|A|} \sum_{1 \le i \le n} C_A(x_i) \sum_{1 \le j \le m} C_A(y_j)$, $C_{BB} = \dfrac{1}{2|B|} \sum_{1 \le i \le n} C_B(x_i) \sum_{1 \le j \le m} C_B(y_j)$.

Proposition 5.9. Let A, B be 2 msets drawn from a set X of “types,” with n(A) ≠ n(B) and |A| ≠ |B|. The distance function (correlation coefficient) between A and B can be defined as

1. $\rho^{*}(A, B) = \dfrac{C_{AB}}{\sqrt{C_{BB} \cdot C_{AA}}} = \dfrac{2\sqrt{|A||B|}}{|A| + |B|}$, a semimetric space;
2. $\rho^{**}(A, B) = 1 - \dfrac{C_{AB}}{\sqrt{C_{BB} \cdot C_{AA}}}$, a semimetric space.
Proof. Obvious.
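The closed form in item 1 can be checked directly. The short derivation below is a sketch under one assumption of ours: each sum of counts taken over a whole mset equals its cardinality, so that every sum over A contributes $|A|$ and every sum over B contributes $|B|$.

```latex
\begin{aligned}
C_{AB} &= \frac{|A|\,|B|}{|A|+|B|}, \qquad
C_{AA} = \frac{|A|^{2}}{2|A|} = \frac{|A|}{2}, \qquad
C_{BB} = \frac{|B|^{2}}{2|B|} = \frac{|B|}{2}, \\
\rho^{*}(A,B) &= \frac{C_{AB}}{\sqrt{C_{BB}\,C_{AA}}}
 = \frac{|A|\,|B|}{|A|+|B|} \cdot \frac{2}{\sqrt{|A|\,|B|}}
 = \frac{2\sqrt{|A|\,|B|}}{|A|+|B|}.
\end{aligned}
```

By the inequality between the arithmetic and geometric means, this value is at most 1, with equality exactly when |A| = |B|, so it reacts only to a difference in cardinality between the two msets.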
Remark 5.10.
1. We use 𝜌∗(A, B) to detect deletions and additions.
2. 𝜌∗(A, B) depends on the cardinalities of A and B.

Example 5.11. Comparing DNA sequences


1. GATACCCCCCGG.
2. GATACGACCCGG.
3. GATACGCCCCGG.
4. CATACGACTCGG.
5. GATAGACTCGG.
A = {3∕G, 2∕A, 1∕T, 6∕C}, |A| = 12
B = {4∕C, 1∕T, 3∕A, 4∕G}, |B| = 12
C = {5∕C, 1∕T, 2∕A, 4∕G}, |C| = 12
D = {4∕C, 2∕T, 3∕A, 3∕G}, |D| = 12
E = {3∕C, 2∕T, 3∕A, 4∕G}, |E| = 12
d(A, A) = 12∕12 = 1.00, d(A, B) = 10∕12 = 0.83, d(A, C) = 11∕12 = 0.916, d(A, D) = 10∕12 = 0.83, d(A, E) = 9∕12 = 0.75,
and 𝜌(A, B) = 0.89, 𝜌(A, C) = 0.90, 𝜌(A, D) = 0.84, 𝜌(A, E) = 0.91.
According to the NCBI and NCBI BLAST websites,29 however, the similarity proportions are (A, A) = 1, (A, B) = 0.83, (A, C) = 0.92, (A, D) = 0.67, and (A, E) = 0.75.

Example 5.12. GGGCAGUCUCCCGGCGUUUAAGGGAUCCUGAACUUCGUCG.


CUCCCAUCCAAUCAGUCCGCCUCACGGAUGGAGUUGCUCC.
M1 = {13∕G, 11∕C, 6∕A, 10∕U}, |M1 | = 40
M2 = {16∕C, 8∕G, 9∕U, 7∕A}, |M2 | = 40
M1 ∩ M2 = {8∕G, 11∕C, 6∕A, 9∕U}, |M1 ∩ M2 | = 34

d(M1, M2) = 34∕40 = 0.85, 1 − d(M1, M2) = 1 − 0.85 = 0.15

$C_{M_1 M_2} = \dfrac{13 \times 16 + 11 \times 8 + 6 \times 9 + 10 \times 7}{40 \times 40} = \dfrac{420}{40 \times 40} = 0.2625$
$C_{M_1 M_1} = 0.266$, $C_{M_2 M_2} = 0.281$
𝜌(M1, M2) = 0.960, 1 − 𝜌(M1, M2) = 1 − 0.960 = 0.040.
In the following theorem, we determine the position of a mutation.
Theorem 5.13. Encode the nucleotides of the gene as G = 0, C = 3, A = 1, T = 2, and write the 5′...3′ strand of the gene as a matrix M of order 1 × n and the 3′...5′ strand as a matrix N of order 1 × n. Then the gene has no mutation if M + N = L, where L is a matrix of order 1 × n in which each element is equal to 3. Otherwise, there is a mutation.

Proof. Obvious.
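The test of Theorem 5.13 amounts to adding the two encoded strands element by element and looking for entries different from 3. A minimal sketch follows; the function name, the 1-based position convention, and the short test strands (taken from Example 4.3) are our own choices and are not the paper's MATLAB code.

```python
# Encoding from Theorem 5.13: complementary bases always sum to 3.
CODE = {"G": 0, "A": 1, "T": 2, "C": 3}

def mutation_positions(strand_5_3, strand_3_5):
    """1-based positions where the encoded strands do not sum to 3."""
    if len(strand_5_3) != len(strand_3_5):
        raise ValueError("strands must have the same length")
    return [i for i, (x, y) in enumerate(zip(strand_5_3, strand_3_5), start=1)
            if CODE[x] + CODE[y] != 3]

no_mutation = mutation_positions("GGATCC", "CCTAGG")  # Example 4.3(a)
mutated = mutation_positions("GATC", "AGCT")          # Example 4.3(b)
```

An empty list means that every element of L equals 3; otherwise the list gives the places at which the pairing G+C = 3 or A+T = 3 fails.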

Example 5.14. NM_000518.4 Homo sapiens hemoglobin subunit beta (HBB), mRNA.29
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACA
GACACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTAC
TGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGG
CCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTC
TTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGG
CAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCT
TTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTT
GCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCT
GAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCC
CATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTAT
CAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATC
ACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTT
GTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCT
TGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
Using MATLAB and Theorem 5.13, we find that all elements of L are equal to 3, meaning that there is no mutation.

6 CONCLUSIONS

Topology plays an important role in applications, for example, in finding mutations of DNA and RNA. In this paper, we presented several methods to detect a mutation by using msets, mset relations, and metric spaces. We designed a computer program (code 1) to calculate the function 𝜌(A, B) and to determine the location of the mutation, and we applied it to the A thaliana gamma-glutamylcysteine synthetase gene. Developing computer programs based on our findings will help in the early detection of mutations and in the treatment of diseases. In the future, we will define the interior and closure of the mset mutation (multimutation set) and determine the amino acid causing the mutation.

ORCID

M. M. El-Sharkasy http://orcid.org/0000-0002-4922-9442
Mohamed S. Badr http://orcid.org/0000-0001-5826-8171

REFERENCES
1. Flapan E. When Topology Meets Chemistry: A Topological Look at Molecular Chirality. New York: Cambridge University Press; 2000.
2. Lashin EF, Kozae AM, Abo Khadra AA, Medhat T. Rough set theory for topological spaces. Int J Approx Reason. 2005;40.1-2:35-43.
3. Lashin EF, Medhat T. Topological reduction of information systems. Chaos Solitons Fractals. 2005;25.2:277-286.
4. Skowron A. On topology in information systems. Bull Pol Ac Sciences Math. 1988;36.7-8:477-479.
5. Zhu W. Topological approaches to covering rough sets. Inf Sci. 2007;177.6:1499-1508.
6. Blizard WD. Multiset theory. Notre Dame J Formal Log. 1988;30.1:36-66.
7. Yager RR. On the theory of bags. Int J Gen Syst. 1986;13.1:23-37.

8. Chakrabarty K. Bags with interval counts. Found Comput Decis Sci. 2000;25.1:23-36.
9. Girish KP, John SJ. Relations and functions in multiset context. Inf Sci. 2009;179.6:758-768.
10. Girish KP, Sunil JJ. General relations between partially ordered multisets and their chains and antichains. Math Commun.
2009;14.2:193-205.
11. Girish KP, John SJ. Rough multisets and information multisystems. Adv Decis Sci. 2011:495392.
12. Girish KP, John SJ. Rough multiset and its multiset topology. Trans Rough Sets. 2011;14:62-80.
13. Girish KP, John SJ. Multiset topologies induced by multiset relations. Inform Sci. 2012;188:298-313.
14. Girish KP, Jacob JS. On multiset topologies. Theory Appl Math Comput Sci. 2012;2.1:37-52.
15. Pawlak LZ. Some issues on rough sets. Transactions on Rough Sets. 2004;I 3100:1-58.
16. Chakrabarty K, Gedeon T. On bags and rough bags. In: Proceedings Fourth Joint Conference on Information Sciences, Vol. 1; 1998; North
Carolina, USA:60-63.
17. Freese E. The specific mutagenic effect of base analogues on phage T4. J Mol Biol. 1959;1.2:87-105.
18. Nirenberg MW, Heinrich Matthaei J. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic
polyribonucleotides. Proc Natl Acad Sci. 1961;47.10:1588-1602.
19. Jena SP, Ghosh SK, Tripathy BK. On the theory of bags and lists. Inf Sci. 2001;132.1:241-254.
20. Freese E. The difference between spontaneous and base-analogue induced mutations of phage T4. In: Proceedings of the National Academy
of Sciences, Vol. 45.4; 1959; USA. 622-633.
21. Gontier N. Reticulated evolution: symbiogenesis, lateral gene transfer, hybridization and infectious heredity; 2015.
22. Kerwin SM. Nucleic Acids: Structures, Properties, and Functions By Victor A. Bloomfield, Donald M. Crothers, and Ignacio Tinoco, Jr.,
with contributions from John E. Hearst, David E. Wemmer, Peter A. Kollman, and Douglas H. Turner. University Science Books, Sausalito,
CA. 2000. ix+ 794 pp. 17 25 cm. ISBN 0-935702-49-0; 2000:4721-4722.
23. Andrulis IL et al. Comparison of DNA-and RNA-based methods for detection of truncating BRCA1 mutations. Hum Mutat.
2002;20.1:65-73.
24. Crick F, Anderson PW. What mad pursuit: a personal view of scientific discovery. Phys Today. 1989:42-68.
25. Manso AMR, Correia LMP. A multiset genetic algorithm for real coded problems. In: Proceedings of the 13th Annual Conference
Companion on Genetic and Evolutionary Computation. New York: ACM; 2011:153-154.
26. Schaffer JD, Eshelman LJ. On crossover as an evolutionarily viable strategy. ICGA. 1991;91:61-68.
27. El-Sharkasy MM, Badr MS. Topological spaces via phenotype—genotype spaces. Int J Biomath. 2016;9.04:1650054.
28. El-Sharkasy MM, Badr MS. Modeling DNA&RNA mutation using multiset and topology. IJB-D-17-00111, preprint.
29. https://www.ncbi.nlm.nih.gov/. Accessed on October 2017.

How to cite this article: El-Sharkasy MM, Fouda WM, Badr MS. Multiset topology via DNA and RNA mutation.
Math Meth Appl Sci. 2018;1–13. https://doi.org/10.1002/mma.4764

APPENDIX A: CODE 1