Bioinformatics For Mycobacterium Tuberculosis

Chapter One
Introduction
1.0 Background of the study
Safeguarding the body's well-being should be a top priority in the pursuit of good health. To
achieve this, it is crucial to understand how vulnerable we are to bacteria, parasites, and other
disease-causing organisms. Mycobacterium tuberculosis is the pathogenic species of bacteria that
causes tuberculosis. According to World Life Expectancy (2023), the latest WHO data indicate
that tuberculosis claimed the lives of 127,335 individuals in Nigeria during 2020, constituting
approximately 8.60% of all reported deaths. On a global scale, Nigeria holds the sixth position
with an age-adjusted death rate of 99.13 per 100,000 people. According to the Centers for
Disease Control and Prevention (2023), tuberculosis (TB) claims 1.6 million lives annually and
remains among the most lethal infectious diseases globally. Bakula et al. (2022) report that in
2019 Nigeria ranked among the top three countries, with an estimated 440,000 new TB
infections and 154,000 TB-related fatalities. Nigeria further stands out with rates of
approximately 11 and 23 per 100,000 individuals, signifying one of the most substantial
prevalence rates for MDR/RR TB and TB/HIV coinfection worldwide. According to Poonam
(2023), tuberculosis is a perilous infection that typically affects the lungs. It can also spread
to other bodily regions such as the brain and spine. The infection is caused by the bacterium
Mycobacterium tuberculosis, which is believed to have existed for over three million years.
Even ancient Greece and Rome had knowledge of this ailment. At the onset of the 1900s,
tuberculosis, formerly known as consumption, was the leading cause of death in the US.
Despite the effective management of TB today, over a million people still succumb to it
annually worldwide. Mycobacterium tuberculosis is a medically and
scientifically significant bacterium due to its role in causing tuberculosis and its impact on global
public health. Researchers and healthcare professionals continue to study and combat this
bacterium to improve diagnosis, treatment, and prevention strategies for tuberculosis.
Researchers use various approaches, including molecular biology, genomics, and bioinformatics,
to study the bacterium's genetics, drug resistance mechanisms, and interactions with the human
immune system. According to Leverage Edu (2023), bioinformatics finds applications in medicine
across a spectrum spanning from drug and gene exploration to safeguarding against diseases. The
contribution of bioinformatics researchers to pharmaceutical advancement, particularly in
studying infectious ailments, is paramount. Moreover, bioinformatics is advancing personalized
medicine, providing fresh perspectives on crafting medicines attuned to an individual's genetic
makeup. This amalgamation of computer science, statistics, and biology extends its influence
across diverse domains, including agricultural progress and veterinary investigation. In this
study, our contribution will extend to advancing the health sector through the use of machine
learning and bioinformatics in drug discovery. The
fusion of bioinformatics and Artificial Intelligence (AI), particularly Machine Learning (ML),
holds great promise in revolutionizing the healthcare landscape. This synergistic combination
enables the extraction of meaningful insights from vast and complex biological data, ultimately
leading to more accurate diagnoses, effective treatments, and enhanced personalized care.
1.1 Aims and Objectives
This project aims to employ bioinformatics and machine learning techniques to expedite the
discovery of novel drug candidates targeting Mycobacterium tuberculosis, with the overall goal
of contributing to more effective treatment strategies against tuberculosis.
The objectives of the study are to review the existing literature on bioinformatics, machine
learning applications, and Mycobacterium tuberculosis; to conduct a thorough comparative
analysis of at least three regression algorithms in order to identify the most suitable one for
aiding drug discovery; to implement the selected algorithm that best fits the specific problem;
and to rigorously evaluate the model's performance and results.
1.2 Statement of the problem
In medical research and treatment, tuberculosis (TB), caused by Mycobacterium tuberculosis,
remains a major problem. Some strains of the TB bacterium have become resistant to drugs,
making it urgent to find new ways to treat the disease. Conventional drug discovery is slow and
costly, which limits the number of effective drugs available. It is therefore important to find
new, smarter ways of using computers and data to discover drugs that act more effectively
against Mycobacterium tuberculosis. The bacterium is highly variable, which makes it hard to
find drugs that work against all strains, and identifying suitable drug targets and designing
compounds that are active against different strains is difficult. Conventional drug development
also requires a large number of compounds to be tested in the laboratory, which takes
considerable time and money. In addition, the way candidate drugs interact with the bacterium
is very complicated. It is like trying to work out how puzzle pieces fit together when the pieces
come in many different shapes, and this makes it hard to build computer models that predict
how drugs and the bacterium will interact. To deal with these problems, scientists are turning
to computational and ML techniques to speed up the search for new drugs that work against
different strains of TB bacteria. By using computers to predict which compounds are most
promising, scientists can save time and money.
1.3 Scope of the study
The scope of this project is to utilize bioinformatics and machine learning (ML) techniques to
determine the effectiveness of drugs against Mycobacterium tuberculosis. The primary goal is to
enhance our understanding of the drug discovery process targeted at inhibiting the bacterium's
growth.
1.4 Significance/Justification of the Study
The significance of this study resides in its potential to reshape the course of TB drug discovery.
By revolutionizing the identification of effective drug candidates, it directly addresses the
pressing requirement for improved TB treatments. This study bridges the gap between traditional
laboratory methods and contemporary computational approaches. The integration of
bioinformatics and machine learning paves the way for interdisciplinary collaboration and
stimulates innovation in drug discovery methodologies. By utilizing computational techniques,
we gain the ability to swiftly identify potential drugs with the right potency for treating
Tuberculosis. This study is of paramount importance given the urgency for enhanced TB
treatments, particularly against drug-resistant strains. The methods used can also help find
treatments for other diseases and make the whole process of finding new drugs better and
faster. Ultimately, it is about improving healthcare and finding cures more quickly.
1.5 Methodology Overview
To gain insight into Mycobacterium tuberculosis, regression algorithms, machine learning, and
bioinformatics, a comprehensive literature review will be conducted. A minimum of three
regression methods will be compared to identify the most suitable approach for the drug
discovery problem. The dataset for constructing the model will be sourced from publicly
available online repositories such as the ChEMBL database. The chosen model will be
implemented using Python and its associated machine-learning libraries. An in-depth evaluation
of the model's performance will be conducted. Subsequently, the predictive model will be
integrated for user interaction within a web application developed using Streamlit, a Python
package for building data and machine learning web applications. The entire workflow, including
methodologies, outcomes, and notable findings, will be documented in the final
report of the project.
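The comparison step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the three candidate regressors (a mean baseline, a one-feature least-squares line, and 3-nearest neighbours), the synthetic data, and all function names are hypothetical, and only the Python standard library is used so that the sketch is self-contained. In practice a machine-learning library would likely supply both the models and the cross-validation routine.

```python
import math
import random


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """k-nearest neighbours: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_val_rmse(fit, xs, ys, n_folds=5, seed=0):
    """Mean RMSE over n_folds held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return sum(scores) / len(scores)


# Hypothetical toy data: one descriptor value per compound and a noisy potency.
rng = random.Random(42)
descriptor = [i / 10 for i in range(60)]
potency = [4.0 + 0.5 * x + rng.gauss(0, 0.2) for x in descriptor]

candidates = {"mean baseline": fit_mean, "linear": fit_linear, "3-NN": fit_knn}
results = {name: cross_val_rmse(fit, descriptor, potency) for name, fit in candidates.items()}
best = min(results, key=results.get)
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: CV RMSE = {score:.3f}")
print("selected:", best)
```

Ranking candidates by a held-out error such as RMSE, rather than by training error, is one common way to avoid favouring a model that merely memorises the data.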
1.6 Definition of Terms
• Mycobacterium tuberculosis: A type of bacterium that causes tuberculosis (TB), an
infectious disease primarily affecting the lungs.
• MDR/RR TB: Multidrug-resistant/rifampicin-resistant tuberculosis, indicating
strains of TB bacteria that have developed resistance to commonly used antibiotics.
• TB/HIV coinfection: The presence of both tuberculosis and human immunodeficiency
virus (HIV) infections in an individual.
• Bioinformatics: The field that brings together biology, computer science, and statistics to
manage and analyze biological data, often used in genomics and drug discovery.
• Personalized Medicine: Medical treatment customized for an individual based on their
genetic information, medical history, and other relevant factors.
• Genomic Medicine: A medical domain employing genomic data to guide patient care,
including disease diagnosis, treatment selection, and prevention strategies.
• Drug Repurposing: The procedure of identifying new applications for pre-existing
medications, often leveraging computational methods to discover alternative uses.
• Public Health Management: The practice of safeguarding and enhancing community
health through coordinated efforts of both public and private organizations.
• ChEMBL Database: A publicly available database containing information about
bioactive molecules, their properties, and their targets.
Chapter Two
Literature review
2.0 Introduction
The field of bioinformatics encompasses the storage, retrieval, and analysis of extensive
biological data. It involves a diverse range of experts, including biologists, molecular life
scientists, computer scientists, and mathematicians, working collaboratively in this highly
interdisciplinary domain (EMBL-EBI 2023). Computers play a vital role in bioinformatics,
helping manage the vast amounts of data scientists can now extract from living organisms. This
data can range from the simplicity of a single cell to the complexity of a person's immune
system. Bioinformatics is paving the way for future personalized medicine, enabling researchers
to decode the human genome, gain a comprehensive understanding of biological systems,
develop innovative biotechnologies, and refine legal and forensic tools. Enormous datasets
harbor patterns that might be challenging or even impossible for humans to uncover manually.
Uncovering them is made feasible by advanced observation tools and increasingly powerful
computers working hand in hand (PNNL 2023).
2.1 Importance of Bioinformatics
Bioinformatics has opened doors to research in previously uncharted domains through effective
data management and analysis. According to PNNL (2023), bioinformatics tools significantly
amplify the value of historical data, enabling researchers to sift through extensive datasets
from various studies and unveil novel correlations. These tools facilitate the synthesis of
information collected globally, even from sources contributed by researchers with limited prior
familiarity. Moreover, these tools have the potential to enhance ongoing research endeavors.
Conducting in silico experiments is relatively straightforward, allowing researchers to fine-tune
their experimental designs. Data analysis aids in selecting targets for exploration and determining
the sample sizes necessary to achieve statistically significant findings. According to Matti and
Lloyd (2023), the enigmas of life can be unraveled through the use of bioinformatics. This
interdisciplinary realm fuses biology, computer science, and statistics to delve into the mysteries
of biological systems. By amalgamating data from diverse origins, bioinformatics offers a
comprehensive outlook on biological processes. This process aids in pinpointing new targets for
drug development and refining disease diagnosis and treatment strategies. The intricate
interactions within living systems can be illuminated by scrutinizing vast biological datasets,
which would be a formidable task if done manually. The dynamic field of bioinformatics has the
potential to revolutionize healthcare and agriculture by unveiling the mysteries of life and
uncovering innovative avenues to enhance human health and well-being.
2.1.2 Application areas of Bioinformatics
According to Matti and Lloyd (2023), bioinformatics finds application in various domains such
as genomics, proteomics, drug development, and personalized medicine. It plays a pivotal role in
identifying genes responsible for diseases, predicting potential adverse effects of therapies, and
designing new drugs with enhanced effectiveness and precision. Some of its application areas are:
• Genomics
• Proteomics
• Drug Discovery
• Personalized Medicine
• Evolutionary Biology
• Agriculture
• Comparative Genomics
• Structural Bioinformatics
• Functional Genomics
• Metagenomics
• Systems Biology
• Transcriptomics
• Phylogenetics
• Medical Informatics
2.2 Bioinformatics in drug discovery
According to Ioannis (2022), bioinformatics applications such as gene sequencing, genetic
statistics, and gene expression level measurements have significantly improved the dose
response, toxicity profiles, and overall effectiveness of medications aimed at treating a range of
genetic disorders. The drug development journey, spanning from discovery to final approval, is
both prolonged and financially demanding. Bioinformatics helps to expedite and streamline the
target discovery and validation phases, ultimately improving the efficiency and
cost-effectiveness of the approval process by increasing the number of successful drug
candidates. According to Matti and Lloyd (2023), drug
discovery has historically been a challenging and intricate journey, often spanning years and
requiring investments of billions of dollars. However, recent advancements in bioinformatics
have brought about a fundamental shift in how we approach the quest for new medicines.
Through the utilization of state-of-the-art computational techniques to identify potential drug
candidates and predict their effectiveness and safety, bioinformatics has revolutionized the
landscape of drug discovery. With cutting-edge tools like molecular docking and virtual
screening, researchers can now sift through vast volumes of biological data to uncover the
intricate interactions between pharmaceuticals and biological systems. By amalgamating
information from diverse sources, bioinformatics provides a comprehensive view of therapeutic
targets, paving the way for the development of novel medications with enhanced specificity and
efficacy.
2.3 Tuberculosis
According to Mayo Clinic (2023), tuberculosis (TB) is a dangerous disease primarily affecting
the lungs. It is caused by a specific type of bacteria. When an individual with tuberculosis
coughs, sneezes, or even talks, the disease can spread. This occurs through tiny droplets
containing the bacteria that are released into the air. Another person can inhale these droplets,
leading to the entry of the bacteria into their lungs. Tuberculosis can spread rapidly in places
where people gather or live in crowded conditions. The risk of contracting tuberculosis is notably
higher among those with conditions such as HIV/AIDS and other immune system disorders,
compared to healthy individuals. Treatment for TB involves the use of antibiotic medications.
However, there are strains of the bacteria that have become resistant to these treatments.
According to WHO (2023), it is estimated that approximately a quarter of the global population
has been infected by the TB bacteria. However, only around 5-10% of those who contract the
infection will subsequently exhibit symptoms and progress to the active disease stage.
According to Mary (2022), the
disease has existed for most of human history and has at times posed a significant threat. In fact,
tuberculosis can be traced back over 5,000 years to ancient Egypt. Moreover, references to TB
are found in the biblical books of Deuteronomy and Leviticus, using the Hebrew term
"schachepheth," while Hippocrates mentions it in his writings as "phthisis." It's likely that more
people have succumbed to M. tuberculosis than any other pathogen. Throughout the 18th and
19th centuries, tuberculosis was rampant in industrialized regions of Europe and North America,
earning it the name "consumption."
2.3.1 Causes of Tuberculosis
According to Cleveland Clinic (2023), the bacterium Mycobacterium tuberculosis is the
causative agent of tuberculosis (TB). These bacteria are airborne and primarily target the lungs,
although they can also affect other areas of the body. TB is contagious, but it is not easily
transmitted. Generally, prolonged close contact with an infectious individual is required to
contract the disease. According to WHO (2023), TB predominantly affects the respiratory
system, particularly the lungs.
The transmission of this disease occurs when infected individuals cough, sneeze, or spit,
releasing the bacteria into the air.
2.3.2 Stages and Symptoms of Tuberculosis
Mayo Clinic (2023) describes the stages and symptoms of tuberculosis, which helps in
understanding how the disease presents at each stage. A TB infection arises when tuberculosis
(TB) bacteria persist and multiply in the lungs. There are three stages of TB infection, each
characterized by specific symptoms; TB that spreads beyond the lungs is described separately
in item d).
a) Primary TB infection. During this stage, immune system cells identify and capture the
bacteria. While the immune system can effectively eliminate most of the bacteria, some
may remain and reproduce. In general, a primary infection doesn't exhibit noticeable
symptoms. However, a few individuals might experience flu-like symptoms, including:
• Mild fever.
• Fatigue.
• Coughing.
b) Latent tuberculosis, the stage that usually ensues after the primary infection, involves
immune system cells enclosing lung tissue containing TB bacteria with a protective
barrier. While the immune system successfully contains the bacteria, preventing further
damage, the bacteria remain present. During latent TB, no symptoms are apparent.
c) Active tuberculosis develops when the immune system is unable to suppress an
infection, allowing the illness to spread throughout the lungs or other parts of the body.
Active TB can manifest shortly after the initial infection or may arise from a latent TB
infection that has persisted for months or years. Symptoms of active pulmonary
tuberculosis typically begin mildly and worsen over several weeks, possibly including:
• Coughing.
• Coughing up blood or mucus.
• Chest pain.
• Discomfort while breathing or coughing.
• Fever.
• Chills.
• Night sweats.
• Weight loss.
• Loss of appetite.
• Fatigue.
• Overall feeling of unwellness.
d) Extrapulmonary tuberculosis refers to the spread of TB infection beyond the lungs,
affecting various parts of the body. Symptoms can vary based on the infected area.
Common signs may encompass:
• Fever.
• Chills.
• Night sweats.
• Weight loss.
• Loss of appetite.
• Fatigue.
• Overall feeling of unwellness.
• Pain near the site of infection.
Active TB disease in children can manifest with varying symptoms based on age:
Teenagers: Symptoms resemble those seen in adults.
Children aged 1 to 12 years: Younger children might experience persistent fever and weight loss.
Infants: Infants may not grow or gain weight as expected. They might also show signs of
swelling in the fluid around the brain or spinal cord, such as lethargy, fussiness, vomiting, poor
feeding, delayed responses, and a bulging soft spot on the head (Mayo Clinic 2023).
2.4 Life cycle of Mycobacterium tuberculosis
Lerm and Netea (2016) give a detailed account of the life cycle of the bacterium and of its
behavior at each stage.
Fig 2.0 Lifecycle of Mycobacterium tuberculosis
Alveolar macrophages serve as the primary target cells for M. tuberculosis infection. When
inhaled particles carrying the infection enter the alveolar space, these cells encounter the
pathogen and engulf it. M. tuberculosis has several mechanisms for counteracting the
microbicidal activities of these cells, which include phagosomal acidification, activation of
proteolytic enzymes within acidified phagolysosomes, production of antimicrobial peptides, and
generation of reactive oxygen and nitrogen metabolites. If the macrophage defense is
ineffective, M. tuberculosis establishes itself within an intracellular niche. Initially, it resides in
the phagosome with diminished antimicrobial capability, and subsequently, it enters the cytosol
through the action of the membrane-damaging toxin known as early secreted antigenic target
(ESAT)-6.
According to Lerm and Netea (2016), mycobacteria replicate efficiently within the cytosol
before the primary host cell is eventually killed, a process in which ESAT-6 plays a significant
role. The inflammation induced by the dying cell prompts the recruitment of additional
monocytes/macrophages to the infection site. Paradoxically, these cells can aid in the
propagation of the infection within the tissue. The initial formation of granulomas also relies
on ESAT-6 and is likely tied to the inflammation induced by toxin-triggered necrosis. As the
inflammatory process advances, macrophage clusters termed granulomas emerge. Lymphocytes,
particularly CD4+ T helper type 1 (Th1) cells and CD8+ cytolytic T cells (CTLs), are attracted to
the developing granuloma, encircling the infected macrophages. Eventually, the inner core of the
granuloma undergoes necrosis, leading to its rupture. The life cycle of M. tuberculosis
culminates with the drainage of viable bacilli into the alveolar space, where coughing and
aerosol production facilitate the pathogen's dissemination to other individuals (Lerm & Netea,
2016).
2.5 Machine Learning
According to Flam (2022), machine learning algorithms provide significant benefits to the healthcare sector
by aiding in the interpretation of extensive volumes of healthcare data generated daily through
electronic health records. By utilizing machine learning techniques and algorithms, patterns and
insights within medical data can be discovered, surpassing the capabilities of manual
identification. The growing integration of machine learning in healthcare enables healthcare
providers to adopt a predictive approach to precision medicine. This transition has the potential
to establish a more cohesive system, characterized by improved treatment administration,
enhanced patient outcomes, and streamlined patient-centered operations. According to Predik
(2023), machine learning, an emerging domain within computer science, employs algorithms to
empower computers with the ability to learn and autonomously make decisions. The realm of
machine learning (ML) offers the potential to enhance various aspects of businesses and is
currently demonstrating remarkable achievements across multiple industries. With the surge in
data volumes reaching unprecedented levels, ML is playing an increasingly pivotal role in the
corporate landscape. For instance, it assists enterprises in extracting insights and value from their
data.
2.5.1 Machine learning in Drug discovery
As stated by Xia (2017), the utilization of data-driven machine learning (ML) applications has
witnessed substantial growth in recent years, becoming an essential tool in the initial stages of
drug discovery. This rising trend is attributed to several factors, including the swift accumulation
of relevant experimental data from sources like DrugBank, ChEMBL, PDB, PubChem, and
PDBbind. Furthermore, the advancement of contemporary ML techniques, libraries, and the
availability of cost-effective computing power have contributed to this phenomenon. Virtually
every phase of the Small Molecule Drug Discovery and Development (SBDR) pipeline, along
with subsequent stages, can reap benefits from the integration of ML algorithms. These
algorithms can be applied to a range of tasks, encompassing drug screening, target screening,
prediction of target structures and binding sites, lead optimization, anticipation of drug-drug
interactions, and projection of ADMET (Absorption, Distribution, Metabolism, Excretion,
Toxicity) properties. Rather than computing properties directly from a physics-based
understanding, many ML approaches aim to extract insights from existing data
(Wooller et al., 2017).
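Bioactivity records in repositories such as ChEMBL are commonly reported as concentrations, for example the half-maximal inhibitory concentration (IC50). A step often applied before fitting a regression model is to convert IC50 to a negative logarithmic scale, pIC50 = -log10(IC50 in mol/L), so that more potent compounds receive larger values. The sketch below assumes IC50 values are given in nanomolar; the function name and the example records are hypothetical, and whether this transformation is used in a particular study is an assumption.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


# Hypothetical records; the identifiers and values are made up for illustration.
records = [("compound_A", 10.0), ("compound_B", 1000.0), ("compound_C", 25000.0)]
for name, ic50_nm in records:
    print(name, round(ic50_nm_to_pic50(ic50_nm), 2))
```

On this scale a tenfold drop in IC50 corresponds to an increase of one pIC50 unit, which tends to give a regression target with a more even spread than the raw concentrations.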
Bioinformatics methods hold the potential to identify the underlying causes of
cancer in individual patients, offering the opportunity to tailor cancer therapy to a more
personalized level. This avenue holds the promise of developing novel and repurposed
medications targeting specific proteins, thereby selectively eliminating or incapacitating only the
affected cells. Additionally, bioinformatics plays a pivotal role in the domain of translational
drug discovery for infectious diseases. For instance, distinct gene expression patterns are
triggered within cells due to the presence of bacterial or viral infections. By comparing these
genetic profiles with those associated with other disorders and influenced by pharmaceutical
interventions, there exists the potential to repurpose existing drugs (Wooller et al., 2017).
2.5.2 Types of Machine Learning
Fig 2.1 Types of Machine Learning
1. Supervised Learning:
Training an algorithm using labeled data to make accurate predictions or classifications on
new, unseen data.
2. Unsupervised Learning:
Finding patterns and relationships in data without labeled outcomes; often used for
clustering or dimensionality reduction.
3. Semi-Supervised Learning:
Combining labeled and unlabeled data to enhance model accuracy; useful when obtaining
fully labeled datasets is challenging.
4. Reinforcement Learning:
Training algorithms to make decisions in an environment so as to maximize rewards,
learning through experimentation and feedback.
2.5.3 Supervised Machine Learning
Fig 2.2 Supervised Machine learning workflow
According to Gillis (2023), supervised learning is a technique for building artificial
intelligence (AI) in which a computer system is trained on labeled input data to predict a
particular output. The model is iteratively trained until it can discern the inherent relationships
and patterns connecting the input data and output labels. This empowers the model to generate
accurate predictions when presented with previously unseen data. According to IBM (2023),
supervised machine learning is distinguished by its use of labeled datasets to train algorithms
to classify data or predict outcomes accurately. The model adjusts its weights as input data is
fed into it until it is fitted appropriately, which occurs as part of the cross-validation process.
Applications
like segregating spam emails into separate folders from regular emails exemplify how supervised
learning assists enterprises in finding scalable solutions to various real-world issues.
In supervised learning, a training set is employed to guide models in generating desired
outcomes. This training dataset comprises both appropriate inputs and corresponding outputs,
enabling the model's progressive improvement. The loss function serves as a metric for the
algorithm's accuracy, and iterations are executed until the error is sufficiently minimized. In the
context of data mining, supervised learning can be categorized into two main types: regression
and classification.
• Classification: This approach employs algorithms to accurately categorize test data into
distinct classes. It identifies specific entities within the dataset and strives to determine
their appropriate labels. Commonly used classification techniques include linear
classifiers, support vector machines (SVM), decision trees, k-nearest neighbors, and
random forests (IBM 2023).
• Regression: Regression is employed to understand the relationship between dependent
and independent variables. It is often used to generate estimates, such as predicting a
company's sales revenue. Well-known regression algorithms include linear regression,
logistic regression, and polynomial regression (although logistic regression, despite its
name, is most often used for classification).
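The training procedure described above (a loss function evaluated on labeled data, with iterations repeated until the error is sufficiently small) can be sketched as follows. This is a minimal illustration rather than the model used in this study: it fits a one-feature logistic classifier by gradient descent, and the toy data, learning rate, tolerance, and function names are all assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Average cross-entropy between predicted probabilities and 0/1 labels."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def train(xs, ys, lr=0.5, tol=0.05, max_iter=10_000):
    """Gradient descent: update the weights until the loss falls below tol."""
    w, b = 0.0, 0.0
    for _ in range(max_iter):
        if log_loss(w, b, xs, ys) < tol:
            break
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy labeled data (hypothetical): one feature value per example and its class.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

initial_loss = log_loss(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
final_loss = log_loss(w, b, xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}; predictions {predictions}")
```

The same loop structure applies to regression; only the loss (for example, mean squared error) and the form of the prediction change.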
2.5.4 Regression Algorithm
When the output variable takes a real or continuous value, regression is employed. In this
scenario, a change in one variable is associated with a change in another, as there exists a
relationship between two or more variables. Examples include salary as a function of
employment history, or weight as a function of height.
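As a worked illustration of such a relationship, the sketch below fits a straight line to a handful of height and weight pairs using the closed-form least-squares formulas and reports the coefficient of determination (R²). The data values are invented for illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented example data: height in cm and weight in kg.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 66, 75, 83]

slope, intercept = fit_line(heights, weights)
r2 = r_squared(heights, weights, slope, intercept)
predicted = intercept + slope * 175  # continuous-valued prediction for a new input
print(f"weight = {intercept:.1f} + {slope:.2f} * height, R^2 = {r2:.3f}")
print(f"predicted weight at 175 cm: {predicted:.1f} kg")
```

The output of the model is a real number rather than a class label, which is what distinguishes regression from classification.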
According to MathWorks (2023), regression techniques are used to forecast continuous
outcomes, such as the values of financial assets or challenging-to-quantify physical attributes
like battery state-of-charge or grid load. Typical applications include algorithmic trading,
virtual sensing, and electrical load prediction.
When dealing with a dataset spanning a range of values or if your response falls within the
domain of real numbers, such as temperature or the time until equipment failure, employ
regression methods. Presented below are some of the prevalent algorithms for conducting
regression.
Fig 2.3 Regression analysis image
2.6 Related Works
Yang et al. (2019) employed an extensive dataset of tuberculosis patients spanning 16 countries
and six continents. This dataset incorporated whole-genome MTB
isolate sequences and corresponding drug susceptibility testing outcomes. Their proposed
model, DeepAMR (Deep Autoencoder for Multiple Drug Resistance Classification), and its
clustering variant, DeepAMR_cluster, were aimed at multi-drug classification and latent data
space clustering, respectively. The study showcased DeepAMR's superiority over baseline and
other models, achieving impressive mean AUROC scores (94.4% to 98.7%) for predicting
resistance to four primary drugs, MDR-TB, and PANS-TB. DeepAMR excelled in sensitivity as
well, with best rates seen for isoniazid (94.3%), ethambutol (91.5%), pyrazinamide (87.3%), and
MDR-TB (96.3%). However, some cases showed slightly lower sensitivity compared to baseline,
such as rifampicin and PANS-TB, with 0.7% and 1.9% reduction, respectively. The study also
detailed cross-resistance patterns, notably between isoniazid (INH) and rifampicin (RIF), and
compared multi-label with single-label models, attributing DeepAMR's success to its use of
abstract data representations and non-linear dimensionality reduction. Although predicting
resistance for specific drugs (e.g., ethambutol (EMB) and pyrazinamide (PZA)) posed
challenges, DeepAMR demonstrated significant improvement for these cases.
Nagamani and Sastry (2021) focused on addressing the challenges posed by drug-resistant
strains of Mycobacterium tuberculosis (M.tb) through the application of machine learning
models and computational drug repurposing. The authors highlight the urgent need for novel
antitubercular drugs due to the rapid evolution of drug-resistant M.tb strains. They point out
that factors like genetic mutations, the complex cell wall system of M.tb, and transporter
systems contribute to the ineffectiveness of many small molecules in arresting M.tb cell
growth. The study's objective was to overcome the permeability barriers of M.tb by developing
machine learning models that can distinguish between permeable and impermeable compounds.
The authors utilized enzyme-based (IC50) and cell-based (minimal inhibitory concentration)
data to classify compounds based on their permeability. The XGBoost machine learning model
emerged as the top performer compared to other algorithms like random forest, support vector
machine, and naive Bayes.
Deelder et al. (2019) employ a machine learning-driven methodology to address the
critical challenge of drug resistance prediction in Mycobacterium tuberculosis (M.tb). The
authors worked with an extensive dataset of 16,688 M.tb isolates, each having undergone
whole-genome sequencing (WGS) and laboratory drug-susceptibility testing (DST) across 14
antituberculosis drugs. A substantial portion of the samples had multidrug-resistant or
extensively drug-resistant profiles, underlining the significance of the study. The authors
employed non-parametric classification-tree and gradient-boosted-tree models to predict drug
resistance and identify potentially associated mutations. This approach allowed the creation of
a separate model for each drug, and the authors also considered the influence of "co-occurrent
resistance" markers, which are known to cause resistance to drugs other than the one under
consideration. Predictive performance was evaluated using sensitivity, specificity, and the area
under the receiver operating characteristic curve (AUROC), with DST outcomes serving as the
benchmark. The models were particularly accurate in predicting resistance to first-line drugs
and several second-line drugs, with AUROC exceeding 96%. Performance was comparatively lower for
certain third-line drugs. Including co-occurrent resistance markers notably improved prediction
for some drugs, leading to better outcomes than similar models used in other large-scale studies.
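
The three evaluation measures named above can be illustrated with a minimal Python sketch. It
assumes binary labels (1 = resistant, 0 = susceptible) with the DST result treated as ground
truth; the function names, the 0.5 decision threshold, and the toy data are illustrative
assumptions and are not taken from Deelder et al. (2019).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(dst: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating DST results as ground truth.

    Labels: 1 = resistant, 0 = susceptible.
    """
    tp = sum(1 for y, p in zip(dst, predicted) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(dst, predicted) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(dst, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(dst, predicted) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(dst: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    This equals the probability that a randomly chosen resistant isolate
    receives a higher score than a randomly chosen susceptible one, with
    ties counted as one half.
    """
    pos = [s for y, s in zip(dst, scores) if y == 1]
    neg = [s for y, s in zip(dst, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration (not results from any study).
dst = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(dst, predicted)
auc = auroc(dst, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking
quality across all thresholds, which is one reason studies of this kind tend to report all three.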
Hadikurniawati et al. (2021) compare the performance of ML models using 10-fold
cross-validation. Contrary to previous research that favored certain models over others, this study
finds that the best-performing model is data-specific, a conclusion that aligns with observations
from other studies. Nevertheless, the research achieves better results than recent studies
and notes that certain methods, particularly Logistic Regression and MD-WDNN, perform at similar
levels. Additional parameter tuning is conducted using the scikit-learn library in Python and
yields slight improvements, reinforcing the data-specific nature of model performance. With an
accuracy of up to 99% and high area under the curve (AUC) values, the ML approach holds promise
for predicting MTB drug resistance from DNA sequencing data, showcasing its potential as a
valuable tool in medical research and diagnostics.

Ye et al. (2021) present a research endeavor focused
on addressing the challenge of drug-resistant tuberculosis (TB) caused by Mycobacterium
tuberculosis (Mtb), a leading global cause of mortality. The emergence of extensively
drug-resistant TB has underscored the need for novel drug candidates. The study uses several
machine learning (ML) algorithms, including support vector machine (SVM), random forest
(RF), extreme gradient boosting (XGBoost), and deep neural networks (DNN), to construct
classification models that distinguish Mtb inhibitors from non-inhibitors. The results show
that the XGBoost model has the most robust predictive performance among the individual models.
To improve accuracy further, two consensus strategies integrate predictions from multiple models.
The stacking model that combines predictions from RF, XGBoost, and DNN offers the highest
accuracy, with an area under the receiver operating characteristic curve (AUC) of 0.842 for the
10-fold cross-validated training set and 0.942 for the external test set. The authors also explore
the relationship between important molecular descriptors and bioactivities using the Shapley
additive explanations method.

Radchenko et al. (2023) establish a well-structured
foundation for their modeling approach, emphasizing the significance of a diverse dataset. They
draw upon publicly available data to create a dataset containing both target-based and cell-based
assay results. Their preprocessing and dataset preparation are thorough, reflecting the
challenges of working with large, heterogeneous datasets. The use of fragmental descriptors and
neural networks is well justified, given their prior success in other QSAR and QSPR
applications. The neural network architecture, a feed-forward network trained by
back-propagation and assessed with double cross-validation, is designed for robustness and
validation. The modeling process is presented in detail and includes a systematic exploration of
hyperparameters, leading to a refined model with enhanced predictivity. Comparisons with other
models in the literature add to the paper's credibility.

Deelder et al. (2022) address the growing concern of drug-resistant
Mycobacterium tuberculosis complicating the treatment and control of tuberculosis. They
emphasize the importance of combining whole genome sequencing and machine learning
techniques to predict drug resistance and identify the genetic mutations associated with it in M.
tuberculosis. However, they highlight the limitations of applying generic machine-learning
approaches without tailoring them to the specific context of tuberculosis. To address these
challenges, the authors introduce a novel machine-learning approach, Treesist-TB, designed
specifically for tuberculosis. This approach focuses on extracting and analyzing genomic variants
across multiple studies to enhance genotypic profiling. The authors applied Treesist-TB to
predict resistance to well-known drugs such as rifampicin, isoniazid, and ethambutol,
achieving predictive accuracy comparable to existing tools like TB-Profiler.

Hrizi et al. (2022) focus on tuberculosis (TB) diagnosis and propose an optimized machine
learning-based model that extracts optimal texture features from TB-related images and
simultaneously fine-tunes classifier hyper-parameters. The overarching objectives are to improve
accuracy and reduce the number of extracted features, framed as a multitask optimization
problem. The proposed approach uses a genetic algorithm (GA) for feature selection followed by a
support vector machine (SVM) classifier. The techniques are applied to computed tomography (CT)
scans of TB patients from the ImageCLEF 2020 dataset, divided into training and testing sets,
with multi-label classification of lung conditions. Features are extracted using the spatial
gray-level dependence method (SGLDM), and the results are presented in terms of hyper-parameter
and feature selection. The study is implemented in Python on a machine with an RTX 2060 graphics
card and 16 GB of RAM, with Sklearn as the toolkit for comparison, and accuracy is the metric of
interest. SVM hyper-parameter selection employs a genetic algorithm, focusing on the performance
of the radial basis function (RBF) kernel. Performance is compared with known classifiers such as
KNN, CART, NB, LDA, and RF. The experimental results demonstrate improved accuracy, with the
enhanced approach outperforming state-of-the-art methods.
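
The texture-extraction step can be made concrete with a small Python sketch. SGLDM is more
widely known as the gray-level co-occurrence matrix (GLCM): it counts how often a pixel of gray
level i has a neighbour of gray level j at a fixed offset, and texture descriptors are then
computed from the normalised counts. The sketch below is an illustration under stated
assumptions, not the pipeline of Hrizi et al. (2022): the function names, the horizontal
offset, the toy 4x4 image, and the choice of contrast, energy, and homogeneity as descriptors
are all assumptions made here.

```python
from typing import Dict, List


def cooccurrence_matrix(image: List[List[int]], levels: int,
                        dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalised gray-level co-occurrence matrix for the offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs in which a pixel of level i
    has a neighbour of level j at the given offset. Each pair is counted in
    both directions, so the matrix is symmetric.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]


def texture_features(p: List[List[float]]) -> Dict[str, float]:
    """Three common co-occurrence descriptors (an assumed, illustrative choice)."""
    idx = range(len(p))
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx),
        "energy": sum(p[i][j] ** 2 for i in idx for j in idx),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx),
    }


# Toy 4x4 image with four gray levels (0 to 3), invented for illustration.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm = cooccurrence_matrix(image, levels=4)
features = texture_features(glcm)
print(features)
```

In a GA-plus-SVM setup such as the one described, each candidate solution would select a subset
of descriptors like these, and its fitness would be the classifier's accuracy on that subset.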
Kuang et al. (2022) introduce the research's motivation, a novel deep learning
approach, and the cohort used for AMR prediction. The results section presents the
data analysis, feature selection, training, and validation processes, and compares model
performance with a rule-based method. The best ML classifiers achieve a substantial increase in
F1-score over the rule-based Mykrobe predictor. The 1D CNN model performs slightly better than
traditional ML algorithms, although it requires more computational resources during
training. The authors note that feature selection reduces resource demands and discuss the
potential of hyperparameter optimization and the inclusion of novel variants to improve model
performance. They acknowledge the importance of managing imbalanced classes and of considering
sensitivity and specificity in clinical settings. They also examine the possible extension of the
model to bacteria with plasmid-mediated resistance and the importance of diverse datasets in
managing overfitting. The study's reliance on the F1-score is discussed, and the G-mean metric
is introduced. The entire process is automated into a flexible pipeline, enabling easy
adaptation and expansion of the models to other antibiotics and bacteria.

Jamal et al. (2020) present a computational framework that uses artificial intelligence (AI) and
machine learning (ML) methods to predict multi-drug resistance associated mutations in
Mycobacterium tuberculosis (M.tb) using high-throughput sequencing data. The authors focus on
specific genes related to drug resistance and use several ML algorithms to build prediction
models. The study covers dataset preparation, model evaluation, and analysis of the impact of
predicted mutations on protein stability. Prediction models were developed for several genes
associated with drug resistance, including rpoB, inhA, katG, pncA, gyrA, and gyrB. The models
classify mutations as susceptible or resistant with approximately 70% accuracy on average on the
training dataset. On non-redundant testing data, accuracy ranges from 66.66% to 100%.
Performance varies among genes, with artificial neural network (ANN) models generally performing
best. The authors emphasize the importance of features such as changes in amino acid properties
and stability calculations in accurately predicting mutation effects, and they underscore the
potential utility of the models for clinical applications and for predicting novel mutations.
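
To illustrate what "changes in amino acid properties" can look like as model input, the Python
sketch below turns a point mutation written as reference residue, position, and substituted
residue (for example S450L) into two numeric features. The specific properties used by Jamal et
al. (2020) are not listed in this review, so the choice of Kyte-Doolittle hydropathy and formal
side-chain charge is an assumption, as are the function names. The mutation strings serve only
as example inputs.

```python
import re
from typing import Dict, Tuple

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified formal side-chain charge (histidine treated as neutral here).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def parse_mutation(mutation: str) -> Tuple[str, int, str]:
    """Split a mutation such as 'S450L' into (reference, position, substitute)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None or match.group(1) not in HYDROPATHY or match.group(3) not in HYDROPATHY:
        raise ValueError(f"not a recognised point mutation: {mutation!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def mutation_features(mutation: str) -> Dict[str, float]:
    """Encode a point mutation as changes in two amino acid properties."""
    ref, position, alt = parse_mutation(mutation)
    return {
        "position": position,
        "delta_hydropathy": HYDROPATHY[alt] - HYDROPATHY[ref],
        "delta_charge": CHARGE.get(alt, 0) - CHARGE.get(ref, 0),
    }


# Example inputs only; the feature values are not results from any study.
for name in ("S450L", "S315T", "D94G"):
    print(name, mutation_features(name))
```

Feature vectors of this kind, optionally combined with a predicted change in protein stability,
could then be passed to classifiers such as NB, KNN, SVM, or ANN.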
2.7 Summary of the related works
| S/N | Author Name & Year | Topic | Techniques Used | Result | Limitation |
|---|---|---|---|---|---|
| 1 | Radchenko et al. (2023) | Machine Learning Prediction of Mycobacterial Cell Wall Permeability of Drugs and Drug-like Compounds | Artificial neural network (ANN) | The outcome indicates that ANN provides better accuracy (cross-validated balanced accuracy 0.768, sensitivity 0.768, specificity 0.769, area under ROC curve 0.911). | Some models failed to recognize many penetrating compounds because of dataset bias caused by imbalanced data, resulting in many false negatives. |
| 2 | Kuang et al. (2022) | Accurate and rapid prediction of tuberculosis drug resistance from genome sequence data using traditional machine learning algorithms and CNN | Logistic regression (LR), Random forest (RF) and 1D CNN | In terms of F1-scores (81.1 to 93.8%, 93.7 to 96.2%, 93.1 to 94.8%, 95.9 to 97.2%, and 97.1 to 98.2% for ethambutol, rifampicin, pyrazinamide, isoniazid, and ofloxacin, respectively), 1D CNN models outperformed LR and RF. CNN had the highest accuracy (ranging from 90.0% to 96.2%). | (1) Data imbalance, with more susceptible isolates. (2) Due to the study's lack of hyperparameter optimization, the 1D CNN architecture performed less well than traditional ML methods (LR and RF). |
| 3 | Hrizi et al. (2022) | Tuberculosis Disease Diagnosis Based on an Optimized Machine Learning Model | Support vector machine (SVM), K-Nearest Neighbors (KNN), Classification And Regression Tree (CART), Naïve Bayes (NB), Linear Discriminant Analysis (LDA), and Random forest (RF) | Comparing six machine learning algorithms (SVM, KNN, CART, NB, LDA, and RF) showed that the SVM classifier (0.84) was more accurate than the other classification algorithms, while KNN (0.82), LDA (0.82) and RF (0.81) performed better than CART (0.73) and NB (0.67). | The dataset contains some irrelevant characteristics that increase the likelihood that the learning models will be overfit, complex, and challenging to understand, leading to low efficiency and poor performance. |
| 4 | Deelder et al. (2022) | A modified decision tree approach to improve the prediction and mutation discovery for drug resistance in Mycobacterium tuberculosis | Decision tree | The predictive accuracy of resistance from Treesist-TB was comparable to that of the TB-Profiler tool (RIF 97.5% vs. 97.6%; INH 96.8% vs. 96.5%; EMB 96.8% vs. 95.8%). | N/A |
| 5 | Ye et al. (2021) | Identification of active molecules against Mycobacterium tuberculosis through machine learning | Support vector machine (SVM), Random forest (RF), Extreme gradient boosting (XGBoost) and Deep neural networks (DNN) | The XGBoost model based on MorganFP and RDKitFP performs the best with AUC = 0.832. Overall, all the models perform well, with the AUC values all higher than 0.91. The stacking model outperforms the other four individual models, with an average AUC = 0.935 and ACC = 0.878 for the scaffold test set. | (1) The training dataset is unbalanced. (2) For each scaffold, the inhibitors and non-inhibitors are imbalanced. |
| 6 | Nagamani and Sastry (2021) | Mycobacterium tuberculosis Cell Wall Permeability Model Generation Using Chemoinformatics and Machine Learning Approaches | XGBoost, Random forest (RF), Support vector machine (SVM), and Naïve Bayes (NB) | The accuracy values shown for random forest (RF), gradient boosting model (GBM), classification and regression model (CART), Glmnet, support vector machine (SVM), k-nearest neighbors (KNN), naive Bayes (NB), and logistic regression were 0.946, 0.939, 0.851, 0.927, 0.925, 0.925, 0.864, and 0.490, respectively. | N/A |
| 7 | Hadikurniawati et al. (2021) | Predicting tuberculosis drug resistance using machine learning based on DNA sequencing data | C4.5, Random Forest, and Logitboost | With an average AUC of 0.979, MD-WDNN and logistic regression showed the best performance. | N/A |
| 8 | Jamal et al. (2020) | Artificial Intelligence and Machine learning based prediction of resistant and susceptible mutations in Mycobacterium tuberculosis | Naïve Bayes (NB), K-nearest neighbor (KNN), Support vector machine (SVM) and Artificial neural network (ANN) | Four ML algorithms, NB, KNN, SVM, and ANN, were used to create learnt model systems for genes linked with resistance to the first-line TB medicines rifampicin (rpoB), isoniazid (katG and inhA), and pyrazinamide (pncA), and to the fluoroquinolones (gyrA and gyrB). The models were extremely accurate, with average accuracies of 88.86%, 85.22%, 88.0%, 87.30%, 78.88%, and 86.88% for rpoB, inhA, katG, pncA, gyrA, and gyrB, respectively. | N/A |
| 9 | Yang et al. (2019) | DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis | DeepAMR, Random forest (RF), Support vector machine (SVM), multi-label K-nearest neighbours (MLKNN) and Ensemble classification chains (ECC) | DeepAMR outperformed the baseline model and four machine learning models in predicting resistance to four first-line medicines, INH, EMB, PZA, and MDR-TB, with AUROCs of 97.7%, 96.8%, and 94.4%, respectively. The SVM has sensitivity of 92.6%, 85.6%, and 78.6%, with AUROC of 96.4%, 92.1%, and 89.5%, respectively. ECC outperformed MLKNN, with specificities of 99.0% and 96.3% for INH and EMB, respectively, and F1 scores of 78.2% and 72.7% for EMB and PZA. | (1) Each label's class is unbalanced, as are the labels' co-occurrence rates among various drugs. (2) This study only considered cross-resistance between four first-line medications, ignoring that of second-line medications because (i) inaccurate phenotyping for second-line medications would introduce significant error in the classification for first-line medications, and (ii) a small number of resistant isolates would easily lead to over-fitting for such a complex model. (3) The permutation feature is unable to distinguish between feature relationships. |
| 10 | Deelder et al. (2019) | Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data | Non-parametric classification-tree and gradient-boosted-tree models | Overall, the performance of the gradient-boosted tree models was superior to that of the classification tree models. In comparison to EMB (82.8%) and PZA (69.7%), RIF (88.8%) and INH (91.1%) had stronger GBT-CRM sensitivity. Among the fluoroquinolones, sensitivity was highest for CIP (85.7%), followed by OFL (81.0%) and MOX (53.3%). Among the injectables, sensitivity was highest for KAN (82.2%), followed by AMK (80.5%) and CAP (74.6%). | There was insufficient phenotypic data to include newly developed and repurposed medicines such as bedaquiline, delamanid, and linezolid, as well as XDR-TB. |
References
Applying Bioinformatics in Clinical Drug Discovery. (n.d.). Retrieved August 24, 2023, from
https://www.longdom.org/open-access/applying-bioinformatics-in-clinical-drug-discovery.pdf
Bioinformatics | PNNL. (n.d.). Retrieved August 24, 2023, from
https://www.pnnl.gov/explainer-articles/bioinformatics
Deelder, W., Christakoudi, S., Phelan, J., Benavente, E. D., Campino, S., McNerney, R., Palla,
L., & Clark, T. G. (2019). Machine learning predicts accurately mycobacterium
tuberculosis drug resistance from whole genome sequencing data. Frontiers in Genetics,
10(SEP). https://doi.org/10.3389/fgene.2019.00922
Deelder, W., Napier, G., Campino, S., Palla, L., Phelan, J., & Clark, T. G. (2022). A modified
decision tree approach to improve the prediction and mutation discovery for drug
resistance in Mycobacterium tuberculosis. BMC Genomics, 23(1).
https://doi.org/10.1186/s12864-022-08291-4
Hadikurniawati, W., Anwar, M. T., Marlina, D., & Kusumo, H. (2021). Predicting tuberculosis
drug resistance using machine learning based on DNA sequencing data. Journal of
Physics: Conference Series, 1869(1). https://doi.org/10.1088/1742-6596/1869/1/012093
Hrizi, O., Gasmi, K., ben Ltaifa, I., Alshammari, H., Karamti, H., Krichen, M., ben Ammar, L.,
& Mahmood, M. A. (2022). Tuberculosis Disease Diagnosis Based on an Optimized
Machine Learning Model. Journal of Healthcare Engineering, 2022.
https://doi.org/10.1155/2022/8950243
Jamal, S., Khubaib, M., Gangwar, R., Grover, S., Grover, A., & Hasnain, S. E. (2020). Artificial
Intelligence and Machine learning based prediction of resistant and susceptible mutations
in Mycobacterium tuberculosis. Scientific Reports, 10(1).
https://doi.org/10.1038/s41598-020-62368-2
Kuang, X., Wang, F., Hernandez, K. M., Zhang, Z., & Grossman, R. L. (2022). Accurate and
rapid prediction of tuberculosis drug resistance from genome sequence data using
traditional machine learning algorithms and CNN. Scientific Reports, 12(1).
https://doi.org/10.1038/s41598-022-06449-4
Nagamani, S., & Sastry, G. N. (2021). Mycobacterium tuberculosis cell wall permeability model
generation using chemoinformatics and machine learning approaches. ACS Omega,
6(27), 17472–17482. https://doi.org/10.1021/acsomega.1c01865
Radchenko, E. V., Antonyan, G. V., Ignatov, S. K., & Palyulin, V. A. (2023). Machine Learning
Prediction of Mycobacterial Cell Wall Permeability of Drugs and Drug-like Compounds.
Molecules, 28(2). https://doi.org/10.3390/molecules28020633
Romano, J. D., & Tatonetti, N. P. (2019). Informatics and computational methods in natural
product drug discovery: A review and perspectives. Frontiers in Genetics, 10(APR),
442506. https://doi.org/10.3389/fgene.2019.00368
Tuberculosis (TB) | Cedars-Sinai. (n.d.). Retrieved August 24, 2023, from
https://www.cedars-sinai.org/health-library/diseases-and-conditions/t/tuberculosis-tb.html
Tuberculosis (TB): Symptoms, treatment, diagnosis, and more. (n.d.). Retrieved August 24,
2023, from https://www.medicalnewstoday.com/articles/8856#causes
Tuberculosis. (n.d.). Retrieved August 24, 2023, from
https://www.who.int/news-room/fact-sheets/detail/tuberculosis
Tuberculosis: Causes, Symptoms, Diagnosis & Treatment. (n.d.). Retrieved August 24, 2023,
from https://my.clevelandclinic.org/health/diseases/11301-tuberculosis
What is bioinformatics, and why is it important? (n.d.). Retrieved August 24, 2023, from
https://bioinformaticshome.com/blog/What_is_bioinformatics_why_%20important.html
What is bioinformatics? | Bioinformatics for the terrified. (n.d.). Retrieved August 24, 2023,
from https://www.ebi.ac.uk/training/online/courses/bioinformatics-terrified/what-bioinformatics/
What Is Tuberculosis? Symptoms, Causes, Diagnosis, Treatment, and Prevention. (n.d.).
Retrieved August 24, 2023, from https://www.everydayhealth.com/tuberculosis/guide/
Yang, Y., Walker, T. M., Walker, A. S., Wilson, D. J., Peto, T. E. A., Crook, D. W., Shamout,
F., Zhu, T., Clifton, D. A., Arandjelovic, I., Comas, I., Farhat, M. R., Gao, Q.,
Sintchenko, V., Soolingen, D., Hoosdally, S., Cruz, A. L. G., Carter, J., Grazian, C., …
de Oliveira, R. S. (2019). DeepAMR for predicting co-occurrent resistance of
Mycobacterium tuberculosis. Bioinformatics, 35(18), 3240–3249.
https://doi.org/10.1093/bioinformatics/btz067
Ye, Q., Chai, X., Jiang, D., Yang, L., Shen, C., Zhang, X., Li, D., Cao, D., & Hou, T. (2021).
Identification of active molecules against Mycobacterium tuberculosis through machine
learning. Briefings in Bioinformatics, 22(5). https://doi.org/10.1093/bib/bbab068