You are on page 1of 15

TRIBHUVAN UNIVERSITY

National College of Computer Studies (NCCS)


Paknajol, Kathmandu, Nepal

Project Report
Python Project – Machine Learning
Cervical Cancer Risk Classification
Submitted By: Submitted To:
Bipin Masal Mausam Rajbanshi
BIM 5th semester
Section: A
Roll no.: 02

i
Acknowledgment

In the pursuit of data classification for the Cancer dataset using Python, it is important to express
my gratitude to the various individuals and resources who have aided in the successful
completion of this endeavor. I also want to thank the scikit-learn library's developers and
contributors, whose dedication to open-source software has provided critical tools for this
project.

Furthermore, the collaborative efforts of the Python community, as well as the wealth of
educational resources available online, have been invaluable in guiding my efforts. I am grateful
for the guidance and support provided by academic and professional mentor, Mausam Rajbanshi.
Finally, throughout this journey, my colleagues and peers have provided invaluable assistance,
feedback, and collaborative insights. This recognition is a tribute to the collaborative spirit and
collective effort that have enriched my understanding and application of data classification
techniques.

ii
Table of Contents
Acknowledgment.............................................................................................................................ii
Introduction to Machine Learning...................................................................................................2
Machine Learning in Python........................................................................................................3
Background......................................................................................................................................4
Objective..........................................................................................................................................6
Implementation................................................................................................................................7
Conclusion.....................................................................................................................................13

1
Introduction to Machine Learning

The study of algorithms and statistical models that enable computer systems to acquire new skills
and enhance their performance on a given task without being explicitly taught is known as
machine learning, which is a branch of artificial intelligence. It basically involves educating
computers to learn from data, and then have them predict or decide based on that learning.

Here are the key machine learning components and concepts:

o Data: Data is the basis of machine learning. Machine learning models are taught patterns
and relationships using data. It is possible for this data to be structured (like tables and
databases) or unstructured (like text, photos, and audio). Performance of a machine
learning model is frequently significantly impacted by the quantity and quality of data.
o Training: To train a machine learning model, a dataset with input data (features) and
matching target values or labels is presented to the model. The model gains the ability to
spot patterns or relationships in data by modifying its internal parameters.
o Features: The variables or properties that are utilized to characterize the data are referred
to as features. Choosing, modifying, or developing useful characteristics that aid a
model's ability to learn and produce precise predictions is referred to as feature
engineering.
o Model: A mathematical representation of data patterns or correlations is referred to as a
machine learning model. Different models are employed for different tasks, including
neural networks for complicated tasks like image recognition, decision trees for
classification, and linear regression for regressive situations.
o Algorithm: To train models, machine learning algorithms use mathematical and
computational methods. These methods vary depending on the job at hand and the style
of learning (supervised, unsupervised, or reinforced).
o Evaluation: Depending on the job, different metrics and methodologies, such as accuracy,
precision, recall, F1-score, or mean squared error, are used to assess machine learning
models.
o Deployment: After training and evaluation, machine learning models can be used in real-
world applications to produce predictions or automate decision-making procedures.
o Continuous Learning: As new data becomes available, machine learning models can be
updated and enhanced. Online learning, often referred to as retraining, is essential for
preserving model accuracy over time.

Natural language processing (language translation and sentiment analysis), finance (fraud
detection and stock prediction), healthcare (diagnosis and prognosis), image recognition (object

2
detection and facial recognition), and many other fields make extensive use of machine learning.
It still plays a significant part in developing technologies and resolving challenging issues.
Machine Learning in Python

Python is a popular data science programming language due to its simplicity, versatility, and a
rich ecosystem of libraries and tools designed specifically for data analysis and machine
learning. Here's a quick rundown of how Python is used in data science:

o Data Manipulation and Analysis:


 NumPy: NumPy supports arrays and matrices, making it simple to perform
numerical operations on large datasets.
 Pandas: Pandas is a powerful data manipulation and analysis library. It provides
data structures such as Data Frames for dealing with structured data.
o Data Visualization:
 Matplotlib: Matplotlib is a popular library for creating static, animated, and
interactive plots and charts.
 Seaborn: Seaborn is based on Matplotlib and provides a higher-level interface for
creating visually appealing statistical graphics.
o Machine Learning:
 scikit-learn: scikit-learn is a large machine learning library that includes tools for
classification, regression, clustering, and model selection.

Python's rich library ecosystem, combined with its readability and ease of use, makes it an
excellent choice for data scientists looking to effectively explore, analyze, and model data.
Depending on the task, you can use a variety of libraries and tools to achieve your data science
objectives.

3
Background
Cancer classification using Python involves the use of machine learning and data analysis
techniques to categorize drugs based on various attributes and characteristics. This can have
important applications in healthcare, pharmacology, and drug discovery. Here's some
background information on drug classification using Python:

1. Importance of Cancer Classification:

 Cancer classification is a crucial task in the field of pharmacology and medicine.


 It helps in organizing drugs based on their properties, mechanisms of action, and
therapeutic uses.
 Proper classification aids in cancer discovery, prescription, and understanding cancer
interactions and side effects.
2. Machine Learning and Python:

 Python is a popular programming language for data analysis and machine learning due to
its extensive libraries and ease of use.
 Libraries like scikit-learn, TensorFlow, Keras, and PyTorch provide tools for building
and training machine learning models.
3. Data Collection:

 To classify cancer, you need a dataset containing drug-related information. This data can
include chemical properties, molecular structures, side effects, and therapeutic categories.
 Datasets can be obtained from public sources, pharmaceutical companies, or research
institutions.
4. Data Preprocessing:

 Raw data often needs preprocessing, including cleaning, feature extraction, and
normalization, to make it suitable for machine learning models.
 Feature engineering can involve extracting relevant features from drug structures, such as
molecular fingerprints or descriptors.
5. Machine Learning Models:

 Classification models like decision trees, random forests, support vector machines, and
neural networks are commonly used for cancer classification.
 These models learn patterns and relationships within the data to predict the class or
category of a cancer.
6. Evaluation Metrics:

 Classification models are evaluated using metrics like accuracy, precision, recall, F1-
score, and ROC-AUC to assess their performance.
 Cross-validation is often used to ensure model robustness.
7. Challenges:

 Cancer classification can be challenging due to the complexity of biological systems and
the diversity of cancer properties.

4
 Imbalanced datasets, where certain cancer classes are underrepresented, can also be a
challenge.
8. Applications:

 Cancer classification models have a wide range of applications, including:


 Predicting drug-drug interactions.
 Identifying potential side effects.
 Discovering novel cancer candidates.
 Personalized medicine by matching cancer to patients based on their genetic profiles.

5
Objective

The objective of a cancer classification project using Python is to develop a predictive model
capable of categorizing drugs into distinct classes or categories based on their inherent attributes
and characteristics. This project serves a multifaceted purpose within the realms of
pharmacology, healthcare, and drug discovery. At its core, the primary aim is to ensure precise
and reliable drug categorization, facilitating the organization of pharmaceutical agents according
to various criteria such as therapeutic applications, chemical properties, or mechanisms of action.
This categorization not only expedites drug discovery by identifying potential candidates with
similarities to existing medications but also enables the personalization of medical treatments,
aligning drug choices with patient-specific factors like genetic profiles and medical histories.
Additionally, it plays a pivotal role in predicting drug interactions and adverse effects, enhancing
drug safety, supporting research endeavors, and serving as a valuable decision support tool for
healthcare practitioners. Furthermore, drug classification contributes to public health monitoring
and education, enabling the analysis of drug usage trends and providing a vital resource for
healthcare professionals, researchers, and students. In pursuing these objectives, it is essential to
uphold ethical standards, addressing concerns related to patient privacy, model bias, and decision
transparency to ensure responsible and beneficial utilization of drug classification models.

6
Implementation

1)Necessary Data Import

2) Read the data set

df = pd.read_csv('risk_factors_cervical_cancer.csv')

3)Describing the data

df.describe()

7
#HISTOGRAM

# creating a histogram of Age of observations that also have a positive diagnosis


of cancer
cancer_yes.hist(color = 'blue')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(False)
plt.title('Histogram of Age in Positive Dx of Cancer')

8
#Data preprocessing

9
# Data Correlation

#Heatmap for the Data Correlation

10
#Accuracy

11
#Classification Report

12
Conclusion
In conclusion, this report illustrates the successful implementation of a cancer classification
system using Python. The dataset, comprising crucial attributes like age, sex, blood pressure
(BP), cholesterol level, and sodium-to-potassium ratio (Na_to_K), provided a solid foundation
for my classification task.

Through the application of machine learning techniques and Python libraries, I was able to
construct a robust model that accurately predicts the appropriate cancer for a given set of patient
characteristics. The model's performance, as evidenced by its high accuracy, demonstrates its
effectiveness in handling real-world scenarios.

Furthermore, feature importance analysis was used to improve the model's interpretability. This
provided me with valuable insights into which attributes are important in drug classification.
Understanding these relationships is critical in the medical field, where treatment decisions are
made based on a patient's unique characteristics. Overall, this project demonstrates Python's
potential as a powerful tool in the field of healthcare analytics. I have laid the groundwork for
future advances in personalized medicine and pharmaceutical research by leveraging machine
learning algorithms and a well-curated dataset. This work represents a significant step forward in
the medical field toward more effective and tailored treatment approaches.

13

You might also like