
ABSTRACT

This internship has been instrumental in solidifying my understanding of current technologies and methodologies, refining my content creation skills, and providing practical insights into the nuances of machine learning strategies. It has further fuelled my passion for pursuing a career in Artificial Intelligence and Machine Learning. In line with the regulations of various universities, the company delivers internship training and development at several levels, with durations ranging from 4 to 8 weeks. The program enables students to upgrade their knowledge and convert it into project development. It has been designed around the present demands of industry so that students are ready to bridge the gap between industry and academics, supported by value-added courses that build the skills needed for a strong career in computer science. An internship during college studies helps enhance the quality of higher education and improves skills and competencies among students.

The training begins with a strong foundation in computer science: learning a programming language such as Python, basic algorithms, and machine learning and data science principles, and then applying that theoretical knowledge through AI projects. In machine learning, classification refers to a predictive modelling problem in which a class label is predicted for a given example of input data; the main types of learning are supervised learning, unsupervised learning, and reinforcement learning. The outcomes of the internship program include career management, phase management, employee guidance, project completion, valuable work experience, and job opportunities. We completed three projects on machine learning algorithms and obtained the desired prediction results. The primary objectives encompassed gaining hands-on experience with machine learning algorithms, understanding neural network architectures, and applying AI techniques to real-world problems. This internship fostered a deep understanding of machine learning fundamentals, proficiency in Python programming for AI, and practical experience in deploying models.
CHAPTER 1

ABOUT ZETACODING INNOVATIVE SOLUTIONS


ABOUT US

Zeta Coding Innovative Solutions is located in Bengaluru, Karnataka, India. We are a service-based company for software development and training. Zetacoding has established itself as one of the leaders in providing quality software solutions and services across a wide range of technologies and software applications. Our professionals are well qualified, experienced, and proficient in delivering quality products on time. We provide end-to-end solutions covering project management, analysis, development, deployment, and continuous support, to the customer's satisfaction. Our motto is to provide training in future technologies and project development skills.

MISSION
To help our future engineers understand the upfront research challenges of the fast-emerging technology age through advanced, techno-skilled education.

VISION
All our engineers are guided towards e-learning, technology adaptiveness, self-growth, and futuristic skills, preparing them for an innovative world and shaping the leaders of tomorrow.
VALUES

Our values follow our own Learning Technology Model (LEARN-IT):


 Learning & Listening
 Excellence & Empowerment
 Adaptability & Accompany
 Respect & Responsibility
 Notion & Nestle
 Integrity & Inspiration
SERVICES

 Software Training and Development


 IEEE Projects
 Ph. D Guidance & Assistance
 Internship Program
 Manpower Consultation & Placement

WHY CHOOSE AN INTERNSHIP WITH US

 Current Industry 5.0 technology


 Practical Lab Session Experience
 Problem Solving Skills
 Assistance after Internship

 Student Project Demonstrations

 Added Advantage in Portfolio

 Presentation Skills

BENEFITS OF INTERNSHIP SKILLS

 Career management
 Phase management
 Employee guidance
 Project completion
 Communication skills
 Diplomacy & Teamwork
 Positive attitude
CHAPTER 2

ROADMAP TO THE INTERNSHIP PROGRAM


2.1 TASK PERFORMED

As an intern, don't expect to spearhead a critical project right off the bat, at least not at first. In the beginning of your internship, you may spend your time simply learning how the company works. You may shadow an employee to get an understanding of their role. After a day or a few days of learning the ins and outs of the company, you'll start to assist and contribute more to the team.

Week 1 Activities
 Induction to the Internship Program
 Introduction to Python Programming – Part 1
 Conditional Programming
 Looping and Functions
 Data Structures in Python
 Python Programming including OOP – Part 2
 OOP Concepts, Classes & Objects

Week 2 Activities
 Inheritance, Polymorphism & Encapsulation
 Advanced Packages in Python: NumPy and Pandas
 Introduction to Artificial Intelligence & Machine Learning
 Introduction to Machine Learning, Applications and Model Development
 Supervised & Unsupervised Learning
 Classification and Regression
 KNN with Explanation and Code Implementation
 Random Forest Algorithm with Explanation and Code Implementation
Week 3 Activities
 Mini Project 1: Loan Approval Prediction
 Mini Project 2: Diamond Price Prediction
 Mini Project 3: Heart Disease Prediction
 Industry Ready Preparation- Roadmap to the Interviews Success

In the machine learning internship, we learnt a variety of valuable skills and gained hands-on experience in several areas:

Fundamentals of Artificial Intelligence and Machine Learning: Understanding artificial intelligence technologies and the core concepts, algorithms, and techniques used in machine learning, such as regression, classification, and clustering.

Programming Languages: Developing proficiency in programming languages commonly used in machine learning, such as Python or R, including libraries such as NumPy, Pandas, Matplotlib, scikit-learn (sklearn), and Seaborn.

Data Preparation: Learning how to clean, pre-process, and manipulate data for machine learning
models. This involves tasks like data cleaning, feature engineering, and normalization.

Model Development and Evaluation: Building machine learning models, tuning hyperparameters, and assessing their performance using various evaluation metrics.
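As an illustration of this workflow (not the internship's actual project code), the sketch below trains a KNN classifier on a synthetic dataset, sweeps one hyperparameter, and reports two evaluation metrics using scikit-learn:

```python
# Minimal model-development-and-evaluation sketch on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for k in (3, 5, 7):  # a simple sweep over one hyperparameter
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    pred = model.predict(X_test)
    print(k, accuracy_score(y_test, pred), f1_score(y_test, pred))
```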

Career Guidelines: During the internship, the company conducted a career guidance program, which was very informative; we took many useful inputs from it and will be applying them in our upcoming semesters.

Finally, this internship offered a chance to learn and grow; being proactive, asking questions, and seeking mentorship greatly enhanced the learning experience.
CHAPTER 3
ABOUT THE INTERNSHIP

3.1 Benefits of internship

An internship is on most students' minds: an opportunity to jumpstart their professional careers and supplement their courses with hands-on experience. Graduating seniors who applied for a full-time job and participated in an internship received 20% more job offers than those without internship experience.

Internships benefit both the student and the employer. On-the-job learning reinforces what you see in the classroom and teaches invaluable skills like time management, communication, working with others, problem-solving, and, most importantly, the willingness to learn. For employers, internships are a way to build relationships and prepare future employees.

3.2 Hands-on Experience

Internships provide practical, hands-on experience in a real-world work environment. This allows interns to apply theoretical knowledge gained in the classroom to actual tasks and projects.

3.3 Skill Development

Internships help individuals develop and enhance specific skills relevant to their field of study or
career interests. This can include technical skills, communication skills, problem-solving
abilities, and more.

3.4 Networking Opportunities

Internships allow individuals to build professional networks by interacting with colleagues, supervisors, and professionals in their industry. Networking can open doors to future job opportunities and mentorship.

3.5 Resume Enhancement


Having internship experience on a resume can make a candidate more attractive to employers. It
demonstrates practical experience, commitment, and the ability to apply knowledge in a real-
world setting.

3.6 Career Exploration

Internships provide an opportunity for individuals to explore different career paths within their
field of study. This hands-on experience can help interns clarify their career goals and make
informed decisions about their future.

3.7 References and Recommendations

Successful internships can lead to positive references and recommendations from supervisors
and colleagues. These references can be valuable when applying for future jobs or graduate
programs.

3.8 Professionalism and Workplace Etiquette

Internships expose individuals to professional work environments, helping them understand workplace etiquette, communication norms, and other aspects of professional conduct.

3.9 Industry Knowledge

Internships provide a deep dive into the workings of a specific industry, giving interns a better
understanding of industry trends, challenges, and best practices.
CHAPTER 4

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

4.1 Introduction to Python Programming


Python is an interpreted, high-level, general-purpose programming language. Its design
philosophy emphasizes code readability with its notable use of significant whitespace. Its
language constructs and object-oriented approach aim to help programmers write clear, logical
code for small and large-scale projects.

Python is dynamically typed and supports automatic memory management. It has a large and
comprehensive standard library.

Python is used in web development, software development, game development, machine learning, and more.

Python is a versatile and widely used programming language known for its readability,
simplicity, and flexibility. It was created by Guido van Rossum and first released in 1991. Python
is an interpreted, high-level programming language that supports multiple programming
paradigms, including procedural, object-oriented, and functional programming. Here are some of its key features:

Readability
One of Python's strengths is its clean and readable syntax. The language emphasizes code
readability, and its syntax allows developers to express concepts in fewer lines of code than
might be possible in other languages. This readability is facilitated by the use of indentation to
define code blocks, eliminating the need for braces or other delimiters.
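For example, the following minimal snippet shows how indentation alone defines the body of a function and of an if/else statement, with no braces needed:

```python
def grade(score):
    # the indented lines below form the function body
    if score >= 50:
        return "pass"
    else:
        return "fail"

print(grade(72))  # prints: pass
```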
Versatility
Python is a general-purpose language, meaning it can be used for a wide range of applications. It
is commonly used in web development, data science, artificial intelligence, machine learning,
automation, scientific computing, and more. The versatility of Python has contributed to its
popularity and widespread adoption.

Interpreted Language

Python is an interpreted language, which means that code is executed line by line by the Python
interpreter. This makes the development process more interactive and allows for quick testing
and debugging.

Object-Oriented

Python supports object-oriented programming (OOP), allowing developers to structure their code
using classes and objects. This paradigm promotes modularity, reusability, and a more organized
approach to software development.
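A small, hypothetical class illustrates this style; the class name and attributes are made up purely for the example:

```python
class Intern:
    # A tiny example class with two attributes and one method.
    def __init__(self, name, domain):
        self.name = name        # instance attributes
        self.domain = domain

    def introduce(self):
        return f"{self.name} is interning in {self.domain}"

print(Intern("Asha", "Machine Learning").introduce())
```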

Easy to learn

Python is a very easy language to learn, especially for beginners. It has a simple syntax that is
easy to read and write.

Free and open-source

Python is a free and open-source language, which means that it is free to use and distribute. It is
also supported by a large and active community of developers.

4.2 Features of Python Programming

Powerful and versatile:

Python is a general-purpose programming language that can be used for a wide range of tasks,
including web development, data science, machine learning, and more.

Portable and scalable:


Python code can be run on various operating systems and platforms, making it a portable and
scalable choice for development projects.

Large and active community:

Python has a large and active community of developers who contribute to the language's
development and provide support to other users.

Here are some additional features of Python that make it a popular choice for programmers:

Dynamic Typing:

Python is dynamically typed, meaning that variable types are determined at runtime. This
provides flexibility but also requires careful attention to variable types during development to
avoid unexpected behavior.
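A two-line example of dynamic typing:

```python
x = 10          # x is bound to an int
x = "ten"       # now rebound to a str -- legal, but worth watching in larger programs
print(type(x))  # <class 'str'>
```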

Extensive Standard Library:

Python comes with a rich standard library that includes modules and packages for a wide range
of functionalities. This reduces the need for developers to write code from scratch for common
tasks, as they can leverage existing modules.

Community and Documentation:

Python has a large and active community of developers. This community contributes to the
language's growth, development, and the creation of numerous third-party libraries and
frameworks. Additionally, Python has extensive documentation, making it easy for developers to
find resources and solutions to problems.

Cross-Platform Compatibility:

Python is designed to be cross-platform, meaning that Python code can run on various operating
systems without modification. This portability makes it easier to develop and deploy applications
across different environments.

Open Source:
Python is an open-source language, allowing developers to access and modify the source code. This fosters collaboration and continuous improvement within the Python community.

Support for third-party libraries:

Python has a large and active ecosystem of third-party libraries, which can be used to add
additional functionality to Python applications.

Figure 4.1 Features of Python Programming

4.3 Introduction to Artificial Intelligence and Machine Learning

Artificial intelligence (AI) is the ability of a computer program or machine to think and learn. It's
a field of study in computer science that focuses on developing intelligent machines. AI is also
known as machine intelligence.

AI is the simulation of human intelligence processes by machines. It allows computers to perform tasks that are commonly associated with human intellectual processes, such as reasoning. AI can also learn to make decisions and carry out actions on behalf of a human.

4.3.1 Artificial Intelligence



AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines programmed to think and learn like humans. The goal of AI is to develop systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.

There are two main types of AI:

 Narrow or Weak AI
 General or Strong AI

Some applications of AI include:

 Expert systems
 Natural language processing
 Speech recognition
 Machine vision
Fig 4.2 Human Robot

4.4 Supervised Learning

Supervised learning, also known as supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labelled datasets to train algorithms that classify data or predict outcomes accurately. As input data is fed into the model, the model adjusts its weights until it has been fitted appropriately, which occurs as part of the cross-validation process. Supervised learning helps organizations solve a variety of real-world problems at scale, such as filtering spam into a separate folder from your inbox.

Supervised learning uses a training set to teach models to yield the desired output. This training
dataset includes inputs and correct outputs, which allow the model to learn over time. The
algorithm measures its accuracy through the loss function, adjusting until the error has been
sufficiently minimized.
The prediction task is a classification when the target variable is discrete. An application is the
identification of the underlying sentiment of a piece of text.

The prediction task is a regression when the target variable is continuous. An example can be the
prediction of the salary of a person given their education degree, previous work experience,
geographical location, and level of seniority.

Figure 4.3 Supervised Learning

Classification

Classification is a type of supervised learning in machine learning where the goal is to assign
predefined labels or categories to input data based on its features. The algorithm learns from a
labelled training dataset, where each data point is associated with a known class or category. The
trained model can then be used to predict the class of new, unseen data.

Predicts discrete labels or categories. For example, classification can predict if something is true
or false, male or female, or spam or not spam.

Here are key concepts and components related to classification:

Classes or Labels:

In a classification problem, there are predefined classes or labels that the algorithm aims to
predict. For example, in a spam detection task, the classes might be "spam" and "not spam."
Features:

Features are the characteristics or attributes of the input data that the algorithm uses to make
predictions. The combination of features represents the input data in the feature space.

Training Data:

The training dataset consists of labeled examples used to train the classification model. Each
example includes both the input features and the corresponding class label.

Model:

The model is the mathematical representation or algorithm used to map input features to class
labels. Different classification algorithms use different approaches, such as decision trees,
support vector machines, logistic regression, or neural networks.

Decision Boundary:

The decision boundary is the dividing line or surface that separates different classes in the feature
space. The model uses this boundary to classify new, unseen data points.

Training Phase:

During the training phase, the algorithm adjusts its internal parameters based on the labeled
training data. The goal is to create a model that generalizes well to new, unseen data.

Prediction Phase:

Once trained, the model can be used to predict the class labels of new data points. The input
features are fed into the model, and the output is the predicted class.

Evaluation Metrics:

Classification models are evaluated using various metrics, depending on the specific problem.
Common metrics include accuracy, precision, recall, F1 score, and area under the Receiver
Operating Characteristic (ROC) curve.
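The snippet below shows how these metrics can be computed with scikit-learn on a small set of made-up labels, predictions, and probability scores (illustrative values only):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                   # actual class labels
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                   # predicted class labels
y_score = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))
```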

Multi-Class Classification:
In some cases, there are more than two classes to predict. Multi-class classification involves
assigning one of several possible classes to each data point.

Imbalanced Classes:

Imbalanced classes occur when the number of instances in different classes is not evenly distributed. Techniques such as oversampling, undersampling, or using different evaluation metrics may be employed to address imbalanced class issues.

Hyperparameter Tuning:

Many classification algorithms have hyperparameters that need to be set before training.
Hyperparameter tuning involves finding the optimal values for these parameters to improve
model performance.
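One common way to do this in scikit-learn is a grid search; the sketch below tunes an SVM's C and kernel on the built-in Iris dataset, with arbitrary illustrative parameter values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold CV for each combination
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```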

Cross-Validation:

Cross-validation is a technique used to assess the performance of a classification model. It involves splitting the dataset into multiple subsets, training the model on some subsets, and testing it on the remaining subset, repeating the process to ensure robust evaluation.
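For example, a 5-fold cross-validation of a logistic regression model takes only a few lines (shown here on the Iris dataset purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())   # one accuracy per fold, plus the average
```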

Classification is widely applied across various domains, including spam detection, image
recognition, sentiment analysis, medical diagnosis, and many other tasks where the goal is to
assign predefined categories to input data based on its features.
Figure 4.4 Classification Model

Regression

Regression is a statistical method used in finance, investing, and other disciplines that attempts to
determine the strength and character of the relationship between one dependent variable (usually
denoted by Y) and a series of other variables (known as independent variables).

Also called simple regression or ordinary least squares (OLS), linear regression is the most
common form of this technique. Linear regression establishes the linear relationship between two
variables based on a line of best fit. Linear regression is thus graphically depicted using a straight
line with the slope defining how the change in one variable impacts a change in the other. The y-
intercept of a linear regression relationship represents the value of one variable when the value of
the other is zero. Non-linear regression models also exist, but are far more complex.

Regression analysis is a powerful tool for uncovering the associations between variables
observed in data, but cannot easily indicate causation. It is used in several contexts in business,
finance, and economics. For instance, it is used to help investment managers value assets and
understand the relationships between factors such as commodity prices and the stocks of
businesses dealing in those commodities

Predicts continuous numerical values. For example, regression can predict home values, market
trends, or a condominium's selling price.

4.5 Classification and Regression Algorithms

4.5.1 Support Vector Machines (SVM):

Explanation: SVM is a powerful algorithm for classification. It works by finding the hyperplane
that best separates different classes in the feature space. The hyperplane is chosen to maximize
the margin between classes.

Hyperplane:

In a two-dimensional space, the separating boundary is simply a line; in higher-dimensional spaces, it becomes a hyperplane. The goal of SVM is to find the hyperplane that maximizes the margin between classes.
Margin:

The margin is the distance between the hyperplane and the nearest data point from either class.
SVM aims to find the hyperplane with the maximum margin, providing a robust separation
between classes.

Support Vectors:

Support vectors are the data points that lie closest to the hyperplane and influence its position.
These are the critical points for determining the margin and the optimal hyperplane.

Kernel Trick:

SVM can handle non-linear decision boundaries through a technique called the kernel trick.
Kernels transform the input data into a higher-dimensional space, making it possible to find a
hyperplane in that space. Common kernel functions include linear, polynomial, and radial basis
function (RBF) kernels.

Fig 4.5 SVM Model

In the figure, the hyperplane is the line separating the two classes, and the dotted lines represent the margins. SVM aims to find the hyperplane with the maximum margin.
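A minimal scikit-learn sketch of fitting an SVM with an RBF kernel on synthetic two-class data (not the internship's dataset) looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # kernel trick via RBF
print("support vectors per class:", clf.n_support_)
```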
4.5.2 Random Forest:

Bootstrapping:

Random Forest employs a technique called bootstrapping, where multiple subsets of the training
data are created by randomly sampling with replacement. Each decision tree is then built on one
of these subsets.

Feature Randomness:

Not all features are used to split each node of the decision tree. At each node, a random subset of
features is considered for the split. This further decorrelates the trees in the forest.

Voting (Classification) or Averaging (Regression):

For classification tasks, the class that receives the majority of votes from the individual trees is
the final predicted class. For regression tasks, the predictions from all trees are averaged to get
the final output.
Figure 4.6 Random forest example

Robustness:

Random Forest is known for its robustness and resistance to overfitting. The combination of
multiple trees with different subsets of data helps to generalize well to new, unseen data.

Feature Importance:

Random Forest can provide an estimate of the importance of each feature in making accurate
predictions. This can be useful for feature selection and understanding the underlying
relationships in the data.

Random Forest is widely used in practice due to its flexibility, ease of use, and strong performance across various types of data. It is applicable in areas such as finance, healthcare, and remote sensing, among others.

Explanation: Random Forest is an ensemble learning algorithm that constructs a multitude of decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
The figure shows a Random Forest consisting of multiple decision trees. Each tree makes its
prediction, and the final prediction is the mode of all individual tree predictions (for
classification).
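As a hedged illustration, the snippet below fits a Random Forest on the built-in breast cancer dataset and prints its test accuracy and the largest feature importance:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
print("largest feature importance:", rf.feature_importances_.max())
```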

4.5.3 Regression Algorithms

Linear Regression:

Explanation: Linear Regression is a simple and widely used regression algorithm. It models the
relationship between the dependent variable and one or more independent variables by fitting a
linear equation to observed data.

Figure 4.7 Linear Regression

In the figure, the blue line represents the linear regression model, and the points are the actual
data. The algorithm aims to find the line that minimizes the sum of squared differences between
the predicted and actual values.
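A tiny example with made-up points that roughly follow y = 2x shows the fitted slope, intercept, and a prediction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])        # roughly y = 2x

reg = LinearRegression().fit(x, y)               # ordinary least squares fit
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
print("prediction for x=6:", reg.predict([[6]])[0])
```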

4.5.4 Decision Tree

A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It works by recursively partitioning the data into subsets based on the values of
different features. The goal is to create a tree-like model that makes decisions at each node
regarding the predicted output.

Here are the key components and concepts associated with Decision Trees:

Root Node:

The topmost node in a decision tree is called the root node. It represents the entire dataset and is
split into subsets based on the values of a selected feature.

Decision Nodes (Internal Nodes):

These are the nodes where the dataset is split based on a particular feature. Each decision node
represents a decision based on the value of a specific feature.

Leaf Nodes (Terminal Nodes):

Leaf nodes are the final nodes in a decision tree. They represent the predicted output or class.
Each leaf node is associated with a specific outcome.

Branches:

The branches of a decision tree connect the nodes and represent the decision path. They show
how the data is split based on the values of features.

Splitting:

The process of dividing a node into subsets based on the values of a chosen feature is called
splitting. The goal is to create homogeneous subsets that are more predictive of the target variable.

Entropy and Information Gain (for Classification):

In classification tasks, Decision Trees use metrics like entropy and information gain to decide the
best feature for splitting. Entropy measures the impurity or disorder in a dataset, and information
gain quantifies the effectiveness of a split in reducing entropy.
Figure 4.8 Decision Tree Example
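A small illustrative tree, trained on the Iris dataset with criterion="entropy" so that splits are chosen by information gain, can be printed as text to see its root, decision, and leaf nodes:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))   # textual view of the splits and leaves
```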

4.6 Unsupervised Learning


Unsupervised learning is a branch of machine learning where the model learns patterns and
structures from input data without explicit supervision or labeled outcomes. Unlike supervised
learning, where the algorithm is trained on labeled data, unsupervised learning explores
unlabeled data to uncover inherent patterns or relationships within the dataset. It's commonly
used for tasks such as clustering and dimensionality reduction. Here are some key concepts
within unsupervised learning:

Clustering: This involves grouping similar data points together in a dataset based on certain
features or characteristics. Algorithms like K-means, hierarchical clustering, and DBSCAN are
used to partition data into distinct groups.

Dimensionality Reduction: Techniques such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and autoencoders are employed to reduce the number of features or variables in a dataset while preserving essential information. This helps in visualizing high-dimensional data or speeding up subsequent computations.
Anomaly Detection: Identifying unusual patterns or outliers within a dataset. Unsupervised
learning can discover anomalies by detecting data points that significantly deviate from the
norm.

Association Rule Learning: Uncovering interesting relationships or associations between variables in large datasets. The Apriori algorithm and FP-Growth are examples used in market basket analysis, recommendation systems, and more.

Generative Models: Algorithms like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are used to generate new data samples that resemble the original dataset's distribution. These models have applications in image generation, data augmentation, and creating synthetic data.

Unsupervised learning is valuable when dealing with large datasets lacking labeled information
or when exploring data to derive insights, identify patterns, or preprocess data before applying
supervised learning techniques.

4.6.1 Clustering Algorithms

K-Means Clustering:

Explanation: K-Means is a popular clustering algorithm that partitions data into K clusters based
on similarity. It minimizes the sum of squared distances between data points and the centroid of
their assigned cluster.

K-Means Clustering: The figure shows the iterative process of K-Means clustering. The
algorithm starts with initial centroids and assigns data points to the nearest centroid. It then
updates the centroids and repeats until convergence.
Figure 4.9 K-means Clustering
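A hedged sketch with synthetic blobs (three clusters, arbitrary parameters) shows the fitted centroids and the inertia, i.e. the sum of squared distances that K-Means minimizes:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)   # synthetic data
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print("cluster centers:\n", km.cluster_centers_)
print("inertia (sum of squared distances):", km.inertia_)
```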

Hierarchical Clustering:

Explanation: Hierarchical Clustering builds a hierarchy of clusters. It can be agglomerative (bottom-up) or divisive (top-down). Agglomerative starts with individual data points as clusters and merges them, while divisive starts with one cluster and recursively splits it.

Figure 4.10 Hierarchical Clustering

The figure illustrates agglomerative hierarchical clustering. Initially, each data point is a cluster.
The algorithm then merges the closest clusters iteratively until all points belong to a single
cluster.
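A minimal agglomerative (bottom-up) clustering example on synthetic data, for illustration only:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=2, random_state=0)
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
print(labels[:10])   # cluster assignment of the first ten points
```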
Dimensionality Reduction Algorithm:

Principal Component Analysis (PCA):

Explanation: PCA is a dimensionality reduction technique that transforms data into a new
coordinate system, where the most significant variance lies along the first few principal
components. It helps capture the essential features while reducing the dimensionality.

Figure 4.11 Principal Component Analysis

In the figure, the red lines represent the principal components. PCA identifies the directions
(principal components) along which the data varies the most, reducing the dimensionality while
retaining the most important information.
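For illustration, the snippet below reduces the four-feature Iris data to its first two principal components and prints how much variance each component explains:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                     # project onto the top 2 components

print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_2d.shape)             # (150, 2)
```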

These unsupervised learning algorithms are valuable for discovering patterns in data, grouping similar instances together, and reducing the complexity of high-dimensional datasets.

Anomaly detection

Also known as outlier detection, anomaly detection is a technique in machine learning used to identify patterns
or instances that deviate significantly from the norm within a dataset. These instances are
considered anomalies or outliers because they do not conform to the expected behavior of the
majority of the data. Anomaly detection is applied in various fields, including cybersecurity,
finance, healthcare, and industrial systems, where detecting unusual events or behaviors is
crucial.

Here are some common approaches and techniques for anomaly detection:

Statistical Methods:

Z-Score or Standard Score: This method measures how many standard deviations a data point is
from the mean. Data points with a z-score above a certain threshold are considered anomalies.

Interquartile Range (IQR):

The interquartile range is the range between the first quartile (25th percentile) and the third
quartile (75th percentile) of the data. Data points outside a defined range are considered outliers.
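A small NumPy sketch, using a made-up array with one obvious outlier, shows both the z-score rule and the IQR rule described above:

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])   # 95 is the obvious outlier

# z-score rule: flag points more than 2 standard deviations from the mean
z = (data - data.mean()) / data.std()
print("z-score outliers:", data[np.abs(z) > 2])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print("IQR outliers:", data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)])
```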

Distance-Based Methods:

Euclidean Distance: Calculate the distance of each data point from the centroid or mean of the
dataset. Points with distances above a threshold are flagged as anomalies.

Mahalanobis Distance: It accounts for correlations between variables and is particularly useful
when dealing with multivariate data.
CHAPTER 5

MINI PROJECTS IMPLEMENTATION


5.1 Loan Approval Prediction

Introduction

Loans are a major requirement of the modern world; banks earn a major part of their total profit from them. Loans help students manage their education and living expenses, and help people buy houses, cars, and other expensive items.

But when it comes to deciding whether an applicant's profile qualifies for a loan, banks have to consider many aspects.

So, here we use machine learning with Python to ease that work and predict whether a candidate's profile is suitable, using key features like Marital Status, Education, Applicant Income, Credit History, etc. A minimal sketch of such a pipeline is shown after the feature descriptions below.

The dataset contains 13 features:

Gender: Gender may influence loan approval decisions in some regions or cultures due to
historical biases or societal norms. Some models may consider gender-neutral factors instead to
promote fairness.

Marital Status (Married): Married individuals might be seen as more stable and responsible,
which could positively impact loan approval chances.
Dependents: The number of dependents can affect an applicant's ability to repay a loan. More
dependents may indicate higher financial responsibilities and potentially reduce loan approval
chances.

Education: Applicants with higher education levels may have better job prospects and income
potential, potentially increasing their chances of loan approval.

Self-Employed: Self-employed individuals might face different approval criteria than those who
are employed by others, as their income can be less stable.
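The following is a minimal, hedged sketch of how such a loan-approval classifier could be assembled. The file name loan.csv, the target column Loan_Status, and the preprocessing choices are assumptions made for illustration, not the exact project code:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("loan.csv")                         # assumed file name
df = df.fillna(df.mode().iloc[0])                    # simple imputation of missing values
for col in df.select_dtypes(include="object"):
    df[col] = LabelEncoder().fit_transform(df[col])  # encode categorical columns

X = df.drop("Loan_Status", axis=1)                   # assumed target column
y = df["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```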

Algorithm used:

Accuracy comparison:
Predictions

5.2 Diamond Price Prediction


Introduction

Machine learning is used across many domains around the world; the healthcare industry is no exception. Machine learning can play an essential role in predicting the presence or absence of locomotor disorders, in drug prediction, in diamond price prediction, and more.

A diamond's table refers to the flat facet of the diamond seen when the stone is face up.

The main purpose of a diamond table is to refract entering light rays and allow reflected light
rays from within the diamond to meet the observer’s eye.

Dataset for Diamond price prediction:

Algorithms used:
Accuracy comparisons:

Prediction
5.3 Heart Disease Prediction

Introduction

Many factors, such as diabetes, high blood pressure, high cholesterol, and abnormal pulse rate, need to be considered when predicting heart disease. Often, the available medical data are incomplete, which affects the results when predicting heart disease. Machine learning plays a crucial role in the medical field. High blood cholesterol is defined as having too much cholesterol (a waxy, fatty substance) in the blood. Having either high LDL cholesterol ("bad" cholesterol) or low HDL cholesterol ("good" cholesterol), or both, is one of the best predictors of the risk of heart disease.

Dataset
Accuracy Comparison

Prediction
CHAPTER 6

TOOLS LEARNT
6.1 Jupyter Notebook (Anaconda Navigator)

To open a Jupyter Notebook with Anaconda Navigator, follow these steps:

 Open Anaconda Navigator.

 Click on the Jupyter Notebook tab.

 Click on the Launch button.

A Jupyter file browser will open in a web browser tab. A new notebook will open as a new tab in
your web browser. Jupyter Notebook is an interactive web-based environment for creating and
sharing documents that contain live code, equations, visualizations, and narrative text. It is used
by data scientists, engineers, and students for a variety of tasks, including data cleaning and
analysis, machine learning, and scientific computing. Anaconda Navigator is a graphical user
interface for managing Anaconda installations and packages. It makes it easy to install, update,
and manage Python packages, including Jupyter Notebook.

Figure 6.1 Anaconda Navigator

6.2 Google Colab

Google Colab, or Colaboratory, is a free, cloud-based platform for writing and running Python
code in a collaborative environment. It's built around Project Jupyter code and hosts Jupyter
notebooks.

Colab is well suited for machine learning, data analysis, and education.
Colab provides access to GPU and TPU resources. It also stores notebooks in Google Drive,
where they can be easily shared. Colab is free to use, but there are paid options for larger
computing needs.

Google Colab, short for Google Colaboratory, is a cloud-based platform provided by Google that
offers free access to computational resources like GPUs (Graphics Processing Units) and TPUs
(Tensor Processing Units). It's primarily used for writing, running, and sharing Python code in a
Jupyter Notebook environment.
Colab notebooks are stored in Google Drive and can be easily shared and collaborated on in real
time. It's particularly popular among data scientists, machine learning engineers, and researchers
for its ability to run code that requires significant computational power without the need for
powerful hardware on the user's end.

Users can access Colab through a web browser, and it supports various libraries and frameworks
commonly used in machine learning, such as TensorFlow, PyTorch, and scikit-learn.
Additionally, Colab provides integration with other Google services and allows for the
installation of additional libraries using pip or apt-get commands.

CHAPTER 7

SYSTEM REQUIREMENT SPECIFICATION


The System Requirements Specification document describes all data, functional, and behavioral requirements of the software under production or development. A functional requirement defines the functionality of a system or one of its subsystems; it depends on the type of software, the expected users, and the type of system in which the software is used. A non-functional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviors.

7.1 Software Requirements

Scripting language : Python
Scripting tool : Anaconda Navigator (Jupyter Notebook) & Google Colab
Operating system : Microsoft Windows 8/10/11
Datasets : Loan Approval, Heart Disease, and Diamond Price prediction
Packages : NumPy, Pandas, Matplotlib, Seaborn, etc.

7.2 Hardware Requirements

Processor : 3.0 GHz and above
Output devices : Monitor (LCD)
Input devices : Keyboard
Hard disk : 1 TB
RAM : 8 GB or above

CHAPTER 8

OUTCOMES OF INTERNSHIP
I recently completed a comprehensive internship program at Zetacoding Innovative Solutions in
Bengaluru, spanning a duration of 3 to 4 weeks. This immersive experience has significantly
contributed to my growth, particularly in the domain of Python programming. Here are the key
outcomes of my internship:

Mastery of Python Programming

I acquired a profound understanding of the Python programming language, encompassing


syntax, libraries, and best practices. The hands-on sessions allowed me to apply theoretical
knowledge in practical scenarios, enhancing my coding skills and proficiency.

Exposure to Professional Work Environments


Working within a professional setting provided me with valuable insights into teamwork,
communication, and time management. Exposure to industry practices and standards has
broadened my perspective on software development, preparing me for real-world challenges.

Boost in Confidence and Skills

The intensive nature of the program significantly boosted my confidence in coding, making me
more self-assured in my abilities. Engaging with a professional team allowed me to refine my
skills, motivating me to pursue a career in Python programming or related fields.

Real-world Application of Knowledge:

The internship served as a bridge between academic learning and practical application, allowing
me to translate theoretical knowledge into real-world solutions. I had the opportunity to work on
a tangible project, streamlining data management for a local nonprofit organization.

Dynamic Learning Environment

The dynamic nature of the environment necessitated quick adaptation to new tools and
technologies, fostering a continuous learning mindset. Interactions with diverse team members
exposed me to various problem-solving approaches and perspectives.

Hands-on Experience in Python Coding

A standout feature was the hands-on experience gained through working on Python coding
functionalities. Collaborating with experienced professionals provided insights into coding
practices, version control, and effective problem-solving within a team.

Soft Skills Development:

Beyond technical skills, the internship cultivated essential soft skills such as communication,
time management, and effective teamwork. These skills are invaluable in any professional setting
and complement the technical expertise gained during the internship.

Portfolio Enhancement:

By the conclusion of the internship, I had a tangible project to add to my portfolio, showcasing
my capabilities and practical application of learned skills.
Reinforcement of Passion for Programming:

The enriching experience solidified my passion for programming and emphasized the importance
of continuous learning in the ever-evolving tech industry.

In summary, my internship at Zetacoding Innovative Solutions was a transformative journey that not only honed my technical skills but also equipped me with the professional acumen required for success in the field of Python programming. The hands-on experiences, collaborative environment, and exposure to industry practices have left a lasting impact on my professional growth and aspirations.

