Zeta Coding Innovative Solutions is located in Bengaluru, Karnataka, India. We are a service-based company for software development and training. Zetacoding has established itself as one of the leaders in providing quality software solutions and services across a wide range of technologies and software applications. Our professionals are well qualified, experienced, and proficient in delivering quality products on time. We provide end-to-end solutions covering project management, analysis, development, deployment, and continuous support, to the customer's satisfaction. Our motto is to provide training in future technologies and project development skills.
MISSION
To help our future engineers understand the upfront research challenges of the fast-emerging technology age through advanced, technology-focused education.
VISION
All our engineers are committed to e-learning, technology adaptability, self-growth, and futuristic skills, preparing them for an innovative world and creating the leaders of tomorrow.
VALUES
Presentation Skills
Career management
Phase management
Employee guidance
Project completion
Communication skills
Diplomacy & Teamwork
Positive attitude
CHAPTER 2
As an intern, don't expect to spearhead a critical project right off the bat, at least not at first. In the beginning of your internship, you may spend your time simply trying to learn how the company works. You may shadow an employee to get an understanding of their role. After a day or a few days of learning the ins and outs of the company, you'll start to assist and contribute more to the team.
Week 1 Activities
Induction Program to the Internship Program
Week 2 Activities
Inheritance, Polymorphism & Encapsulation
Advanced Packages in Python: NumPy and Pandas
Introduction to Artificial Intelligence & Machine Learning
Introduction to Machine Learning: Applications and Model Development
Supervised & Unsupervised Learning
Classification and Regression
KNN with explanation and Code Implementation
Random Forest Algorithm with explanation and Code Implementation
Week 3 Activities
Mini Project 1: Loan Approval Prediction
Mini Project 2: Diamond Price Prediction
Mini Project 3: Heart Disease Prediction
Industry-Ready Preparation: Roadmap to Interview Success
In a machine learning internship, we learnt a variety of valuable skills and gained hands-on experience in several areas:
Data Preparation: Learning how to clean, pre-process, and manipulate data for machine learning
models. This involves tasks like data cleaning, feature engineering, and normalization.
Model Development and Evaluation: Building machine learning models, tuning hyper
parameters, and assessing their performance using various evaluation metrics.
Career Guidelines: During the internship, the company conducted a Career Guidelines program, which was very informative; we took many useful inputs from it and will be applying them in our upcoming semesters.
Finally, this internship offered a chance to learn and grow; being proactive, asking questions, and seeking mentorship greatly enhanced the learning experience.
CHAPTER 3
ABOUT THE INTERNSHIP
Internships benefit both the student and the employer. On-the-job learning reinforces what you see in the classroom and teaches invaluable skills like time management, communication, working with others, problem-solving, and, most importantly, the willingness to learn. For employers, internships build relationships and prepare future employees.
Internships help individuals develop and enhance specific skills relevant to their field of study or
career interests. This can include technical skills, communication skills, problem-solving
abilities, and more.
Internships provide an opportunity for individuals to explore different career paths within their
field of study. This hands-on experience can help interns clarify their career goals and make
informed decisions about their future.
Successful internships can lead to positive references and recommendations from supervisors
and colleagues. These references can be valuable when applying for future jobs or graduate
programs.
Internships provide a deep dive into the workings of a specific industry, giving interns a better
understanding of industry trends, challenges, and best practices.
CHAPTER 4
Python is dynamically typed and supports automatic memory management. It has a large and
comprehensive standard library.
Python is a versatile and widely used programming language known for its readability,
simplicity, and flexibility. It was created by Guido van Rossum and first released in 1991. Python
is an interpreted, high-level programming language that supports multiple programming
paradigms, including procedural, object-oriented, and functional programming. Here's a brief
introduction to key aspects of Python programming:
Readability
One of Python's strengths is its clean and readable syntax. The language emphasizes code
readability, and its syntax allows developers to express concepts in fewer lines of code than
might be possible in other languages. This readability is facilitated by the use of indentation to
define code blocks, eliminating the need for braces or other delimiters.
Versatility
Python is a general-purpose language, meaning it can be used for a wide range of applications. It
is commonly used in web development, data science, artificial intelligence, machine learning,
automation, scientific computing, and more. The versatility of Python has contributed to its
popularity and widespread adoption.
Interpreted Language
Python is an interpreted language, which means that code is executed line by line by the Python
interpreter. This makes the development process more interactive and allows for quick testing
and debugging.
Object-Oriented
Python supports object-oriented programming (OOP), allowing developers to structure their code
using classes and objects. This paradigm promotes modularity, reusability, and a more organized
approach to software development.
Easy to learn
Python is a very easy language to learn, especially for beginners. It has a simple syntax that is
easy to read and write.
Python is a free and open-source language, which means that it is free to use and distribute. It is
also supported by a large and active community of developers.
Python is a general-purpose programming language that can be used for a wide range of tasks,
including web development, data science, machine learning, and more.
Python has a large and active community of developers who contribute to the language's
development and provide support to other users.
Here are some additional features of Python that make it a popular choice for programmers:
Dynamic Typing:
Python is dynamically typed, meaning that variable types are determined at runtime. This
provides flexibility but also requires careful attention to variable types during development to
avoid unexpected behavior.
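As a small illustrative sketch (the variable names and values below are invented), the snippet shows how a variable's type is determined at runtime and how type errors only surface when the offending line executes:

```python
x = 10
print(type(x))   # <class 'int'> -- the type comes from the value, not a declaration

x = "ten"        # the same name can later refer to a value of a different type
print(type(x))   # <class 'str'>

# Type mismatches are only caught at runtime, when the line executes:
try:
    total = "ten" + 5
except TypeError as err:
    print("Caught at runtime:", err)
```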
Rich Standard Library:
Python comes with a rich standard library that includes modules and packages for a wide range
of functionalities. This reduces the need for developers to write code from scratch for common
tasks, as they can leverage existing modules.
Large and Active Community:
Python has a large and active community of developers. This community contributes to the
language's growth, development, and the creation of numerous third-party libraries and
frameworks. Additionally, Python has extensive documentation, making it easy for developers to
find resources and solutions to problems.
Cross-Platform Compatibility:
Python is designed to be cross-platform, meaning that Python code can run on various operating
systems without modification. This portability makes it easier to develop and deploy applications
across different environments.
Open Source:
Python is an open-source language, allowing developers to access and modify the source code.
This fosters collaboration and continuous improvement within the Python community.
Support for Third-Party Libraries:
Python has a large and active ecosystem of third-party libraries, which can be used to add
additional functionality to Python applications.
Artificial intelligence (AI) is the ability of a computer program or machine to think and learn. It's
a field of study in computer science that focuses on developing intelligent machines. AI is also
known as machine intelligence. Common types and applications of AI include:
Narrow or Weak AI
General or Strong AI
Expert systems
Speech recognition
Machine vision
Fig 4.2 Human Robot
Supervised learning uses a training set to teach models to yield the desired output. This training
dataset includes inputs and correct outputs, which allow the model to learn over time. The
algorithm measures its accuracy through the loss function, adjusting until the error has been
sufficiently minimized.
The prediction task is a classification when the target variable is discrete. An application is the
identification of the underlying sentiment of a piece of text.
The prediction task is a regression when the target variable is continuous. An example can be the
prediction of the salary of a person given their education degree, previous work experience,
geographical location, and level of seniority.
Classification
Classification is a type of supervised learning in machine learning where the goal is to assign
predefined labels or categories to input data based on its features. The algorithm learns from a
labelled training dataset, where each data point is associated with a known class or category. The
trained model can then be used to predict the class of new, unseen data.
Classification predicts discrete labels or categories. For example, it can predict whether something is true or false, male or female, or spam or not spam.
Classes or Labels:
In a classification problem, there are predefined classes or labels that the algorithm aims to
predict. For example, in a spam detection task, the classes might be "spam" and "not spam."
Features:
Features are the characteristics or attributes of the input data that the algorithm uses to make
predictions. The combination of features represents the input data in the feature space.
Training Data:
The training dataset consists of labeled examples used to train the classification model. Each
example includes both the input features and the corresponding class label.
Model:
The model is the mathematical representation or algorithm used to map input features to class
labels. Different classification algorithms use different approaches, such as decision trees,
support vector machines, logistic regression, or neural networks.
Decision Boundary:
The decision boundary is the dividing line or surface that separates different classes in the feature
space. The model uses this boundary to classify new, unseen data points.
Training Phase:
During the training phase, the algorithm adjusts its internal parameters based on the labeled
training data. The goal is to create a model that generalizes well to new, unseen data.
Prediction Phase:
Once trained, the model can be used to predict the class labels of new data points. The input
features are fed into the model, and the output is the predicted class.
Evaluation Metrics:
Classification models are evaluated using various metrics, depending on the specific problem.
Common metrics include accuracy, precision, recall, F1 score, and area under the Receiver
Operating Characteristic (ROC) curve.
Multi-Class Classification:
In some cases, there are more than two classes to predict. Multi-class classification involves
assigning one of several possible classes to each data point.
Imbalanced Classes:
Imbalanced classes occur when the number of instances in different classes is not evenly
distributed. Techniques such as oversampling, undersampling, or using different evaluation
metrics may be employed to address imbalanced class issues.
Hyperparameter Tuning:
Many classification algorithms have hyperparameters that need to be set before training.
Hyperparameter tuning involves finding the optimal values for these parameters to improve
model performance.
Applications:
Classification is widely applied across various domains, including spam detection, image
recognition, sentiment analysis, medical diagnosis, and many other tasks where the goal is to
assign predefined categories to input data based on its features.
Figure 4.4 classification model
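To make the training and prediction phases concrete, here is a minimal sketch assuming scikit-learn is installed; the toy points and labels are invented for illustration, and a real project would evaluate on held-out data rather than the training set:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy training data: two features per point, two classes (0 and 1)
X_train = [[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]]
y_train = [0, 0, 0, 1, 1, 1]

# Training phase: the model fits its internal parameters to the labelled data
model = LogisticRegression()
model.fit(X_train, y_train)

# Prediction phase: classify new, unseen points
print(model.predict([[1.5, 1.5], [8.5, 8.5]]))

# Evaluation metric: accuracy on the training data
print("accuracy:", accuracy_score(y_train, model.predict(X_train)))
```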
Regression
Regression is a statistical method used in finance, investing, and other disciplines that attempts to
determine the strength and character of the relationship between one dependent variable (usually
denoted by Y) and a series of other variables (known as independent variables).
Also called simple regression or ordinary least squares (OLS), linear regression is the most
common form of this technique. Linear regression establishes the linear relationship between two
variables based on a line of best fit. Linear regression is thus graphically depicted using a straight
line with the slope defining how the change in one variable impacts a change in the other. The y-
intercept of a linear regression relationship represents the value of one variable when the value of
the other is zero. Non-linear regression models also exist, but are far more complex.
Regression analysis is a powerful tool for uncovering the associations between variables
observed in data, but cannot easily indicate causation. It is used in several contexts in business,
finance, and economics. For instance, it is used to help investment managers value assets and
understand the relationships between factors such as commodity prices and the stocks of
businesses dealing in those commodities.
Regression predicts continuous numerical values. For example, it can predict home values, market trends, or a condominium's selling price.
4.5.1 Support Vector Machine (SVM):
Explanation: SVM is a powerful algorithm for classification. It works by finding the hyperplane
that best separates different classes in the feature space. The hyperplane is chosen to maximize
the margin between classes.
Hyperplane:
In a two-dimensional space, a hyperplane is a line that separates data into two classes. In higher-
dimensional spaces, it becomes a hyperplane. The goal of SVM is to find the hyperplane that
maximizes the margin between classes.
Margin:
The margin is the distance between the hyperplane and the nearest data point from either class.
SVM aims to find the hyperplane with the maximum margin, providing a robust separation
between classes.
Support Vectors:
Support vectors are the data points that lie closest to the hyperplane and influence its position.
These are the critical points for determining the margin and the optimal hyperplane.
Kernel Trick:
SVM can handle non-linear decision boundaries through a technique called the kernel trick.
Kernels transform the input data into a higher-dimensional space, making it possible to find a
hyperplane in that space. Common kernel functions include linear, polynomial, and radial basis
function (RBF) kernels.
Figure 4.5 SVM
In the figure, the hyperplane is the line separating the two classes, and the dotted lines represent
the margins. SVM aims to find the hyperplane with the maximum margin.
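A minimal sketch of these ideas, assuming scikit-learn is installed (the toy points are invented): after fitting, the classifier exposes the support vectors that determine the margin.

```python
from sklearn.svm import SVC

# Two linearly separable toy classes
X = [[0, 0], [1, 1], [1, 0], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel finds a straight-line hyperplane; swap in kernel="rbf"
# or kernel="poly" to handle non-linear decision boundaries
clf = SVC(kernel="linear")
clf.fit(X, y)

# The support vectors are the points lying closest to the hyperplane
print("support vectors:\n", clf.support_vectors_)
print("predictions:", clf.predict([[0.5, 0.5], [5.5, 5.5]]))
```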
4.5.2 Random Forest:
Bootstrapping:
Random Forest employs a technique called bootstrapping, where multiple subsets of the training
data are created by randomly sampling with replacement. Each decision tree is then built on one
of these subsets.
Feature Randomness:
Not all features are used to split each node of the decision tree. At each node, a random subset of
features is considered for the split. This further decorrelates the trees in the forest.
Voting and Averaging:
For classification tasks, the class that receives the majority of votes from the individual trees is
the final predicted class. For regression tasks, the predictions from all trees are averaged to get
the final output.
Figure 4.6 Random forest example
Robustness:
Random Forest is known for its robustness and resistance to overfitting. The combination of
multiple trees with different subsets of data helps to generalize well to new, unseen data.
Feature Importance:
Random Forest can provide an estimate of the importance of each feature in making accurate
predictions. This can be useful for feature selection and understanding the underlying
relationships in the data.
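The following sketch, assuming scikit-learn is installed, shows voting and feature importance on invented data in which only the first feature carries signal:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data: feature 0 separates the classes; feature 1 is uninformative noise
X = [[0, 3], [1, 1], [0, 2], [1, 0], [9, 2], [8, 1], [9, 0], [8, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Each of the 100 trees is trained on a bootstrap sample and considers a
# random subset of features at each split; class predictions are majority-voted
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

print("predictions:", clf.predict([[0.5, 2], [8.5, 2]]))
# The importance of feature 0 should dominate that of the noise feature
print("feature importances:", clf.feature_importances_)
```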
Random Forest is widely used in practice due to its flexibility, ease of use, and strong performance across various types of data. It is applicable in areas such as finance, healthcare, and remote sensing, among others.
Linear Regression:
Explanation: Linear Regression is a simple and widely used regression algorithm. It models the
relationship between the dependent variable and one or more independent variables by fitting a
linear equation to observed data.
In the figure, the blue line represents the linear regression model, and the points are the actual
data. The algorithm aims to find the line that minimizes the sum of squared differences between
the predicted and actual values.
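A small ordinary-least-squares sketch with NumPy (the data points are invented and roughly follow y = 2x): `np.polyfit` with degree 1 returns the slope and intercept of the best-fit line.

```python
import numpy as np

# Invented data that roughly follows y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit y = slope*x + intercept by minimizing the sum of squared residuals
slope, intercept = np.polyfit(x, y, deg=1)
print("slope:", round(slope, 2))            # close to 2
print("intercept:", round(intercept, 2))    # close to 0

# Use the fitted line to predict a new value
print("prediction at x=6:", round(slope * 6 + intercept, 2))
```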
A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It works by recursively partitioning the data into subsets based on the values of
different features. The goal is to create a tree-like model that makes decisions at each node
regarding the predicted output.
Here are the key components and concepts associated with Decision Trees:
Root Node:
The topmost node in a decision tree is called the root node. It represents the entire dataset and is
split into subsets based on the values of a selected feature.
Decision Nodes:
These are the nodes where the dataset is split based on a particular feature. Each decision node
represents a decision based on the value of a specific feature.
Leaf Nodes:
Leaf nodes are the final nodes in a decision tree. They represent the predicted output or class.
Each leaf node is associated with a specific outcome.
Branches:
The branches of a decision tree connect the nodes and represent the decision path. They show
how the data is split based on the values of features.
Splitting:
The process of dividing a node into subsets based on the values of a chosen feature is called
splitting. The goal is to create homogenous subsets that are more predictive of the target variable.
In classification tasks, Decision Trees use metrics like entropy and information gain to decide the
best feature for splitting. Entropy measures the impurity or disorder in a dataset, and information
gain quantifies the effectiveness of a split in reducing entropy.
Figure 4.8 Decision Tree Example
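As an illustrative sketch (the label counts below are invented), entropy and information gain for a candidate split can be computed directly:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Invented example: a 50/50 parent node divided into two purer children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

# Information gain = parent entropy minus the weighted child entropies
gain = entropy(parent) \
    - (len(left) / len(parent)) * entropy(left) \
    - (len(right) / len(parent)) * entropy(right)
print("parent entropy:", entropy(parent))     # 1.0 bit (maximally impure)
print("information gain:", round(gain, 3))    # 0.278
```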
Clustering: This involves grouping similar data points together in a dataset based on certain
features or characteristics. Algorithms like K-means, hierarchical clustering, and DBSCAN are
used to partition data into distinct groups.
Unsupervised learning is valuable when dealing with large datasets lacking labeled information
or when exploring data to derive insights, identify patterns, or preprocess data before applying
supervised learning techniques.
K-Means Clustering:
Explanation: K-Means is a popular clustering algorithm that partitions data into K clusters based
on similarity. It minimizes the sum of squared distances between data points and the centroid of
their assigned cluster.
K-Means Clustering: The figure shows the iterative process of K-Means clustering. The
algorithm starts with initial centroids and assigns data points to the nearest centroid. It then
updates the centroids and repeats until convergence.
Figure 4.9 K-means Clustering
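This assignment-and-update loop can be sketched with scikit-learn's KMeans (assumed installed; the two blobs of points are invented):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated invented blobs
X = np.array([[1, 1], [1.5, 2], [1, 1.5],
              [8, 8], [8, 8.5], [8.5, 8]])

# K-Means alternates assigning points to the nearest centroid and
# recomputing centroids, repeating until convergence
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("labels:", km.labels_)            # the first three points share a cluster
print("centroids:\n", km.cluster_centers_)
```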
Hierarchical Clustering:
The figure illustrates agglomerative hierarchical clustering. Initially, each data point is a cluster.
The algorithm then merges the closest clusters iteratively until all points belong to a single
cluster.
Principal Component Analysis (PCA):
Explanation: PCA is a dimensionality reduction technique that transforms data into a new
coordinate system, where the most significant variance lies along the first few principal
components. It helps capture the essential features while reducing the dimensionality.
PCA
In the figure, the red lines represent the principal components. PCA identifies the directions
(principal components) along which the data varies the most, reducing the dimensionality while
retaining the most important information.
These unsupervised learning algorithms are valuable for discovering patterns in data, grouping similar observations, and reducing dimensionality.
Anomaly Detection
Anomaly detection, also known as outlier detection, is a technique in machine learning used to identify patterns or instances that deviate significantly from the norm within a dataset. These instances are considered anomalies or outliers because they do not conform to the expected behavior of the
majority of the data. Anomaly detection is applied in various fields, including cybersecurity,
finance, healthcare, and industrial systems, where detecting unusual events or behaviors is
crucial.
Here are some common approaches and techniques for anomaly detection:
Statistical Methods:
Z-Score or Standard Score: This method measures how many standard deviations a data point is
from the mean. Data points with a z-score above a certain threshold are considered anomalies.
Interquartile Range (IQR): The interquartile range is the range between the first quartile (25th percentile) and the third
quartile (75th percentile) of the data. Data points outside a defined range are considered outliers.
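Both statistical rules can be sketched in a few lines of NumPy (the sensor-like readings are invented, with 25.0 injected as the anomaly):

```python
import numpy as np

# Invented readings; 25.0 is an injected anomaly
data = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0])

# Z-score rule: flag points more than 2 standard deviations from the mean
z = (data - data.mean()) / data.std()
outliers = data[np.abs(z) > 2]
print("z-score outliers:", outliers)

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print("IQR outliers:", iqr_outliers)
```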
Distance-Based Methods:
Euclidean Distance: Calculate the distance of each data point from the centroid or mean of the
dataset. Points with distances above a threshold are flagged as anomalies.
Mahalanobis Distance: It accounts for correlations between variables and is particularly useful
when dealing with multivariate data.
CHAPTER 5
5.1 Loan Approval Prediction
Introduction
Loans are a major requirement of the modern world, and banks derive a major part of their total profit from them. Loans help students manage their education and living expenses, and help people buy luxuries such as houses and cars.
However, when it comes to deciding whether an applicant's profile is eligible for a loan, banks have to consider many aspects.
So, here we use machine learning with Python to ease this work and predict whether a candidate's profile is eligible using key features like marital status, education, applicant income, and credit history.
Gender: Gender may influence loan approval decisions in some regions or cultures due to
historical biases or societal norms. Some models may consider gender-neutral factors instead to
promote fairness.
Marital Status (Married): Married individuals might be seen as more stable and responsible,
which could positively impact loan approval chances.
Dependents: The number of dependents can affect an applicant's ability to repay a loan. More
dependents may indicate higher financial responsibilities and potentially reduce loan approval
chances.
Education: Applicants with higher education levels may have better job prospects and income
potential, potentially increasing their chances of loan approval.
Self-Employed: Self-employed individuals might face different approval criteria than those who
are employed by others, as their income can be less stable.
Algorithm used:
Accuracy comparison:
Predictions
5.2 Diamond Price Prediction
Introduction
Machine learning is used across many domains around the world; the healthcare industry is no exception. Machine learning can play an essential role in predicting the presence or absence of locomotor disorders, in drug prediction, in diamond price prediction, and more.
A diamond's table refers to the flat facet of the diamond seen when the stone is face up.
The main purpose of a diamond table is to refract entering light rays and allow reflected light
rays from within the diamond to meet the observer’s eye.
Algorithms used:
Accuracy comparisons:
Prediction
5.3 Heart Disease Prediction
Introduction
Many factors, such as diabetes, high blood pressure, high cholesterol, and abnormal pulse rate, need to be considered when predicting heart disease. Often, the medical data available are incomplete, which affects the results of heart disease prediction. Machine learning plays a crucial role in the medical field. High blood cholesterol is defined as having too much cholesterol (a waxy, fatty substance) in the blood. Having either high LDL cholesterol ("bad" cholesterol) or low HDL cholesterol ("good" cholesterol), or both, is one of the best predictors of your risk of heart disease.
Dataset
Accuracy Comparison
Prediction
CHAPTER 6
TOOLS LEARNT
6.1 Jupyter Notebook (Anaconda Navigator)
Jupyter Notebook is an interactive web-based environment for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It is used by data scientists, engineers, and students for a variety of tasks, including data cleaning and analysis, machine learning, and scientific computing. Anaconda Navigator is a graphical user interface for managing Anaconda installations and packages; it makes it easy to install, update, and manage Python packages, including Jupyter Notebook. When Jupyter is launched from Anaconda Navigator, a file browser opens in a web browser tab, and each new notebook opens as another tab.
6.2 Google Colab
Google Colab, or Colaboratory, is a free, cloud-based platform for writing and running Python
code in a collaborative environment. It's built around Project Jupyter code and hosts Jupyter
notebooks.
Colab is well suited for machine learning, data analysis, and education.
Colab provides access to GPU and TPU resources. It also stores notebooks in Google Drive,
where they can be easily shared. Colab is free to use, but there are paid options for larger
computing needs.
Google Colab, short for Google Colaboratory, is a cloud-based platform provided by Google that
offers free access to computational resources like GPUs (Graphics Processing Units) and TPUs
(Tensor Processing Units). It's primarily used for writing, running, and sharing Python code in a
Jupyter Notebook environment.
Colab notebooks are stored in Google Drive and can be easily shared and collaborated on in real
time. It's particularly popular among data scientists, machine learning engineers, and researchers
for its ability to run code that requires significant computational power without the need for
powerful hardware on the user's end.
Users can access Colab through a web browser, and it supports various libraries and frameworks
commonly used in machine learning, such as TensorFlow, PyTorch, and scikit-learn.
Additionally, Colab provides integration with other Google services and allows for the
installation of additional libraries using pip or apt-get commands.
CHAPTER 7
CHAPTER 8
OUTCOMES OF INTERNSHIP
I recently completed a comprehensive internship program at Zetacoding Innovative Solutions in
Bengaluru, spanning a duration of 3 to 4 weeks. This immersive experience has significantly
contributed to my growth, particularly in the domain of Python programming. Here are the key
outcomes of my internship:
The intensive nature of the program significantly boosted my confidence in coding, making me
more self-assured in my abilities. Engaging with a professional team allowed me to refine my
skills, motivating me to pursue a career in Python programming or related fields.
The internship served as a bridge between academic learning and practical application, allowing
me to translate theoretical knowledge into real-world solutions. I had the opportunity to work on
a tangible project, streamlining data management for a local nonprofit organization.
The dynamic nature of the environment necessitated quick adaptation to new tools and
technologies, fostering a continuous learning mindset. Interactions with diverse team members
exposed me to various problem-solving approaches and perspectives.
A standout feature was the hands-on experience gained through working on Python coding
functionalities. Collaborating with experienced professionals provided insights into coding
practices, version control, and effective problem-solving within a team.
Beyond technical skills, the internship cultivated essential soft skills such as communication,
time management, and effective teamwork. These skills are invaluable in any professional setting
and complement the technical expertise gained during the internship.
Portfolio Enhancement:
By the conclusion of the internship, I had a tangible project to add to my portfolio, showcasing
my capabilities and practical application of learned skills.
Reinforcement of Passion for Programming:
The enriching experience solidified my passion for programming and emphasized the importance
of continuous learning in the ever-evolving tech industry.