
Medicaps University, Indore

Practical File

Enrollment No. : EN20CS306036


Name of Student : Amitesh Sharma
Department : Computer Science & Engineering
Faculty of : Engineering
Class : B. Tech. CSBS
Year/Sem : III year/ V Sem(Odd)
Course Name : Machine Learning
Course Code : CB3EL01
Faculty Name : Mr. Binod Kumar Mishra

1|Page EN20CS306007 Amitesh Sharma


Table of Contents

S.No Name of Experiment Date Remark


1. Introduction to Python 22-08-22

2. NumPy and Pandas 29-08-22

3. Weka 05-09-22

4. R Programming 12-09-22

5. Linear Regression Model 19-09-22

6. Support Vector Machine (SVM) 8-10-22

7. PCA 10-10-22

8. Decision Tree 17-10-22

PRACTICAL 1
Aim:- Introduction to Python

What is Python?
Python is a popular programming language. It was created by Guido van Rossum, and
released in 1991.

It is used for:

• web development (server-side),
• software development,
• mathematics,
• system scripting.

What can Python do?

• Python can be used on a server to create web applications.
• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify files.
• Python can be used to handle big data and perform complex mathematics.
• Python can be used for rapid prototyping, or for production-ready software
development.

Why Python?
• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
• Python has a simple syntax similar to the English language.
• Python has syntax that allows developers to write programs with fewer lines
than some other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as soon
as it is written. This means that prototyping can be very quick.
• Python can be treated in a procedural way, an object-oriented way or a
functional way.



Python Syntax

Python Variables

Python Lists

Python Tuples

Python Dictionaries
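The topics listed above (variables, lists, tuples, and dictionaries) appear without accompanying code. A minimal sketch of each follows; all names and values are hypothetical and chosen only for illustration.

```python
# Variables: created on first assignment, no type declaration needed.
count = 3
name = "Python"

# List: ordered and mutable, so items can be added later.
fruits = ["apple", "banana"]
fruits.append("cherry")

# Tuple: ordered and immutable, so it suits fixed groups of values.
point = (10, 20)

# Dictionary: maps keys to values.
student = {"course": "Machine Learning", "semester": 5}
student["semester"] += 1  # update the value stored under an existing key

print(count, name)
print(fruits)
print(point[0])
print(student)
```

Trying to assign to `point[0]` would raise a `TypeError`, which is the practical difference between a tuple and a list.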



Python If ... Else
• Python Conditions and If statements
Python supports the usual logical conditions from mathematics:

• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b

These conditions can be used in several ways, most commonly in "if statements" and
loops.

An "if statement" is written by using the if keyword.

• Elif
The elif keyword is Python's way of saying "if the previous conditions were not true,
then try this condition".



• Else
The else keyword catches anything which isn't caught by the preceding conditions.
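The three keywords can be combined in a single chain. The sketch below is illustrative only; the function name `compare` is an assumption, not something from the original file.

```python
def compare(a, b):
    """Return a short description of how a relates to b."""
    if a > b:            # checked first
        return "a is greater than b"
    elif a == b:         # checked only if the first condition was false
        return "a and b are equal"
    else:                # catches everything the conditions above missed
        return "a is less than b"

print(compare(5, 3))
print(compare(2, 2))
print(compare(1, 4))
```

Only one branch runs per call: Python stops at the first condition that is true.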

Python While Loops
Python has two primitive loop commands:

• while loops
• for loops

The while Loop


With the while loop we can execute a set of statements as long as a condition is true.
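A minimal sketch of a while loop, using a hypothetical counter `i`. The loop body must change the condition eventually, otherwise the loop never ends.

```python
i = 1
visited = []

# Repeat the body as long as the condition i < 6 holds.
while i < 6:
    visited.append(i)
    i += 1  # without this increment the loop would run forever

print(visited)
```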

Python For Loops
A for loop is used for iterating over a sequence (that is either a list, a tuple, a
dictionary, a set, or a string).



This is less like the for keyword in other programming languages, and works more like
an iterator method as found in other object-orientated programming languages.

With the for loop we can execute a set of statements, once for each item in a list, tuple,
set etc.
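As a small illustration, the same `for` syntax works on a list, a string, and a dictionary. The data below is hypothetical.

```python
fruits = ["apple", "banana", "cherry"]

# Iterate over a list: the body runs once per item.
lengths = {}
for fruit in fruits:
    lengths[fruit] = len(fruit)

# Iterate over a string: each item is a single character.
letters = []
for ch in "SVM":
    letters.append(ch)

# Iterate over a dictionary: items() yields (key, value) pairs.
longest = None
for fruit, size in lengths.items():
    if longest is None or size > lengths[longest]:
        longest = fruit

print(lengths)
print(letters)
print(longest)
```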

Python Classes and Objects

Python is an object-oriented programming language.

Almost everything in Python is an object, with its properties and methods.

A class is like an object constructor, or a "blueprint" for creating objects.

Create a Class
To create a class, use the keyword class:

Create Object
Now we can use the class named MyClass to create objects:
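A minimal sketch of both steps follows. The class name `MyClass` comes from the text; the attribute `x` and the method `doubled` are assumptions added for illustration.

```python
class MyClass:
    """A minimal class with one property and one method."""

    def __init__(self, x):
        # __init__ runs when an object is created and sets its properties.
        self.x = x

    def doubled(self):
        # A method is a function that belongs to the object.
        return self.x * 2


# Create an object (an instance) from the class blueprint.
p1 = MyClass(5)

print(p1.x)
print(p1.doubled())
```

Each call to `MyClass(...)` produces a separate object with its own value of `x`.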



NumPy – Creation of a universal data structure helpful in analysis and exchange of
algorithms; advanced mathematical operations on huge data sets

Pandas – Data manipulation, data analysis, data alignment, data set restructuring, and
segmentation

Scikit-Learn – Data analysis, data mining, statistical modeling

TensorFlow – Build and train neural networks; pattern detection; numerical computing

PyTorch – Artificial intelligence, machine learning, and deep learning applications

These Python libraries make the implementation of AI and ML algorithms very easy.


This speeds up product development, because developers can solve complex problems
without rewriting code. Python is platform-independent: the same source code runs,
usually unchanged, wherever a Python interpreter is available, including Windows,
macOS, Linux, Solaris, and other Unix systems. Python can also be integrated with
other languages such as Java, .NET, C/C++, Perl, PHP, and R.
Machine learning builds models that learn from past observations (data) and use what
they have learned to make predictions for new inputs. In general, a larger and more
representative dataset tends to give a better model. The flow of machine learning is:
• Cleaning the data
• Feeding the dataset
• Training the model
• Testing the dataset
• Implementing the model
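The five steps above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions: the data is made up, and the nearest-centroid classifier (helpers `train_centroids` and `predict`) is a stand-in chosen for brevity, not the method used in the original file.

```python
# 1. Cleaning the data: drop records with a missing value or label.
raw = [(1.0, "low"), (1.2, "low"), (None, "low"), (0.8, "low"),
       (3.0, "high"), (3.2, "high"), (2.9, "high"), (3.1, None)]
clean = [(x, label) for x, label in raw if x is not None and label is not None]

# 2. Feeding the dataset: split into training and test portions.
train = clean[0::2]  # records at even positions
test = clean[1::2]   # records at odd positions

# 3. Training the model: store the mean value (centroid) of each class.
def train_centroids(rows):
    sums, counts = {}, {}
    for x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    # Pick the class whose centroid is closest to x.
    return min(model, key=lambda label: abs(x - model[label]))

model = train_centroids(train)

# 4. Testing: measure accuracy on records the model has not seen.
correct = sum(predict(model, x) == label for x, label in test)
accuracy = correct / len(test)

# 5. Implementing the model: use it on a new, unlabeled input.
print(accuracy)
print(predict(model, 2.8))
```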



PRACTICAL 2

Aim:- NumPy & Pandas

What is Pandas?
Pandas is an open-source library that provides high-performance data manipulation
in Python. It is built on top of the NumPy package, so NumPy is required to use
Pandas. The name Pandas is derived from "panel data", an econometrics term for
multidimensional data sets. It is used for data analysis in Python and was developed
by Wes McKinney in 2008. Before Pandas, Python was capable of data preparation but
provided only limited support for data analysis; Pandas filled that gap. It supports
five significant steps in processing and analyzing data, irrespective of the origin
of the data: load, manipulate, prepare, model, and analyze.

What is NumPy?
NumPy is a Python extension module written mostly in C. It is a Python package for
numerical computation on single-dimensional and multidimensional arrays. Calculations
on NumPy arrays are typically faster than the same calculations on ordinary Python
lists. NumPy was created by Travis Oliphant in 2005 by incorporating the features of
the Numarray module into its ancestor module, Numeric. It can handle large amounts of
data and is convenient for matrix multiplication and data reshaping.
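A short sketch of the operations mentioned above (element-wise arithmetic, reshaping, and matrix multiplication), assuming NumPy is installed. The array values are arbitrary.

```python
import numpy as np

a = np.arange(6)        # one-dimensional array: 0, 1, 2, 3, 4, 5

# Element-wise arithmetic applies to the whole array without a Python loop.
squared = a ** 2

# Reshaping reinterprets the same six values as a 2 x 3 matrix.
m = a.reshape(2, 3)

# Matrix multiplication: (2 x 3) @ (3 x 2) gives a 2 x 2 result.
product = m @ m.T

print(squared)
print(m)
print(product)
```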

Both Pandas and NumPy can be seen as essential libraries for scientific computation,
including machine learning, because of their intuitive syntax and high-performance
matrix computation capabilities. Both libraries are also well suited to data science
applications.



Difference between Pandas and NumPy:
The differences between Pandas and NumPy are listed below:



Importing library-

Defining Version-

Tuple/List-
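The three headings above are not accompanied by code. One way they might look, assuming NumPy and Pandas are installed; the variable names are hypothetical:

```python
# Importing the libraries under their conventional aliases.
import numpy as np
import pandas as pd

# Checking which versions are installed.
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)

# Building NumPy arrays from a Python list and from a tuple.
from_list = np.array([1, 2, 3])
from_tuple = np.array((4, 5, 6))

# Arrays of the same shape are added element by element.
total = from_list + from_tuple
print(total)
```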

Creating Series-
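A minimal sketch of creating a Pandas Series, assuming Pandas is installed. The marks and index labels are made-up illustration data.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: values plus an index.
marks = pd.Series([72, 85, 90], index=["unit1", "unit2", "unit3"])

print(marks)
print(marks["unit2"])   # look up a value by its label
print(marks.mean())     # built-in summary statistics

# Boolean filtering keeps only the entries that satisfy the condition.
above_80 = marks[marks > 80]
print(above_80)
```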



PRACTICAL 3

Aim:- Learn about Weka

WEKA is open-source software that provides tools for data pre-processing,
implementations of several machine learning algorithms, and visualization tools, so
that you can develop machine learning techniques and apply them to real-world data
mining problems. What WEKA offers is summarized in the following diagram:

The name WEKA is also used by a separate commercial product, the WEKA data platform,
which is unrelated to the Weka machine learning workbench. That platform is a
purpose-built software and cloud computing environment for machine learning
applications, designed to harness hardware-accelerated cloud systems for advanced
machine learning and neural network research.

The WEKA data platform lists the following features for building machine learning and
AI applications:
• Streamlined and fast cloud file systems that combine multiple sources into a
single high-performance computing system
• Industry-best GPUDirect performance (113 Gbps for a single DGX-2 and
162 Gbps for a single DGX A100)
• In-flight and at-rest encryption for governance, risk, and compliance
requirements
• Agile access and management for edge, core, and cloud development
• Scalability up to exabytes of storage across billions of files

The WEKA file system also works with Amazon Web Services (AWS), Google Cloud
Platform (GCP), Microsoft Azure, and Oracle Cloud Infrastructure (OCI) cloud
infrastructures.

Weka Machine Learning Algorithms


Weka ships with a large number of machine learning algorithms, which is one of the
main benefits of using it as a platform for machine learning.

They are divided into a number of main groups:

• bayes: Algorithms that use Bayes' Theorem in some core way, like Naive Bayes.
• functions: Algorithms that estimate a function, like Linear Regression.
• lazy: Algorithms that use lazy learning, like k-Nearest Neighbours.
• meta: Algorithms that use or combine multiple algorithms, like Ensembles.
• misc: Implementations that do not neatly fit into the other groups, like running a
saved model.
• rules: Algorithms that use rules, like One Rule.
• trees: Algorithms that use decision trees, like Random Forest.
In the Weka Explorer, the tab is called “Classify” and the algorithms are listed under
an overarching group called “Classifiers”. Nevertheless, Weka supports both
classification (predict a category) and regression (predict a numeric value)
predictive modeling problems.



1. Linear Machine Learning Algorithms
Linear algorithms assume that the predicted attribute is a linear combination of the
input attributes.

• Linear Regression: functions.LinearRegression
• Logistic Regression: functions.Logistic

2. Nonlinear Machine Learning Algorithms
Nonlinear algorithms do not make strong assumptions about the relationship between
the input attributes and the output attribute being predicted.

• Naive Bayes: bayes.NaiveBayes
• Decision Tree (specifically the C4.5 variety): trees.J48
• k-Nearest Neighbors (also called KNN): lazy.IBk
• Support Vector Machines (also called SVM): functions.SMO
• Neural Network: functions.MultilayerPerceptron

3. Ensemble Machine Learning Algorithms
Ensemble methods combine the predictions from multiple models in order to make
more robust predictions.

• Random Forest: trees.RandomForest
• Bootstrap Aggregation (also called Bagging): meta.Bagging
• Stacked Generalization (also called Stacking or Blending): meta.Stacking

Weka has an extensive array of ensemble methods, perhaps one of the largest available
across all of the popular machine learning frameworks.



PRACTICAL 4

Aim:- Introduction to R Programming

The R language was developed by statisticians to help other statisticians and
developers work with data faster and more efficiently. Machine learning involves
working with large amounts of data and statistics, so R is often recommended as part
of a data science toolkit. R is therefore handy for those working with machine
learning, making tasks easier and faster. Some advantages of implementing machine
learning algorithms in R are listed below.

Advantages of Implementing Machine Learning Using the R Language

• It produces code that is easy to explain. For example, if you are at an early
stage of a machine learning project and need to explain your work, R can be
easier to work with than Python, because it provides the statistical methods
needed to work with data in fewer lines of code.
• R is well suited to data visualization and to prototyping machine learning
models.
• R has a rich set of tools and library packages for machine learning projects.
Developers can use these packages for the pre-modeling, modeling, and
post-modeling stages of a project. R's packages for statistical work are
extensive, which makes it a common choice for machine learning projects.

Popular R Language Packages Used to Implement Machine Learning

• lattice: The lattice package supports the creation of graphs displaying a
variable, or the relation between multiple variables, with conditions.
• DataExplorer: This R package focuses on automating data visualization and
data handling so that the user can pay attention to the data insights of the
project.
• DALEX (Descriptive Machine Learning Explanations): This package helps
to provide various explanations for the relation between the input variables
and the output. It helps to understand complex machine learning models.
• dplyr: This R package is used to summarize tabular data (rows and columns)
in machine learning work. It applies the “split-apply-combine” approach.
• Esquisse: This R package is used to explore data quickly to get the
information it holds. It also allows plotting bar graphs, histograms, curves,
and scatter plots.
• caret: This R package attempts to streamline the process of creating
predictive models.
• janitor: This R package has functions for examining and cleaning dirty
data. It is built to be user-friendly for beginners and intermediate users.
• rpart: This R package helps to create classification and regression
models using a two-stage procedure. The resulting models are represented as
binary trees.

Applications of R in Machine Learning

Many large companies, such as Google, Facebook, and Uber, use the R language for
machine learning applications. The applications include:
• Social network analytics
• Analyzing trends and patterns
• Getting insights into the behaviour of users
• Finding relationships between users
• Developing analytical solutions
• Accessing charting components
• Embedding interactive visual graphics

Machine learning is a branch in computer science that studies the design of algorithms
that can learn. Typical machine learning tasks are concept learning, function learning
or “predictive modeling”, clustering and finding predictive patterns. These tasks are
learned through available data that were observed through experiences or instructions,
for example. Machine learning hopes that including the experience into its tasks will
eventually improve the learning. The ultimate goal is to improve the learning in such a
way that it becomes automatic, so that humans like ourselves don’t need to interfere
any more.



Using R For k-Nearest Neighbours (KNN)
Step One. Get your Data

Step Two. Know your Data

Step Three. Where to go Now?

Step Four. Prepare your Workspace

Step Five. Prepare your Data

Step Six. The Actual KNN Model

Step Seven. Evaluation of your Model

Basic program of hello world!!

PRACTICAL 5

Aim:- Introduction about Linear Regression Model


Linear regression is one of the simplest and most popular machine learning algorithms.
It is a statistical method used for predictive analysis. Linear regression makes
predictions for continuous (real-valued or numeric) variables such as sales, salary,
age, and product price.

The linear regression algorithm models a linear relationship between a dependent
variable (y) and one or more independent variables (x), hence the name. Because the
relationship is linear, the model shows how the value of the dependent variable
changes with the value of the independent variable.

The linear regression model provides a sloped straight line representing the
relationship between the variables. Consider the below image:

Mathematically, we can represent a linear regression as:


y = a0 + a1·x + ε



Here,

y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error

The observed values of x and y form the training dataset used to fit the linear
regression model.
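To make the roles of a0 and a1 concrete, the sketch below estimates them with the standard least-squares formulas in plain Python. The helper name `fit_line` and the data are assumptions for illustration; the data is generated from y = 1 + 2x with no random error, so the fit recovers those values.

```python
def fit_line(xs, ys):
    """Estimate intercept a0 and slope a1 by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]   # hypothetical training data following y = 1 + 2x

a0, a1 = fit_line(xs, ys)
print(a0, a1)

# Predict y for a new x using the fitted line.
print(a0 + a1 * 6)
```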

Types of Linear Regression


Linear regression can be further divided into two types of algorithm:

o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple
Linear Regression.

o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Multiple
Linear Regression.
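For the multiple case, a sketch using NumPy's least-squares solver is shown below, assuming NumPy is installed. The two input columns and the coefficients (1, 2, 3) are hypothetical; the target is generated without noise so the solver recovers them.

```python
import numpy as np

# Two independent variables per row (hypothetical data).
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)

# Dependent variable generated as y = 1 + 2*x1 + 3*x2.
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept a0.
A = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem A @ coeffs ≈ y.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # intercept, then one coefficient per independent variable
```

With a single input column the same code reduces to simple linear regression.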

Linear Regression Line

A straight line showing the relationship between the dependent and independent
variables is called a regression line. A regression line can show two types of
relationship:

o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable
increases on the X-axis, then such a relationship is termed a positive linear
relationship.



o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis and independent variable
increases on the X-axis, then such a relationship is called a negative linear
relationship.

Implementation:
To implement the Simple Linear Regression model in machine learning using Python,
we need to follow the steps below:

Step 1: Data pre-processing

Step 2: Fitting the Simple Linear Regression to the training set

Step 3: Prediction of the test set result

Step 4: Visualizing the training set results
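The four steps can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, `np.polyfit` stands in for whichever library the original practical used, and the plotting helper `plot_training_fit` (a hypothetical name) needs matplotlib, so it is defined but not called automatically.

```python
import numpy as np

# Step 1: data pre-processing (synthetic data, split into train and test sets).
x = np.arange(1, 11, dtype=float)
noise = np.array([0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0])
y = 2.0 * x + 1.0 + noise
x_train, y_train = x[:7], y[:7]
x_test, y_test = x[7:], y[7:]

# Step 2: fit a straight line to the training set.
# polyfit returns coefficients from the highest degree down: slope, intercept.
a1, a0 = np.polyfit(x_train, y_train, deg=1)

# Step 3: predict the test set and measure the mean absolute error.
y_pred = a0 + a1 * x_test
mae = np.mean(np.abs(y_pred - y_test))
print(a0, a1, mae)

# Step 4: visualize the training set results (requires matplotlib).
def plot_training_fit():
    import matplotlib.pyplot as plt
    plt.scatter(x_train, y_train, label="training data")
    plt.plot(x_train, a0 + a1 * x_train, label="fitted line")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()
```

Calling `plot_training_fit()` draws the training points together with the fitted regression line.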

PRACTICAL 6

Aim:- Support Vector Machine

Support Vector Machine (SVM) is one of the most popular supervised learning
algorithms. It is used for both classification and regression problems, but
primarily for classification problems in machine learning.
The goal of the SVM algorithm is to find the best line or decision boundary
that separates n-dimensional space into classes, so that new data points can
easily be placed in the correct category. This best decision boundary is called
a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, hence the name Support Vector
Machine.

The SVM algorithm can be used for face detection, image classification, text
categorization, etc.

26 | P a g e
EN20CS306036 Mukta Gupta
Types of SVM

1. Linear SVM

2. Non-Linear SVM

o Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is
called a Linear SVM classifier.

o Non-linear SVM: Non-linear SVM is used for non-linearly separable data.
If a dataset cannot be classified by using a straight line, then such data
is termed non-linear data, and the classifier used is called a Non-linear
SVM classifier.
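The difference between the two types can be sketched with scikit-learn's `SVC` class, assuming scikit-learn is installed. The tiny datasets are hypothetical: two well-separated clusters for the linear case, and an XOR pattern that no single straight line can split for the non-linear case. The `gamma` and `C` values are illustrative choices, not tuned settings.

```python
from sklearn.svm import SVC

# Linearly separable data: two clusters that a straight line can divide.
X_lin = [[1, 1], [1, 2], [2, 1], [5, 5], [5, 6], [6, 5]]
y_lin = [0, 0, 0, 1, 1, 1]
linear_clf = SVC(kernel="linear").fit(X_lin, y_lin)
lin_pred = linear_clf.predict([[0, 0], [7, 7]])
print(lin_pred)
print(linear_clf.support_vectors_)  # the extreme points that define the boundary

# XOR pattern: opposite corners share a class, so no straight line separates it.
X_xor = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_xor = [0, 0, 1, 1]

# A linear kernel cannot classify all four XOR points correctly...
linear_on_xor = SVC(kernel="linear").fit(X_xor, y_xor)
print(linear_on_xor.score(X_xor, y_xor))

# ...while a non-linear (RBF) kernel can.
rbf_clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X_xor, y_xor)
print(rbf_clf.score(X_xor, y_xor))
```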

