
Report on Project on Seeds (Machine Learning Project using Python)

INT 354
(Advanced Machine Learning)

Submitted By:- Bongula Mokshagna Tarakram


Reg No.: 12010441
Section.: KM037
Roll No.: RKM371B51

Submitted To :- Dr Dhanpratap Singh (25706)


Introduction:-
The examined group comprised kernels belonging to three different varieties of
wheat: Kama, Rosa and Canadian, 70 elements each, randomly selected for
the experiment. High-quality visualization of the internal kernel structure was
obtained using a soft X-ray technique, which is non-destructive and considerably
cheaper than more sophisticated imaging techniques such as scanning
microscopy or laser technology. The images were recorded on 13x18 cm X-ray
KODAK plates. Studies were conducted using combine-harvested wheat grain
originating from experimental fields explored at the Institute of Agrophysics of the
Polish Academy of Sciences in Lublin.
Dataset Used:-
For this project, I selected the UCI Seeds dataset, which contains measurements of the
geometrical properties of kernels belonging to three different varieties of wheat. A soft
X-ray technique and the GRAINS package were used to construct all seven real-valued
attributes.
Dataset Link :- http://archive.ics.uci.edu/ml/datasets/seeds#
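The UCI page lists the seven attributes as area A, perimeter P, compactness C = 4*pi*A/P^2, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove. Compactness is the only derived attribute, so a short sketch of how it is computed may help; the function name `compactness` is a hypothetical helper, not part of the project or the dataset files.

```python
import math

def compactness(area, perimeter):
    """Compactness C = 4*pi*A / P**2 (hypothetical helper, not from the project code)."""
    return 4 * math.pi * area / perimeter ** 2

# A perfect circle of radius r has A = pi*r^2 and P = 2*pi*r, so C = 1.
r = 2.0
print(compactness(math.pi * r ** 2, 2 * math.pi * r))  # approximately 1.0

# Any less round shape scores below 1, e.g. a unit square: C = pi/4.
print(compactness(1.0, 4.0))
```

The value is 1 for a circle and smaller for elongated or irregular outlines, which is why it helps separate kernel shapes.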

Used Libraries :-
NumPy:- NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a large
collection of high-level mathematical functions to operate on these arrays.
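As an illustration of the array operations described above, the sketch below standardizes each feature column (zero mean, unit standard deviation), a common preprocessing step for distance-based models such as KNN and SVM. The 4x2 matrix is made-up toy data, not seeds measurements, and whether the project standardized its features is an assumption.

```python
import numpy as np

# Toy feature matrix: 4 samples x 2 features (hypothetical values, not from the dataset).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Column-wise mean and standard deviation via vectorized reductions along axis 0.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Broadcasting applies the per-column statistics to every row at once, with no Python loop.
X_std = (X - mean) / std
print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # approximately [1, 1]
```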

Keras:- Keras is an open-source software library that provides a Python interface
for artificial neural networks. Keras acts as an interface for the TensorFlow library.
Up until version 2.3, Keras supported multiple backends, including TensorFlow,
Microsoft Cognitive Toolkit, Theano, and PlaidML.

Logistic Regression:- The basic setup of logistic regression is as follows. We are
given a dataset containing N points. Each point i consists of a set of m input
variables x_{1,i}, ..., x_{m,i} (also called independent variables, explanatory variables,
predictor variables, features, or attributes) and a binary outcome variable Y_i (also
known as a dependent variable, response variable, output variable, or class), i.e. it
can take only the two possible values 0 (often meaning "no" or "failure") or 1
(often meaning "yes" or "success"). The goal of logistic regression is to use the
dataset to create a predictive model of the outcome variable.
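Logistic regression models the probability of the outcome as P(Y=1 | x) = sigmoid(w.x + b), where sigmoid(z) = 1/(1 + e^(-z)). The sketch below fits w and b by batch gradient descent on the mean cross-entropy loss and predicts class 1 when that probability is at least 0.5. It is a minimal illustration under these assumptions; the helper names, learning rate and toy data are hypothetical, and the project may have used a library implementation instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean binary cross-entropy loss (hypothetical helper)."""
    n, m = X.shape
    w = np.zeros(m)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)       # predicted P(Y=1 | x) for every point
        grad_w = X.T @ (p - y) / n   # gradient of the loss with respect to w
        grad_b = np.mean(p - y)      # gradient of the loss with respect to b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_logistic(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Toy one-feature data: negative x values are class 0, positive ones class 1.
X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic(X, y)
print(predict_logistic(X, w, b))  # [0 0 0 1 1 1]
```

For the three wheat varieties this binary model would need a multi-class extension (for example one-vs-rest or softmax regression).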
SVM:- Support Vector Machine (SVM) is one of the most popular supervised
learning algorithms. It can be used for both classification and regression problems,
but it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to find the best line or decision boundary that
segregates n-dimensional space into classes, so that new data points can easily be
placed in the correct category in the future. This best decision boundary is called
a hyperplane; SVM chooses the hyperplane with the largest margin, i.e. the largest
distance to the nearest training points of each class (the support vectors).
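A linear soft-margin SVM can be trained by minimizing lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) with labels y in {-1, +1}: the hinge term penalizes points inside the margin and the regularizer keeps the margin wide. The sketch below uses plain sub-gradient descent on that objective; it is an illustrative assumption, not the project's method (which is not stated), it has no kernel, and `lam`, `lr`, `epochs` and the toy data are hypothetical.

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on lam/2*||w||^2 + mean hinge loss; y must be in {-1, +1}."""
    n, m = X.shape
    w = np.zeros(m)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(X, w, b):
    # The sign of w.x + b says which side of the hyperplane a point lies on.
    return np.where(X @ w + b >= 0, 1, -1)

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = fit_linear_svm(X, y)
print(predict_svm(X, w, b))
```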

Gaussian Naïve Bayes Classifier:- Gaussian Naïve Bayes is a variant of naïve Bayes that
assumes the features within each class follow a Gaussian (normal) distribution.
While other functions can be used to estimate the data distribution, the Gaussian or
normal distribution is the simplest to implement, as you only need to calculate the
mean and standard deviation of each feature in the training data.

The Gaussian probability density function can then be used to make predictions by
substituting the estimated parameters and the new input value of the variable; as a
result, the Gaussian function gives an estimate of the probability density of the new
input value.
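The procedure above can be sketched in a few lines: estimate a mean and variance per feature and per class, then pick the class that maximizes log prior + sum of log N(x_j; mean, variance), where log N(x; mu, s^2) = -0.5*(log(2*pi*s^2) + (x - mu)^2 / s^2). Summing over features is the "naive" independence assumption. The class name `SimpleGaussianNB`, the small variance floor `eps` and the toy data are hypothetical; this is not scikit-learn's `GaussianNB`.

```python
import numpy as np

class SimpleGaussianNB:
    """Minimal Gaussian naive Bayes (hypothetical sketch): per-class means, variances, priors."""

    def fit(self, X, y, eps=1e-9):
        self.classes = np.unique(y)
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # eps keeps variances strictly positive so the density is defined.
        self.var = np.array([X[y == c].var(axis=0) + eps for c in self.classes])
        self.log_prior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        return self

    def predict(self, X):
        # Log Gaussian density of every feature under every class, summed over features.
        diff2 = (X[:, None, :] - self.mean[None, :, :]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                          + diff2 / self.var[None, :, :]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior, axis=1)]

X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]])
y = np.array([0, 0, 0, 1, 1, 1])
model = SimpleGaussianNB().fit(X, y)
print(model.predict(np.array([[1.0, 1.0], [5.0, 5.0]])))  # [0 1]
```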

K-Nearest Neighbor (KNN) Algorithm:- K-Nearest Neighbour is one of the simplest
machine learning algorithms based on the supervised learning technique. The K-NN
algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available
categories. The K-NN algorithm stores all the available data and classifies a new data
point based on its similarity (usually a distance such as Euclidean distance) to its k
nearest stored points. This means that when new data appears, it can easily be
classified into a well-suited category using the K-NN algorithm. K-NN can be used
for regression as well as classification, but it is mostly used for classification
problems. K-NN is a non-parametric algorithm, which means it does not assume any
particular form for the underlying data distribution.
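The classification rule described above, store the training points, find the k closest ones to a query and take a majority vote, can be sketched as follows. Euclidean distance, k = 3, the function name `knn_predict` and the toy points are assumptions for illustration only.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training points (hypothetical helper)."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)  # Euclidean distance to every stored point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest].tolist())   # count the labels of those neighbours
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([1, 1, 1, 2, 2, 2])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.5]])))  # [1 2]
```

There is no training step beyond storing the data, which is why K-NN is cheap to fit but does all of its work at prediction time.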

Perceptron:- Perceptron is a linear machine learning algorithm used for supervised
learning of binary classifiers. The algorithm enables a neuron to learn from the
training elements, processing them one by one during training. It is a primary step
toward learning machine learning and deep learning technologies, and it consists of a
set of weights, input values or scores, and a threshold. The perceptron is also understood
as an artificial neuron or neural network unit that helps to detect certain input data
computations in business intelligence.

The perceptron model is also regarded as one of the best and simplest types of artificial
neural networks. Since it is a supervised learning algorithm for binary classifiers,
we can consider it a single-layer neural network with four main components,
i.e., input values, weights and bias, net sum, and an activation function.
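The four components listed above map directly onto the classic perceptron learning rule: compute the net sum w.x + b, pass it through a step activation, and nudge the weights and bias by lr*(target - output) whenever the output is wrong. The sketch below trains it on the logical AND function, which is linearly separable; the function names, learning rate and epoch count are hypothetical choices.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=10):
    """Rosenblatt perceptron rule with a step activation; y must be in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            net = xi @ w + b              # net sum: weighted inputs plus bias
            out = 1 if net > 0 else 0     # step activation function
            w += lr * (target - out) * xi # no change when the prediction is correct
            b += lr * (target - out)
    return w, b

def predict_perceptron(X, w, b):
    return (X @ w + b > 0).astype(int)

# Logical AND: only (1, 1) belongs to class 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(predict_perceptron(X, w, b))  # [0 0 0 1]
```

Because the perceptron only learns a linear boundary, it would not converge on a non-separable target such as XOR.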

Proposed Architecture:-
Decision tree Algorithm:- Decision Tree is a Supervised learning
technique that can be used for both classification and Regression problems, but
mostly it is preferred for solving Classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
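The CART algorithm mentioned in this section builds the tree by repeatedly choosing the feature and threshold whose binary split gives the purest child nodes. The sketch below shows a single such split search using Gini impurity, 1 - sum_k p_k^2; the full recursive tree construction, stopping rules and pruning are omitted. Function names and the toy data are hypothetical, and Gini is assumed as the impurity measure (CART is commonly used with it for classification).

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature/threshold and return the split with the lowest weighted Gini."""
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t   # decision rule: feature j <= threshold t
            right = ~left
            if left.all() or right.all():
                continue          # a split must send samples to both children
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 6.0],
              [8.0, 5.0], [9.0, 4.0], [10.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(X, y))  # feature 0, threshold 3.0, impurity 0.0
```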

o In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any
further branches.
o The decisions or the test are performed on the basis of features of the given
dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further
splits the tree into subtrees.
o The diagram below explains the general structure of a decision tree:
LSTM :- Long short-term memory (LSTM) is an artificial recurrent neural
network (RNN) architecture[1] used in the field of deep learning. Unlike
standard feedforward neural networks, LSTM has feedback connections. It can
process not only single data points (such as images), but also entire sequences of
data (such as speech or video). For example, LSTM is applicable to tasks such as
unsegmented, connected handwriting recognition,[2] speech recognition[3][4] and
anomaly detection in network traffic or IDSs (intrusion detection systems).
A common LSTM unit is composed of a cell, an input gate, an output gate and
a forget gate. The cell remembers values over arbitrary time intervals and the
three gates regulate the flow of information into and out of the cell.
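The cell and gates described above correspond to the standard LSTM update: forget gate f, input gate i and output gate o are sigmoids of W[h_prev, x] + b, a candidate g is a tanh of the same form, then c = f*c_prev + i*g and h = o*tanh(c). The NumPy sketch below runs one cell over a short random sequence. The stacked weight layout and the f, i, o, g ordering are assumptions for this sketch (libraries such as Keras use their own layouts), and the random weights are untrained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*H, H + D), b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])          # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])      # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values to write into the cell
    c = f * c_prev + i * g       # cell state carries memory across time steps
    h = o * np.tanh(c)           # hidden state / output at this step
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                      # input size and hidden size (arbitrary)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```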

Results and Experimental Analysis


Below, I perform the data vectorization: I read the file containing the measurements of the
three different varieties of wheat. A soft X-ray technique and the GRAINS package were
used to construct all seven real-valued attributes.
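A minimal vectorization sketch, assuming the file is whitespace-separated with the seven attributes in the first seven columns and the variety label (1, 2 or 3) in the eighth. The two rows in `SAMPLE` are made-up placeholder values in that layout, not real measurements, and `vectorize` is a hypothetical helper; in practice the text would be read from the downloaded dataset file.

```python
import io
import numpy as np

# Hypothetical placeholder rows in the assumed layout: 7 features, then a class label.
SAMPLE = """\
15.0 14.5 0.89 5.6 3.3 2.2 5.1 1
12.0 13.2 0.86 5.2 2.9 4.0 5.0 3
"""

def vectorize(text):
    """Split whitespace-separated text into a feature matrix X and a label vector y."""
    # delimiter=None (the default) splits on any run of spaces or tabs;
    # ndmin=2 keeps a 2-D array even if the text holds a single row.
    data = np.loadtxt(io.StringIO(text), ndmin=2)
    X = data[:, :7]              # seven real-valued attributes
    y = data[:, 7].astype(int)   # wheat variety label
    return X, y

X, y = vectorize(SAMPLE)
print(X.shape, y)
```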
Conclusion and Future Scope
Many competitors have significantly improved their capabilities using artificial
intelligence, and several companies will integrate similar software into their
workflows to deliver precise classification faster and for broader audiences. At the
moment, machine translation (MT) still has difficulties with classifying into many
of the 7,000 varieties. Mastering artificial intelligence and deep learning will
create a new generation of translation software, one that delivers more
accurate versions of the original content in more varieties. The future of
classification will cover more types, as the internet continues to penetrate
emerging varieties worldwide.
The software of the future will be easy to use, have a minimal design, and be able to
connect in seconds with complementary technology.

References
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
https://www.kaggle.com/jannesklaas/frenchenglish-bilingual-pairs?select=fra.txt
https://faroit.com/keras-docs/2.0.8/optimizers/#adam
https://www.analyticsvidhya.com/blog/2020/08/a-simple-introduction-to-sequence-to-sequence-models/
https://en.wikipedia.org/wiki/Long_short-term_memory
