
D20MCA11140

Section A
Ques. 1: What is the role of the hidden layers in a neural network?
Ans. 1:

A hidden layer in an artificial neural network is a layer that sits between the input layer and the output layer, where
artificial neurons take in a set of weighted inputs and produce an output through an activation function. It
is a typical part of nearly any neural network in which engineers simulate, in a simplified way, the kind of activity that goes on
in the human brain.

Hidden neural network layers are set up in many different ways. In some cases, weighted inputs are
randomly assigned. In other cases, they are fine-tuned and calibrated through a process called
backpropagation.

Either way, an artificial neuron in the hidden layer works loosely like a biological neuron in the brain: it takes
in its weighted input signals, processes them and converts them into an output, much as a biological neuron
passes a signal along its axon.

Many analyses of machine learning models focus on the construction of hidden layers in the neural
network. There are different ways to set up these hidden layers to generate various results – for instance,
convolutional neural networks that focus on image processing, recurrent neural networks that contain an
element of memory and simple feedforward neural networks that work in a straightforward way on
training data sets.

First of all, a hidden layer in an artificial neural network is a layer of neurons whose outputs are connected to the
inputs of other neurons and are therefore not visible as a network output.

Now, let me explain the role of the hidden layers with the following example: there is a well-known
problem of facial recognition, where a computer learns to detect human faces.

A human face is a complex object: it has eyes, a nose, a mouth and a roughly round shape. For a
computer this means a large number of pixels of different colours arranged into different shapes, and in
order to decide whether there is a human face in a picture, the computer has to detect all of those parts.

The role of the hidden layers is to identify features in the input data and use them to correlate a given
input with the correct output.

The hidden layers break the input image down in order to identify the features present in it.

The initial layers focus on low-level features such as edges while the later layers progressively get more
abstract. At the end of all the layers, we have a fully connected layer with neurons for each of our
classification values.
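
To make the idea concrete, here is a minimal sketch in Python/NumPy of how a hidden layer turns weighted inputs into new features through an activation function before the output layer produces a prediction. The weights here are random placeholders; in practice they would be learned, for example by backpropagation.

```python
import numpy as np

def sigmoid(z):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # hidden-layer weights (random for illustration)
b_hidden = np.zeros(4)               # hidden-layer biases
W_out = rng.normal(size=(1, 4))      # output-layer weights
b_out = np.zeros(1)

x = np.array([0.5, -1.2, 3.0])       # one input example

# Hidden layer: weighted sum of the inputs passed through an activation function
hidden_activations = sigmoid(W_hidden @ x + b_hidden)

# Output layer combines the hidden-layer features into the final prediction
output = sigmoid(W_out @ hidden_activations + b_out)
print(hidden_activations, output)
```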


Ques. 2: Illuminate the need for sigmoid function.

Ans. 2: Sigmoid functions are popular in neural networks and deep learning algorithms because they can serve as
activation functions, loosely mimicking the activation behaviour of biological neurons.

The most common sigmoid function is the logistic function, σ(x) = 1 / (1 + e^(−x)), which also underlies
logistic regression.

Any mathematical function with an S-shaped (sigmoid) curve is called a sigmoid function for brevity. Common
examples are the logistic, hyperbolic tangent and arctangent functions. In machine learning, the term usually
refers to the logistic sigmoid function.

Looking at the key properties of sigmoid functions, their usefulness for probabilities is linked to how quickly
they converge towards their limiting values: convergence is fast for the logistic and hyperbolic tangent
functions and much slower for the arctangent function.

These functions are used for deducing probabilities because they squash real-valued inputs into the narrow
range between 0 and 1, so the output can be read as the probability of an event occurring and used to
separate two classes. Sigmoid functions are monotonic and their first derivative is bell-shaped.

The various types of sigmoid graphs are

1. Logistic Sigmoid Function Formula: σ(x) = 1 / (1 + e^(−x)). The most commonly used sigmoid function in ML;
it accepts any real-valued input and produces an output between 0 and 1.

2. Hyperbolic Tangent Function Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)). It accepts any real-valued
input and produces an output between −1 and 1.

3. Arctangent Function Formula: arctan(x), the inverse of the tangent function, is also popular; it accepts any
real-valued input and produces an output between −π/2 and π/2.
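
As a quick illustration of the three variants, here is a small NumPy sketch that evaluates each of them on the same inputs so their different output ranges can be compared:

```python
import numpy as np

def logistic(x):
    # maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)

print(logistic(x))    # outputs in (0, 1)
print(np.tanh(x))     # outputs in (-1, 1)
print(np.arctan(x))   # outputs in (-pi/2, pi/2)
```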

APPLICATIONS OF SIGMOID FUNCTION

 Logistic regression models for probability prediction: The logistic (sigmoid) function is used in machine
learning to estimate the probability of a binary event, with the output lying between 0 and 1. This means that
the dependent variable is either 1 or 0, while the independent variables can take any real value when the model
is fit to a dataset. For example, take a dataset of diagnoses and tumour measurements where one needs to predict
whether a tumour has spread based on its size in cm. A plot shows that, in general, large tumours spread more
often, and the classes overlap for tumours between 2.5 and 3.5 cm. If we plot tumour status y (1 or 0) against
tumour size x (any real value) and fit a logistic regression by finding the best values for the slope m and
intercept b, the sigmoid curve can be stretched to fit the data. Such a model shows that tumours around 4 cm
have a near-certain probability of spread (y ≈ 1). Thus the logistic sigmoid function is very useful for
modelling probabilities.

 Artificial neural networks using a sigmoid function for activation: In artificial neural networks, several
functional layers are stacked on top of each other. These layers have weights, biases and an activation
function, and a sigmoid activation function is what introduces non-linearity between the layers. In the past,
sigmoid functions such as the logistic, hyperbolic tangent and arctangent functions, inspired by the activation
of biological neurons, found many uses for this purpose. In modern networks they have largely been replaced by
ReLU and its variants as the default activation, although sigmoids are still used, for example, in output
layers that produce probabilities.


Ques. 3: Differentiate between supervised and unsupervised learning.


Ans. 3:

Supervised Machine Learning:


Supervised learning is a machine learning method in which models are trained using labeled data. In
supervised learning, models need to find the mapping function to map the input variable (X) with the
output variable (Y).

Supervised learning needs supervision to train the model, much as a student learns in the presence of a
teacher. Supervised learning can be used for two types of
problems: Classification and Regression.

Unsupervised Machine Learning:


Unsupervised learning is another machine learning method, in which patterns are inferred from unlabeled
input data.

The goal of unsupervised learning is to find structure and patterns in the input data. Unsupervised
learning does not need any supervision; instead, it finds patterns in the data on its own.

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
Supervised learning models take direct feedback to check whether they are predicting the correct output or not. | Unsupervised learning models do not take any feedback.
Supervised learning models predict the output. | Unsupervised learning models find the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights in an unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
Supervised learning can be used for cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used for cases where we have only input data and no corresponding output data.
Supervised learning models produce more accurate results. | Unsupervised learning models may give less accurate results compared to supervised learning.
Supervised learning is not close to true Artificial Intelligence, as we first train the model for each input and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns in a way similar to how a child learns daily routine things from experience.
It includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes algorithms such as Clustering, KNN, and the Apriori algorithm.
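
A minimal sketch of the contrast in code, using scikit-learn and its bundled Iris data; the choice of LogisticRegression and KMeans here is purely illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model is trained on features X together with labels y
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Predicted classes:", clf.predict(X[:5]))

# Unsupervised: the model sees only X and groups similar points into clusters
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:5])
```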


Ques. 4: List the important techniques for the NLU.


Ans. 4:
Natural language processing (NLP), as the name suggests, deals with the processing of language or
linguistics. NLP primarily comprises two major functionalities: the first is “human to machine
translation” (Natural Language Understanding, NLU), and the second is “machine to human
translation” (Natural Language Generation, NLG).

Natural language processing (NLP) lies at the intersection of Artificial Intelligence, Computer Science and
Linguistics. The end goal of this technology is for computers to understand the content, nuances and
sentiment of a document.

With NLP we can extract the information and insights contained in documents and then organize them into
their respective categories. For example, whenever a user searches for something on the Google search
engine, Google’s algorithm uses NLP techniques to surface the relevant documents, blogs and articles.

Techniques of Natural Language Processing Covered
1. Named Entity Recognition (NER)

2. Tokenization

3. Stemming and Lemmatization

4. Bag of Words

5. Natural language generation

6. Sentiment Analysis

7. Sentence Segmentation

1. Named Entity Recognition (NER)

This technique is one of the most popular and advantageous techniques in Semantic analysis, Semantics is

something conveyed by the text. Under this technique, the algorithm takes a phrase or paragraph as input

and identifies all the nouns or names present in that input.

There are many popular use cases of this algorithm below we are mentioning some of the daily use cases:


1. News Categorization: This algorithm automatically scans news articles and extracts information such
as the individuals, companies, organizations, celebrities and places mentioned in them. Using this
algorithm we can easily classify news content into different categories.

2. Efficient Search Engines: The named entity recognition algorithm is applied to articles, results and
news to extract relevant tags, which are stored separately. This speeds up searching and makes for a
more efficient search engine.

3. Customer Support: Thousands of pieces of feedback about heavy-traffic areas are posted on Twitter
every day. If a Named Entity Recognition API is used, we can easily pull out all the keywords (or tags)
and route them to the concerned traffic police departments.
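
A small illustrative sketch of NER, assuming spaCy and its small English model (en_core_web_sm) are installed; the example sentence is made up:

```python
import spacy

# Load a pretrained English pipeline (must be downloaded beforehand with
# `python -m spacy download en_core_web_sm`)
nlp = spacy.load("en_core_web_sm")

doc = nlp("Sundar Pichai announced a new Google office in London on Monday.")

# Each recognized entity comes with a label such as PERSON, ORG, GPE or DATE
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
```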

2. Tokenization

First of all, tokenization is basically the splitting of a whole text into a list of tokens; the tokens can
be words, sentences, characters, numbers, punctuation, etc. Tokenization has two main advantages: it reduces
the search space to a significant degree, and it makes effective use of storage space.

Mapping sentences from characters to strings and strings into words is an initial, basic step of almost any
NLP problem, because to understand any text or document we need to interpret the words and sentences present
in it.

Tokenization is an integral part of any Information Retrieval (IR) system: it not only pre-processes the text
but also generates the tokens that are used in the indexing and ranking process. Various tokenization
techniques are available, ranging from simple whitespace and punctuation splitting to rule-based and
statistical tokenizers.
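
A minimal sketch using NLTK's tokenizers (it assumes the Punkt tokenizer data has been downloaded, as shown in the first two lines):

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time download of the Punkt sentence tokenizer models
nltk.download("punkt")

text = "Neha loves animals. She visits the shelter every Sunday!"

print(sent_tokenize(text))   # split the text into sentences
print(word_tokenize(text))   # split the text into word/punctuation tokens
```
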
3. Stemming and Lemmatization

The amount of data and information on the web has been at an all-time high for the past couple of years.
This huge volume of data demands tools and techniques that can extract inferences from it with ease.


“Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or
root form - generally a written form of the word.” In practice, stemming basically cuts off suffixes: after
applying stemming, the word “playing” becomes “play” and “asked” becomes “ask”.

Lemmatization usually refers to doing things properly, with the use of a vocabulary and morphological
analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary
form of a word, which is known as the lemma. In simple words, lemmatization reduces a word to its lemma
after understanding the part of speech (POS) or the context of the word in the document.
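
A small sketch contrasting the two with NLTK (it assumes the WordNet data has been downloaded; the word list is arbitrary):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")  # one-time download of the WordNet lexical database

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["playing", "asked", "studies"]:
    print(word,
          "stem:", stemmer.stem(word),                     # crude suffix stripping
          "lemma:", lemmatizer.lemmatize(word, pos="v"))   # dictionary form, treated as a verb
```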

4. Bag of Words

The bag-of-words technique is used to pre-process text and to extract features from a text document for use
in machine learning modelling. It is a representation of a text that describes the occurrence of words within
a document. It is called a “bag” because of its mechanism: it is only concerned with whether known words
occur in the document, not with where they occur.

Let’s take an example to understand bag-of-words in more detail. We take 2 text documents:

“Neha was angry on Sunil and he was angry on Ramesh.”
“Neha love animals.”

We treat each document as a separate entity and make a list of all the words present in both documents,
excluding punctuation:

“Neha”, “was”, “angry”, “on”, “Sunil”, “and”, “he”, “Ramesh”, “love”, “animals”

Then we turn these documents into vectors (turning text into numbers is called vectorization in ML) for
further modelling.

Against this vocabulary, “Neha was angry on Sunil and he was angry on Ramesh” is represented by the vector
[1,1,1,1,1,1,1,1,0,0], and “Neha love animals” by the vector [1,0,0,0,0,0,0,0,1,1]. So the bag-of-words
technique is mainly used for feature generation from text data.
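
The same idea in code, as a short sketch using scikit-learn's CountVectorizer; note that its default tokenizer lowercases the text, so the exact vocabulary ordering differs from the hand-built list above:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Neha was angry on Sunil and he was angry on Ramesh.",
    "Neha love animals.",
]

# binary=True records only the presence/absence of each word, as in the
# hand-worked example; dropping it would give per-document word counts instead
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray())
```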


5. Natural Language Generation

Natural language generation (NLG) is a technique that converts raw structured data into plain English (or any
other language); it is also called data storytelling. The technique is very helpful in organizations that work
with large amounts of data, since it turns structured data into natural language for a better understanding of
patterns and for detailed insights into the business.

NLG can be viewed as the opposite of Natural Language Understanding (NLU), which was explained above. NLG
makes data understandable to everyone by producing data-driven reports such as stock-market and financial
reports, meeting memos, reports on product requirements, etc.

An NLG system typically works in several stages:

1. Content Determination: Deciding what the main content to be represented in the text is, i.e. what
information the text should provide.

2. Document Structuring: Deciding the overall structure of the information to convey.

3. Aggregation: Merging sentences to improve understanding and readability.

4. Lexical Choice: Choosing appropriate words to convey the meaning of the sentence more clearly.

5. Referring Expression Generation: Creating references that properly identify the main objects and regions
of the text.

6. Realization: Creating and optimizing text that follows all the norms of grammar (syntax, morphology,
orthography).

6. Sentiment Analysis

It is one of the most common natural language processing techniques. With sentiment analysis, we can

understand the emotion/feeling of the written text. Sentiment analysis is also known as Emotion

AI or Opinion Mining.

Sentiment analysis usually works best on subjective text data rather than objective test data. Generally,

objective text data are either statements or facts which does not represent any emotion or feeling. On the

other hand, the subjective text is usually written by humans showing emotions and feelings.


For example, Twitter is full of sentiment: users share their reactions and express their opinions on every
possible topic. To access users’ tweets in a real-time scenario, there is a powerful Python library called
“Tweepy”.
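
A minimal lexicon-based sketch using NLTK's VADER analyzer (one of several possible tools; it assumes the vader_lexicon data has been downloaded, and the example sentences are made up):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

sia = SentimentIntensityAnalyzer()

for tweet in ["I absolutely love this product!",
              "The traffic near the station is terrible today."]:
    # polarity_scores returns negative/neutral/positive scores and a
    # 'compound' score in [-1, 1] summarizing the overall sentiment
    print(tweet, sia.polarity_scores(tweet))
```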

7. Sentence Segmentation

The fundamental task of this technique is to divide text into meaningful sentences or phrases. It involves
identifying the sentence boundaries between words in text documents. Almost all languages use punctuation
marks at sentence boundaries, so sentence segmentation is also referred to as sentence boundary detection,
sentence boundary disambiguation or sentence boundary recognition.

Many libraries, such as NLTK, spaCy and Stanford CoreNLP, provide specific functions to perform this task
(the sent_tokenize call in the tokenization sketch above is one example).


Ques. 5: What is meant by conditional probability? Name the theorem where it is used.
Ans. 5:

Conditional probability is defined as the likelihood of an event or outcome occurring, based on the
occurrence of a previous event or outcome. The joint probability of both events occurring together is obtained
by multiplying the probability of the preceding event by the updated (conditional) probability of the succeeding event.

For example:

 Event A is that an individual applying for college will be accepted. There is an 80% chance that
this individual will be accepted to college.
 Event B is that this individual will be given dormitory housing. Dormitory housing will only be
provided for 60% of all of the accepted students.
 P (Accepted and dormitory housing) = P (Dormitory Housing | Accepted) P (Accepted) =
(0.60)*(0.80) = 0.48.

A conditional probability would look at these two events in relationship with one another, such as the
probability that you are both accepted to college, and you are provided with dormitory housing.

Conditional probability can be contrasted with unconditional probability. Unconditional probability refers
to the likelihood that an event will take place irrespective of whether any other events have taken place or
any other conditions are present.

KEY TAKEAWAYS

 Conditional probability refers to the chances that some outcome occurs given that another event
has also occurred.
 It is often stated as the probability of B given A and is written as P(B|A), where the probability of
B depends on that of A happening.
 Conditional probability can be contrasted with unconditional probability.
 Probabilities are classified as either conditional, marginal, or joint.
 Bayes' theorem is a mathematical formula used in calculating conditional probability.

For example, suppose a bag contains three marbles (one blue, one red and one green) and marbles are drawn
without replacement. The probability of drawing the blue marble on the first draw is 1/3 (about 33%), because
it is one possible outcome out of three.

Now suppose the red marble is drawn first. Two marbles remain, so each has a 1/2 (50%) chance of being drawn
next; the conditional probability of drawing the blue marble given that the red one has already been drawn is
therefore 50%, and the joint probability of drawing red and then blue is 1/3 × 1/2 = 1/6, or about 16.7%.

As another example to provide further insight into this concept, consider that a fair die has been rolled
and you are asked to give the probability that it was a five. There are six equally likely outcomes, so your
answer is 1/6.

But imagine if before you answer, you get extra information that the number rolled was odd. Since there
are only three odd numbers that are possible, one of which is five, you would certainly revise your
estimate for the likelihood that a five was rolled from 1/6 to 1/3.


This revised probability that an event A has occurred, considering the additional information that another
event B has definitely occurred on this trial of the experiment, is called the conditional probability
of A given B and is denoted by P(A|B).

Conditional Probability Formula

P(B|A) = P(A and B) / P(A)

Or:

P(B|A) = P(A∩B) / P(A)

Where

P = Probability

A = Event A

B = Event B

Bayes' Theorem
Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical
formula for determining conditional probability.
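
Stated in the same notation as the formula above, Bayes' theorem is P(A|B) = [P(B|A) × P(A)] / P(B). For instance, in the die example: let A = "a five was rolled" and B = "the roll was odd". Then P(A) = 1/6, P(B) = 1/2 and P(B|A) = 1, so P(A|B) = (1 × 1/6) / (1/2) = 1/3, matching the revised estimate given earlier.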

The theorem provides a way to revise existing predictions or theories (update probabilities) given new or
additional evidence.

In finance, Bayes' theorem can be used to rate the risk of lending money to potential borrowers.

Bayes' theorem is also called Bayes' Rule or Bayes' Law and is the foundation of the field of Bayesian
statistics.

This set of rules of probability allows one to update their predictions of events occurring based on new
information that has been received, making for better and more dynamic estimates.


Ques. 6: Compare classification and clustering.

Ans. 6:

Machine Learning algorithms are generally categorized based upon the type of output variable and the
type of problem that needs to be addressed.

These algorithms are broadly divided into three types i.e. Regression, Clustering, and Classification.
Regression and Classification are types of supervised learning algorithms while Clustering is a type of
unsupervised algorithm.

When the output variable is continuous, then it is a regression problem whereas when it contains discrete
values, it is a classification problem.

Clustering algorithms are generally used when we need to create the clusters based on the characteristics
of the data points.
Classification
Classification is a type of supervised machine learning algorithm. For any given input, the classification
algorithms help in the prediction of the class of the output variable.

There can be multiple types of classifications like binary classification, multi-class classification, etc. It
depends upon the number of classes in the output variable.

Types of Classification algorithms

 Logistic Regression: – It is one of the linear models which can be used for classification. It uses
the sigmoid function to calculate the probability of a certain event occurring. It is an ideal method
for the classification of binary variables.

 K-Nearest Neighbours (kNN): – It uses distance metrics like Euclidean distance, Manhattan
distance, etc. to calculate the distance of one data point from every other data point. To classify
the output, it takes a majority vote from k nearest neighbors of each data point.

 Decision Trees: – It is a non-linear model that overcomes a few of the drawbacks of linear
algorithms like Logistic regression. It builds the classification model in the form of a tree
structure that includes nodes and leaves. This algorithm involves multiple if-else statements
which help in breaking down the structure into smaller structures and eventually providing the
final outcome. It can be used for regression as well as classification problems.

 Random Forest: – It is an ensemble learning method that involves multiple decision trees to
predict the outcome of the target variable. Each decision tree provides its own outcome. In the
case of the classification problem, it takes the majority vote of these multiple decision trees to
classify the final outcome. In the case of the regression problem, it takes the average of the values
predicted by the decision trees.

 Naïve Bayes: – It is an algorithm that is based upon Bayes’ theorem. It assumes that any
particular feature is independent of the inclusion of other features. i.e. They are not correlated to
one another. It generally does not work well with complex data due to this assumption as in most
of the data sets there exists some kind of relationship between the features.


 Support Vector Machine: – It represents the data points in multi-dimensional space. These data
points are then segregated into classes with the help of hyperplanes. It plots an n-dimensional
space for the n number of features in the dataset and then tries to create the hyperplanes such that
it divides the data points with maximum margin.
Clustering
Clustering is a type of unsupervised machine learning algorithm. It is used to group data points having
similar characteristics as clusters. Ideally, the data points in the same cluster should exhibit similar
properties and the points in different clusters should be as dissimilar as possible.

Clustering is divided into two groups – hard clustering and soft clustering. In hard clustering, a data
point is assigned to exactly one cluster, whereas in soft clustering each data point is given a probability
(likelihood) of belonging to each of the clusters.

Types of Clustering algorithms

 K-Means Clustering: – It initializes a pre-defined number of k clusters and uses distance metrics
to calculate the distance of each data point from the centroid of each cluster. It assigns the data
points into one of the k clusters based on its distance.

 Agglomerative Hierarchical Clustering (Bottom-Up Approach): – It considers each data point as a
cluster and merges these data points on the basis of distance metric and the criterion which is
used for linking these clusters.

 Divisive Hierarchical Clustering (Top-Down Approach): – It initializes with all the data points as
one cluster and splits them on the basis of the distance metric and the criterion. Agglomerative and
divisive clustering can be represented as a dendrogram, and the number of clusters can be selected
by referring to it.

 DBSCAN (Density-based Spatial Clustering of Applications with Noise): – It is a density-based
clustering method. Algorithms like K-Means work well on clusters that are fairly separated and tend to
create spherical clusters; DBSCAN is used when the data has arbitrary shape, and it is also less
sensitive to outliers. It groups together data points that have many neighbouring data points within
a certain radius (see the sketch after this list).

 OPTICS (Ordering Points to Identify Clustering Structure): – It is another density-based clustering
method, similar in process to DBSCAN except that it considers a few more parameters. It is more
computationally complex than DBSCAN. Also, it does not separate the data points into clusters
directly; instead it creates a reachability plot which helps in interpreting and creating the clusters.

 BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): – It creates clusters by
generating a summary of the data. It works well with huge datasets as it first summarises the data
and then uses the same to create clusters. However, it can only deal with numeric attributes that
can be represented in space.
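
To illustrate the point about cluster shape, here is a short scikit-learn sketch (parameters such as eps=0.2 are just illustrative) comparing K-Means and DBSCAN on the crescent-shaped "moons" dataset:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

# Two interleaving half-circles: clearly two groups, but not spherical ones
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-Means splits the moons with a straight boundary, while DBSCAN follows
# the dense, arbitrarily shaped regions (label -1 marks noise points)
print("K-Means cluster sizes:", [sum(kmeans_labels == k) for k in set(kmeans_labels)])
print("DBSCAN cluster labels:", sorted(set(dbscan_labels)))
```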


Ques. 8: List the advantages and disadvantages of Dimensionality Reduction.

Ans. 8:
Dimensionality reduction is the task of reducing the number of features in a dataset. In machine learning
tasks like regression or classification there are often too many variables to work with; these variables are
also called features. The higher the number of features, the more difficult it is to model them; this is
known as the curse of dimensionality.

Additionally, some of these features can be quite redundant, adding noise to the dataset and it makes no
sense to have them in the training data. This is where feature space needs to be reduced.

The process of dimensionality reduction essentially transforms data from high-dimensional feature space
to a low-dimensional feature space. Simultaneously, it is also important that meaningful properties
present in the data are not lost during the transformation.

Dimensionality reduction is commonly used in data visualization to understand and interpret the data, and
in machine learning or deep learning techniques to simplify the task at hand.

Advantages of dimensionality reduction:

 Removes Correlated Features: In a real-world scenario it is very common to get thousands of features in
your dataset. You cannot run your algorithm on all the features, as it will reduce the performance of your
algorithm, and it is not easy to visualize that many features in any kind of graph. So you must reduce the
number of features in your dataset, which requires finding the correlations among the features (correlated
variables). Finding these correlations manually in thousands of features is nearly impossible, frustrating
and time-consuming; PCA does this for you efficiently. After applying PCA to your dataset, all the principal
components are independent of one another: there is no correlation among them.

 Improves Algorithm Performance: With too many features, the performance of your algorithm degrades
drastically. PCA is a very common way to speed up a machine learning algorithm by getting rid of correlated
variables which do not contribute to the decision making. The training time of the algorithm reduces
significantly with a smaller number of features. So, if the input dimensionality is too high, using PCA to
speed up the algorithm is a reasonable choice.

 Reduces Overfitting: Overfitting mainly occurs when there are too many variables in the dataset. So,
PCA helps in overcoming the overfitting issue by reducing the number of features.

 Improves Visualization: It is very hard to visualize and understand data in high dimensions. PCA
transforms high-dimensional data into low-dimensional data (for example, 2 dimensions) so that it can be
visualized easily.

 It helps in data compression by reducing features.

 It reduces storage.

 It makes machine learning algorithms computationally efficient.

 It also helps remove redundant features and noise.

 It tackles the curse of dimensionality


Disadvantages of dimensionality reduction:

 Independent variables become less interpretable: After implementing PCA on the dataset, your
original features will turn into Principal Components. Principal Components are the linear combination of
your original features. Principal Components are not as readable and interpretable as original features.

 Data standardization is a must before PCA: You must standardize your data before applying PCA, otherwise
PCA will not be able to find the optimal principal components. For instance, if a feature set has data
expressed in units of kilograms, light years, or millions, the variance scale in the training set is huge.
If PCA is applied to such a feature set, the loadings for features with high variance will also be large;
hence the principal components will be biased towards the high-variance features, leading to misleading
results. Also, all categorical features need to be converted into numerical features before PCA can be
applied. Because PCA is affected by scale, you should scale the features first, for example with
StandardScaler from scikit-learn, which standardizes the features to unit scale (mean = 0 and standard
deviation = 1), a requirement for the optimal performance of many machine learning algorithms (see the
sketch after this list).

 Information Loss: Although the principal components try to capture the maximum variance among the features
in a dataset, if we do not select the number of principal components with care, some information may be lost
compared to the original list of features.

 It may lead to some amount of data loss.

 Accuracy is compromised.
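
As referenced above, a minimal sketch of standardizing the features and then applying PCA with scikit-learn (keeping 2 components and using the Iris data purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Standardize each feature to mean 0 and standard deviation 1 before PCA
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 original features onto 2 uncorrelated principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```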


Ques. 9: What do you mean by sentiment analysis?


Ans. 9:

Sentiment analysis is the contextual mining of text to identify and extract subjective information in
source material, helping a business understand the social sentiment around its brand, product or
service while monitoring online conversations.

However, analysis of social media streams is usually restricted to basic sentiment analysis and count-based
metrics. This is akin to scratching the surface and missing out on the high-value insights that
are waiting to be discovered.

Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or
neutral. A sentiment analysis system for text analysis combines natural language processing (NLP) and
machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and
categories within a sentence or phrase.

Sentiment analysis helps data analysts within large enterprises gauge public opinion, conduct nuanced
market research, monitor brand and product reputation, and understand customer experiences.

In addition, data analytics companies often integrate third-party sentiment analysis APIs into their own
customer experience management, social media monitoring, or workforce analytics platform, in order to
deliver useful insights to their own customers.

Sentiment analysis is a field of Natural Language Processing dedicated to exploring subjective opinions or
feelings collected from various sources about a particular subject.
In stricter business terms, it can be summarized as: sentiment analysis is a set of tools to identify and
extract opinions and use them for the benefit of the business operation.
Such algorithms dig deep into the text and find the elements that reveal the attitude towards the product in
general or towards a specific feature of it. In other words, opinion mining and sentiment analysis offer an
opportunity to explore the mindset of the audience and study the state of the product from the users' point
of view. This makes sentiment analysis a great tool for:
 expanded product analytics
 market research
 reputation management
 precision targeting
 marketing analysis
 public relations (PR)
 product reviews
 net promoter scoring
 product feedback

 customer service
First and foremost, sentiment analysis is important because emotions and attitudes towards a topic can
become actionable pieces of information useful in numerous areas of business and research.
Secondly, it saves time and effort because the process of sentiment extraction is fully automated – it’s the
algorithm that analyses the sentiment datasets, therefore human participation is sparse.
Can you imagine browsing the web, finding relevant texts, reading them, and assessing the tone they carry
manually? It’s doable, but takes ages.
Thirdly, it’s becoming a more and more popular topic as artificial intelligence, deep learning, machine
learning techniques, and natural language processing technologies are developing.
Fourthly, as the technology develops, sentiment analysis will be more accessible and affordable for the
public and smaller companies as well.
And lastly, the tools are becoming smarter every day. The more they’re fed with data, the smarter and
more accurate they become in sentiment extraction.


Ques. 10: How to eliminate overfitting?

Ans. 10: Overfitting occurs when you achieve a good fit of your model on the training data, while it does
not generalize well on new, unseen data. In other words, the model learned patterns specific to the training
data, which are irrelevant in other data.
We can identify overfitting by looking at validation metrics such as loss or accuracy. Usually the validation
metric stops improving after a certain number of epochs and then starts to get worse (validation accuracy
begins to drop, or validation loss begins to rise). The training metric, meanwhile, continues to improve
because the model keeps seeking the best fit for the training data.
There are several manners in which we can reduce overfitting in deep learning models. The best option is
to get more training data. Unfortunately, in real-world situations, you often do not have this possibility
due to time, budget or technical constraints.
Another way to reduce overfitting is to lower the capacity of the model to memorize the training data. As
such, the model will need to focus on the relevant patterns in the training data, which results in better
generalization.

How to Avoid Overfitting In Machine Learning?


There are several techniques to avoid overfitting in Machine Learning altogether listed below.

1. Cross-Validation
2. Training With More Data
3. Removing Features
4. Early Stopping
5. Regularization
6. Ensembling

1. Cross-Validation

One of the most powerful techniques to avoid or prevent overfitting is cross-validation. The idea is to use
the initial training data to generate mini train-test splits, and then use these splits to tune your
model.

In standard k-fold cross-validation, the data is partitioned into k subsets, also known as folds. The
algorithm is then trained iteratively on k−1 folds while the remaining fold is used as the test set, also
known as the holdout fold.


The cross-validation helps us to tune the hyperparameters with only the original training set. It basically
keeps the test set separately as a true unseen data set for selecting the final model. Hence, avoiding
overfitting altogether.
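
A small sketch of k-fold cross-validation with scikit-learn (5 folds, the Iris data and a decision tree are chosen just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 5 folds takes a turn as the holdout set while the model
# is trained on the remaining 4 folds
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```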

2. Training With More Data

This technique does not work every time, but in general training with a larger, more representative sample
of the population helps the model identify the signal better.

In some cases, however, additional data can also mean feeding more noise to the model. When training the
model with more data, we have to make sure the data is clean and free from randomness and
inconsistencies.

3. Removing Features

Some algorithms have automatic feature selection built in. For the significant number that do not, we can
manually remove a few irrelevant features from the input features to improve generalization.

One way to do this is to reason about how each feature fits into the model, which is quite similar to
debugging code line by line.

If a feature cannot explain anything relevant in the model, we can simply identify and remove it. We can
even use a few feature selection heuristics as a good starting point.

4. Early Stopping

While the model is training, you can measure how well it performs after each iteration. We keep training up
to the point where the iterations stop improving performance on held-out data. Beyond this point the model
overfits the training data, as its generalization weakens with each further iteration.
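
A minimal sketch of early stopping using Keras; the synthetic data, layer sizes and patience value are arbitrary choices made purely for illustration:

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data, only for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
X_train, X_val = X[:800], X[800:]
y_train, y_val = y[:800], y[800:]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop training once the validation loss has not improved for 3 epochs,
# and roll back to the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=100,
          callbacks=[early_stop],
          verbose=0)
```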


Section B
Ques. 1: List and explain the applications of machine learning.

Ans. 1:

Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to identify
objects, persons, places, digital images, etc. A popular use case of image recognition and face detection
is automatic friend-tagging suggestion:

Facebook provides a feature of automatic friend-tagging suggestions. Whenever we upload a photo with our
Facebook friends, we automatically get a tagging suggestion with their names, and the technology behind
this is machine learning's face detection and recognition algorithms.

It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person
identification in a picture.

Speech Recognition

While using Google we get a "Search by voice" option; this comes under speech recognition and is a popular
application of machine learning.

Speech recognition is the process of converting voice instructions into text; it is also known as "speech
to text" or "computer speech recognition." At present, machine learning algorithms are widely used in
various speech recognition applications. Google Assistant, Siri, Cortana and Alexa use speech recognition
technology to follow voice instructions.

Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the
shortest route and predicts the traffic conditions.

It predicts the traffic conditions, such as whether traffic is clear, slow-moving or heavily congested, in
two ways:

o The real-time location of vehicles from the Google Maps app and sensors
o The average time taken on past days at the same time of day


Everyone who uses Google Maps is helping to make the app better: it takes information from users and sends
it back to its database to improve performance.

Product recommendations:

Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for some product
on Amazon, we start seeing advertisements for the same product while surfing the internet in the same
browser, and this is because of machine learning.

Google learns the user's interests using various machine learning algorithms and suggests products
according to those interests.

Similarly, when we use Netflix, we find recommendations for series, movies, etc., and this is also done
with the help of machine learning.

Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars, where machine learning
plays a significant role. Tesla, one of the most popular car manufacturers, is working on self-driving cars
and uses machine learning (in particular deep learning) methods to train its models to detect people and
objects while driving.

Email Spam and Malware Filtering:

Whenever we receive a new email, it is automatically filtered as important, normal, or spam. We receive
important mail in our inbox marked with the important symbol and spam emails in our spam box, and the
technology behind this is machine learning. Below are some spam filters used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes
classifier are used for email spam filtering and malware detection.

Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana and Siri. As the name
suggests, they help us find information using our voice instructions. These assistants can help us in
various ways just through voice commands, such as playing music, calling someone, opening an email,
scheduling an appointment, etc.


Machine learning algorithms are an important part of these virtual assistants.

These assistants record our voice instructions, send them to a server in the cloud, decode them using ML
algorithms and act accordingly.

Online Fraud Detection:

Machine learning is making our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place,
such as fake accounts, fake IDs, or money being stolen in the middle of a transaction. To detect this, a
feed-forward neural network helps by checking whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these values become the input
for the next round. Genuine transactions follow a specific pattern which changes for fraudulent ones; the
network detects this and makes our online transactions more secure.

Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market there is always a risk of ups
and downs in share prices, so machine learning models such as long short-term memory (LSTM) neural networks
are used to predict stock market trends.

Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing
very fast and can build 3D models that predict the exact position of lesions in the brain.

This helps in finding brain tumours and other brain-related diseases easily.

Automatic Language Translation:

Nowadays, if we visit a new place and do not know the language, it is not a problem at all, because machine
learning helps us by converting text into a language we know. Google's GNMT (Google Neural Machine
Translation) provides this feature: it is a neural machine translation system that translates text into our
familiar language, known as automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm, which, used
together with image recognition, can also translate text found in images from one language to another.


Ques. 2: Elaborate the Life cycle of machine learning process.

Ans. 2:

Machine learning has given computer systems the ability to learn automatically without being explicitly
programmed. But how does a machine learning system work?

This can be described using the machine learning life cycle. The machine learning life cycle is a cyclic
process for building an efficient machine learning project, and its main purpose is to find a solution to
the problem at hand.

The machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

1. Gathering Data:

Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify
the data needed for the problem and obtain it.

In this step we need to identify the different data sources, as data can be collected from various sources
such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life
cycle: the quantity and quality of the collected data determine the efficiency of the output. The more data
we have, the more accurate the prediction will be.

This step includes the below tasks:

o Identify various data sources


o Collect data
o Integrate the data obtained from different sources


By performing the above tasks, we get a coherent set of data, also called a dataset, which will be used in
further steps.

2. Data preparation

After collecting the data, we need to prepare it for further steps. Data preparation is a step where we put
our data into a suitable place and prepare it to use in our machine learning training.

In this step, first, we put all data together, and then randomize the ordering of data.

This step can be further divided into two processes:

o Data exploration: It is used to understand the nature of the data we have to work with. We need
to understand the characteristics, format and quality of the data. A better understanding of the data
leads to an effective outcome; here we look for correlations, general trends and outliers.
o Data pre-processing: The next step is pre-processing the data for analysis.

3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a useable format. It is the process
of cleaning the data, selecting the variable to use, and transforming the data in a proper format to make it
more suitable for analysis in the next step. It is one of the most important steps of the complete process.
Cleaning of data is required to address the quality issues.

The data we have collected is not always useful to us, as some of it may be irrelevant. In real-world
applications, collected data may have various issues, including:

o Missing Values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is mandatory to detect and remove the above issues because they can negatively affect the quality of the
outcome.

4. Data Analysis

Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques

o Building models
o Review the result

The aim of this step is to build a machine learning model that analyses the data using various analytical
techniques, and then to review the outcome. It starts with determining the type of problem, where we
select a machine learning technique such as classification, regression, cluster analysis, association,
etc.; we then build the model using the prepared data and evaluate it.

Hence, in this step we take the data and use machine learning algorithms to build the model.

5. Train Model

The next step is to train the model; in this step we train our model to improve its performance and obtain
a better outcome for the problem.

We use datasets to train the model with various machine learning algorithms. Training a model is required
so that it can learn the various patterns, rules and features.

6. Test Model

Once our machine learning model has been trained on a given dataset, we test it. In this step we check the
accuracy of our model by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model against the requirements of the project
or problem.

7. Deployment

The last step of machine learning life cycle is deployment, where we deploy the model in the real-world
system.

If the prepared model produces accurate results as per our requirements with acceptable speed, we deploy it
in the real system. Before deploying the project, we check whether it keeps improving its performance using
the available data. The deployment phase is similar to preparing the final report for a project.
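
A compressed sketch of the same life cycle in code, using scikit-learn and its bundled Iris data to stand in for the gathering, preparation, training, testing and deployment steps (saving the model with joblib stands in for deployment):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib

# 1-3. Gather and prepare the data (a bundled dataset stands in for
#      real data collection and wrangling)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 4-5. Analyse and train: choose a technique and fit the model
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 6. Test the model on unseen data
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 7. "Deploy": persist the trained model so a real system can load and use it
joblib.dump(model, "model.joblib")
```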


Ques. 3: Differentiate between simple and multiple linear regression.

Ans. 3:

Linear Regression
It is also called simple linear regression. It is one of the most common techniques of regression analysis
and establishes the relationship between two variables using a straight line: it attempts to draw the line
that comes closest to the data by finding the slope and intercept that define the line and minimize the
regression errors. Simple linear regression has only one x variable and one y variable.

If two or more explanatory variables have a linear relationship with the dependent variable, the regression
is instead called a multiple linear regression.

Many data relationships do not follow a straight line, so statisticians use nonlinear regression instead.
The two are similar in that both track a particular response from a set of variables graphically, but
nonlinear models are more complicated than linear models because the function is created through a series
of assumptions that may stem from trial and error.

Multiple Regression
It is rare that a dependent variable is explained by only one variable. In such cases an analyst uses
multiple regression, which attempts to explain a dependent variable using more than one independent
variable. Multiple regression is a broader class of regressions that encompasses linear and nonlinear
regressions with multiple explanatory variables, and it has one y (dependent) variable and two or more x
(independent, or predictor) variables.

Multiple linear regression is based on the assumption that there is a linear relationship between the
dependent variable (target) and the independent variables (predictors). It also assumes that there is no
major correlation between the independent variables.

As mentioned above, there are several advantages to using regression analysis. These models can be used by
businesses and economists to help make practical decisions.
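
A brief sketch of the difference in code, fitting scikit-learn's LinearRegression once with a single feature and once with several; the synthetic data is only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: y depends on three features plus a little noise
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

# Simple linear regression: one x variable, one y variable
simple = LinearRegression().fit(X[:, [0]], y)
print("Simple:   slope =", simple.coef_, "intercept =", simple.intercept_)

# Multiple linear regression: several x variables, one y variable
multiple = LinearRegression().fit(X, y)
print("Multiple: coefficients =", multiple.coef_, "intercept =", multiple.intercept_)
```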


Ques. 5: Discuss the role of regular expression in NLP


Ans. 5:

A regular expression (RE) is a language for specifying text search strings. An RE helps us match or find
strings, or sets of strings, using a specialized syntax held in a pattern.

Regular expressions are used to search text in UNIX as well as in MS Word in much the same way, and various
search engines use a number of RE features.

Properties of Regular Expressions


Following are some of the important properties of REs −
 The American mathematician Stephen Cole Kleene formalized the regular expression language.
 An RE is a formula in a special language which can be used for specifying simple classes of strings,
i.e. sequences of symbols. In other words, an RE is an algebraic notation for characterizing a set of
strings.
 A regular expression requires two things: the pattern we wish to search for, and a corpus of text to
search in.
Mathematically, a regular expression can be defined as follows −
 ε is a regular expression, which denotes the language containing only the empty string.
 φ is a regular expression, which denotes the empty language.
 If X and Y are regular expressions, then
o X, Y
o X.Y (concatenation of X and Y)
o X+Y (union of X and Y)
o X*, Y* (Kleene closure of X and Y)
are also regular expressions.
 Any expression derived from the above rules is also a regular expression.

Regular expressions are very popular among programmers and are available in many programming languages like
Java, JavaScript, PHP, C++, etc. They are useful for numerous practical day-to-day tasks that a data
scientist encounters, and they are one of the key concepts of Natural Language Processing that every NLP
expert should be proficient in.

Regular expressions are used in tasks such as data pre-processing, rule-based information mining systems,
pattern matching, text feature engineering, web scraping, data extraction, etc.


Regular expressions, also called regex, are a fascinating programming tool available within most programming
languages. Regex is a very powerful tool used for a variety of purposes such as feature extraction from text,
string replacement and other string manipulations. A regular expression is a set of characters, or a pattern,
that is used to find substrings in a given string, for example extracting all hashtags from a tweet, or
pulling email IDs or phone numbers out of a large piece of unstructured text.
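
A small sketch of exactly this with Python's built-in re module; the sample text and the patterns are deliberately simple and only illustrative:

```python
import re

text = ("Loving the new phone! #gadgets #tech "
        "Contact support@example.com or reach us at +91-9876543210.")

hashtags = re.findall(r"#\w+", text)                   # words starting with '#'
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)  # simple e-mail pattern
phones = re.findall(r"\+?\d[\d-]{8,}\d", text)         # loose phone-number pattern

print(hashtags)  # ['#gadgets', '#tech']
print(emails)    # ['support@example.com']
print(phones)    # ['+91-9876543210']
```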

In short, if there is a pattern in a string, you can easily extract it, substitute it and perform a variety
of other string manipulation operations using regular expressions. Regular expressions are a language in
themselves, since they have their own compilers, and almost all popular programming languages support
working with regexes.

It is no exaggeration to say that without an understanding of regular expressions it is hard to build a
serious NLP-based system such as a chatbot or conversational UI. Regex has many elements and features that
help to build a useful, fit-for-purpose solution for string extraction or manipulation.
