
Important Topics:

Unit 1: INTRODUCTION TO AI

• Types of Intelligence

• How to identify an AI

• AI vs ML vs DL

• AI Domains
Important Topics:
Unit 2: AI PROJECT CYCLE

• Different Stages

• 4W Canvas

• Rule based vs Learning based


• Supervised vs Unsupervised vs Reinforcement

• Neural Networks
Important Topics:
Unit 3: NATURAL LANGUAGE PROCESSING

• Applications: Virtual Assistants, Text Summarization, etc.

• Text Normalization

• Bag of words

• TFIDF (optional)
Important Topics:
Unit 4: EVALUATION

• Terms like TRUE POSITIVE, FALSE NEGATIVE etc.

• Confusion matrix

• Evaluation Metrics and their formulae


UNIT RDBMS: Data Manipulation using SQL
Introduction to AI
Define the term Machine Learning. Also give 2 applications
of Machine Learning in our daily lives.

Ans:

• Machine Learning: It is a subset of Artificial Intelligence which
enables machines to improve at tasks with experience (data).

• The intention of Machine Learning is to enable machines to learn
by themselves using the provided data and make accurate
predictions/decisions.

• Machine Learning is used in Snapchat filters and the NETFLIX
recommendation system.
Introduction to AI
How does a machine become Artificially Intelligent?

Ans:

• A machine becomes intelligent by training with data and
algorithms.

• AI machines keep updating their knowledge to optimize
their output.
Introduction to AI
Differentiate between what is AI and what is not AI with the
help of an example?
Introduction to AI
What do you understand by Deep Learning?

Ans:

Deep learning is a subset of machine learning where artificial neural
networks, algorithms inspired by the human brain, learn from large
amounts of data.
Introduction to AI
What are the three domains of AI? Give examples of each

Ans:

• Data Science: Price Comparison Websites / Website
Recommendations

• Computer Vision: Self-Driving Cars / Facial Recognition

• Natural Language Processing (NLP): Email Filters / Smart
Assistants / Sentiment Analysis
Project Cycle
What is a problem statement template and what is its
significance?

Ans:

• The problem statement template gives a clear idea about the
basic framework required to achieve the goal.

• It is the 4Ws canvas, which segregates four questions: what is the
problem, where does it arise, who is affected, and why is it a
problem? It takes us straight to the goal.
Project Cycle
Draw the graphical representation of Classification AI model.
Explain in brief.

Classification: In classification, data is categorized under
different labels according to some parameters given in the input,
and then the labels are predicted for the data.
Project Cycle
Draw the graphical representation of Regression AI model.
Explain in brief.

Regression: These models work on continuous data to predict
the output based on patterns.

If you wish to predict your next salary, then you would put in
the data of your previous salaries and train your model.
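The salary example above can be sketched as a tiny regression: fitting a straight line to past salary data with ordinary least squares. The numbers below are hypothetical, and a real project would use a library such as scikit-learn; this is only an illustration of the idea.

```python
# Minimal sketch: fit a line y = a*x + b to past salary data and
# use it to predict the next salary. All numbers are hypothetical.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope a = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

years = [1, 2, 3, 4]                      # years of experience
salaries = [30000, 34000, 38000, 42000]   # past salaries
a, b = fit_line(years, salaries)
print(a * 5 + b)  # predicted salary for year 5 → 46000.0
```

Because the output is a continuous number (a salary), this is regression; predicting a discrete label instead would be classification.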
Project Cycle
What is an Artificial Neural Network? Explain the layers in an artificial
neural network.

Artificial Neural Network: Modelled on the human brain, a neural
network is built to mimic the functionality of the human brain.

A neural network consists of three important layers:

• Input Layer: This layer accepts all the inputs provided by the
programmer.

• Hidden Layer: The computations which produce the output are
performed here. There can be any number of hidden layers.

• Output Layer: The final results are delivered via this layer.
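The three layers can be sketched as a toy forward pass. The weights and inputs below are hypothetical; in a real network they are learned from data during training.

```python
# Toy forward pass through input, hidden and output layers.
# Weights/biases are hypothetical, not learned.
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of inputs plus bias, then sigmoid
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

inputs = [0.5, 0.9]                                            # input layer
hidden = layer(inputs, [[0.4, 0.6], [0.7, 0.2]], [0.1, 0.1])   # hidden layer
output = layer(hidden, [[0.5, 0.5]], [0.0])                    # output layer
print(output)  # a single value between 0 and 1
```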
Project Cycle
Differentiate between Classification and Regression.

Ans:

• Classification works on discrete, labelled data: the model
predicts a label (category) for the input.

• Regression works on continuous data: the model predicts a
continuous numerical output based on patterns.
Project Cycle
Natural Language Processing
Give 2 points of difference between a script-bot and a smart-bot

Ans:

• A script-bot works around a fixed script with limited
functionality, while a smart-bot is flexible and has wider
functionality.

• A script-bot needs little or no language processing, while a
smart-bot uses AI and gets smarter with more data.
Natural Language Processing
Explain the term Text Normalization in Data Processing.

Ans:

• The first step in data processing is Text Normalisation.

• Text Normalisation helps in cleaning up the textual data in such
a way that its complexity comes down to a level lower than that of
the actual data.

• In Text Normalisation we apply several steps to normalise the
text; the whole textual data from all the documents taken together
is known as the corpus.
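The normalisation steps can be sketched in a few lines: lowercasing, tokenisation, and stopword removal. The stopword list below is a tiny illustrative subset, not a standard one.

```python
# Minimal text-normalisation sketch: lowercase, tokenise, remove
# stopwords. The stopword set is a small hypothetical sample.
import re

STOPWORDS = {"is", "the", "of", "a", "and", "to", "in"}

def normalize(text):
    tokens = re.findall(r"[a-z]+", text.lower())      # lowercase + tokenise
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords

print(normalize("The cat is sitting in the garden."))
# → ['cat', 'sitting', 'garden']
```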
Natural Language Processing
Differentiate between stemming and lemmatization. Explain with
the help of an example

Ans:

• Stemming is the process in which the affixes of words are
removed and the words are converted to their base form. The
result may not be a meaningful word (e.g. "studies" becomes "studi").

• In lemmatization, the word we get after affix removal (known as
the lemma) is a meaningful word (e.g. "studies" becomes "study").
Lemmatization takes longer to execute than stemming.
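The contrast can be illustrated with a toy sketch. This is not a real stemmer or lemmatizer (libraries such as NLTK provide those); the suffix rules and the lemma lookup table are hypothetical simplifications.

```python
# Toy illustration only: a crude rule-based "stemmer" that chops
# suffixes mechanically, vs. a "lemmatizer" that looks up real
# dictionary forms. Both are hypothetical simplifications.
def crude_stem(word):
    if word.endswith("ies"):
        return word[:-3] + "i"   # e.g. "studies" → "studi" (not a word)
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

LEMMAS = {"studies": "study", "better": "good"}  # tiny assumed lookup

def lemmatize(word):
    # Returns a meaningful dictionary word (the lemma)
    return LEMMAS.get(word, word)

print(crude_stem("studies"))  # → "studi"
print(lemmatize("studies"))   # → "study"
```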
Natural Language Processing
Write the applications of NLP (Natural Language Processing). (Any
four)

• Automatic Summarization: Used for summarizing the meaning of documents
and information, and also to understand the emotional meanings within the
information.

• Sentiment Analysis: The goal of sentiment analysis is to identify sentiment
among social media posts as positive, negative or neutral, even where
emotion is not explicitly expressed.

• Text Classification: Used to assign predefined categories to a document and
organize it to help you find the information you need.

• Virtual Assistants: With the help of speech recognition, these assistants can
not only detect our speech but can help with everyday tasks like setting an
alarm.
Natural Language Processing
Name any 2 applications of Natural Language Processing which are
used in the real-life scenario

Ans:

• Automatic Summarization

• Sentiment Analysis

• Text classification

• Virtual Assistants
Natural Language Processing
What will be the output of the word “studies” if we do the
following:

a. Lemmatization
b. Stemming

Ans:

The output of the word after lemmatization will be study.

The output of the word after stemming will be studi.


Natural Language Processing
How many tokens are there in the sentence given below?

Traffic Jams have become a common part of our lives nowadays.
Living in an urban area means you have to face traffic each and
every time you get out on the road. Mostly, school students opt
for buses to go to school.

Ans:

There are 46 tokens in the given text (each word and each
punctuation mark counts as one token).
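The count can be checked with a simple regex tokenizer that treats each word and each punctuation mark as one token (a sketch; real tokenizers differ in details).

```python
# Count tokens: every word and every punctuation mark is one token.
import re

text = ("Traffic Jams have become a common part of our lives nowadays. "
        "Living in an urban area means you have to face traffic each and "
        "every time you get out on the road. Mostly, school students opt "
        "for buses to go to school.")

tokens = re.findall(r"\w+|[^\w\s]", text)  # words OR single punctuation marks
print(len(tokens))  # → 46
```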


Natural Language Processing
What is a corpus?

Ans:
The whole textual data from all the documents taken together is
known as the corpus.
Natural Language Processing
Identify any 2 stopwords in the given sentence:

Pollution is the introduction of contaminants into the natural
environment that cause adverse change. The three types of
pollution are air pollution, water pollution and land pollution.

Ans:

Stopwords in the given sentence include: is, the, of, that, into,
are, and (any two may be written).
Natural Language Processing
“Automatic summarization is used in NLP applications”. Is the
given statement correct? Justify your answer with an example.

Ans:

• Yes, the given statement is correct.

• Automatic summarization is relevant not only for summarizing
the meaning of documents and information, but also to
understand the emotional meanings within the information.

• Automatic summarization is especially relevant when used to
provide an overview of a news item or blog post.
Natural Language Processing
Write any two applications of TFIDF

Ans:

1. Document Classification: Helps in classifying the type and genre
of a document.

2. Topic Modelling: Helps in predicting the topic for a corpus.

3. Information Retrieval System: Used to extract the important
information out of a corpus.

4. Stop Word Filtering: Helps in removing unnecessary words from
a text body.
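TFIDF can be sketched with the common formula tfidf(w, d) = tf(w, d) × log(N / df(w)), where tf is the word's count in the document, N the number of documents, and df the number of documents containing the word. The documents below are hypothetical, and real libraries (e.g. scikit-learn) use smoothed variants of this formula.

```python
# Minimal TF-IDF sketch: frequent-everywhere words (like stopwords)
# score 0, rarer words score higher. Documents are hypothetical.
import math

docs = [["cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"]]

def tfidf(word, doc, docs):
    tf = doc.count(word)                    # term frequency in this document
    df = sum(1 for d in docs if word in d)  # documents containing the word
    return tf * math.log(len(docs) / df)   # rare words score higher

print(tfidf("the", docs[0], docs))  # → 0.0 ("the" appears in every document)
print(round(tfidf("cat", docs[0], docs), 3))
```

This also shows why TFIDF works for stop word filtering: words occurring in every document get a score of zero.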
Natural Language Processing
Write down the steps to implement bag of words algorithm

Ans:

The steps to implement the bag of words algorithm are as follows:

1. Text Normalization: Collect data and pre-process it.

2. Create Dictionary: Make a list of all the unique words occurring
in the corpus (the vocabulary).

3. Create Document Vectors: For each document in the corpus, find
out how many times each word from the unique list of words has
occurred.
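The three steps above can be sketched in plain Python. The two documents are hypothetical and assumed to be already text-normalised (step 1).

```python
# Bag of words sketch: vocabulary in, one count-vector per document out.
docs = [["the", "cat", "sat"],
        ["the", "dog", "sat", "the"]]   # hypothetical normalised documents

# Step 2: create the dictionary (sorted vocabulary of unique words)
vocab = sorted({word for doc in docs for word in doc})

# Step 3: create document vectors (count of each vocabulary word per doc)
vectors = [[doc.count(word) for word in vocab] for doc in docs]

print(vocab)    # → ['cat', 'dog', 'sat', 'the']
print(vectors)  # → [[1, 0, 1, 1], [0, 1, 1, 2]]
```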
Natural Language Processing
Explain from the given graph, how the value and occurrence of a word are
related in a corpus?

Ans:
Occurrence and value of a word are inversely proportional.

• The words which occur the most (like stop words) have negligible
value.

• The words which occur the least add the most value to the corpus.
Natural Language Processing
Classify each of the images according to how well the model’s output matches
the data samples:

1. The model’s output does not match the true function at all. Hence the model
is said to be underfitting, and its accuracy is lower.

2. The model tries to cover all the data samples, even those that are out of
alignment. This model is said to be overfitting and also has lower accuracy.

3. The model’s performance matches well with the true function; this is called a
perfect fit.
Evaluation
What is F1 Score in Evaluation?

Ans:
F1 score can be defined as the measure of balance between
precision and recall. It is calculated as:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Evaluation
Give an example of a situation wherein false positive would have a
high cost associated with it.

Ans:

• Let us consider a model that predicts whether a mail is spam or not.

• If the model always predicts that a mail is spam, people would
not look at it and eventually might lose important information.

• Here the False Positive condition (predicting the mail as spam
while the mail is not spam) would have a high cost.
Evaluation
Why should we avoid using the training data for evaluation?

Ans:

This is because our model will simply remember the whole training
set, and will therefore always predict the correct label for any point
in the training set.
Evaluation
Which evaluation metric would be crucial in the following cases?
Justify your answer.

a. Mail Spamming

• If the model always predicts that a mail is spam, people would
not look at it and eventually might lose important information.

• The False Positive condition (predicting the mail as spam while
the mail is not spam) would have a high cost. Hence, Precision is
the crucial metric here.
Evaluation
Which evaluation metric would be crucial in the following cases?
Justify your answer.

b. Gold Mining

• A model says that there exists treasure at a point and you keep
on digging there, but it turns out to be a false alarm.

• The False Positive case (predicting there is treasure when there
is none) is very costly. Hence, Precision is the crucial metric here.
Evaluation
Which evaluation metric would be crucial in the following cases?
Justify your answer.

c. Viral Outbreak

• A deadly virus has started spreading, and the model which is
supposed to predict a viral outbreak does not detect it.

• The virus might spread widely and infect a lot of people. Hence,
False Negatives are dangerous, and Recall is the crucial metric
here.
Evaluation
What are the possible reasons for an AI model not being efficient? Explain.

a. Lack of Training Data: If the data is not sufficient for developing an AI model,
or if data is missed while training the model, the model will not be efficient.

b. Unauthenticated / Wrong Data: If the data is not authenticated and correct,
the model will not give good results.

c. Inefficient Coding / Wrong Algorithms: If the written algorithms are not
correct and relevant, the model will not give the desired output.

d. Not Tested: If the model is not tested properly, it will not be efficient.
Evaluation
Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion
Matrix on Heart Attack Risk. Also suggest which metric would not be a good
evaluation parameter here and why?

Precision: the percentage of true positive cases versus all the cases where the
prediction is positive.

Recall: the fraction of positive cases that are correctly identified.

False Positive (impacts Precision): A person is predicted as high risk but does
not have a heart attack.

False Negative (impacts Recall): A person is predicted as low risk but has a
heart attack.

False Negatives miss actual heart patients, so the Recall metric needs the most
attention here: False Negatives are more dangerous than False Positives.
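Since the confusion matrix itself is not reproduced here, the four metrics can be shown with hypothetical counts, using the standard formulae.

```python
# Metrics from hypothetical confusion-matrix counts (the matrix from
# the original question is not reproduced, so these numbers are assumed).
TP, FP, FN, TN = 50, 10, 5, 35

accuracy  = (TP + TN) / (TP + TN + FP + FN)       # all correct / all cases
precision = TP / (TP + FP)                        # hurt by False Positives
recall    = TP / (TP + FN)                        # hurt by False Negatives
f1 = 2 * precision * recall / (precision + recall)  # balance of both

print(accuracy, round(precision, 3), round(recall, 3), round(f1, 3))
# → 0.85 0.833 0.909 0.87
```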
Evaluation
Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion
Matrix on SPAM FILTERING: Also suggest which metric would not be a good
evaluation parameter here and why?

Precision: the percentage of true positive cases versus all the cases where the
prediction is positive.

Recall: the fraction of positive cases that are correctly identified.

False Positive (impacts Precision): A mail is predicted as spam but it is not.

False Negative (impacts Recall): A mail is predicted as not spam but is spam.

Too many False Negatives will make the spam filter ineffective, but False
Positives may cause important mails to be missed. Hence, Precision is more
important to improve.
Evaluation
Thank you!
