
Important Topics:

Unit 1: INTRODUCTION TO AI

• Types of Intelligence

• How to identify an AI

• AI vs ML vs DL

• AI Domains
Important Topics:
Unit 2: AI PROJECT CYCLE

• Different Stages

• 4W Canvas

• Rule based vs Learning based


• Supervised vs Unsupervised vs Reinforcement

• Neural Networks
Important Topics:
Unit 3: NATURAL LANGUAGE PROCESSING

• Applications: Virtual Assistants, Text Summarization, etc.

• Text Normalization

• Bag of words

• TFIDF (optional)
Important Topics:
Unit 4: EVALUATION

• Terms like TRUE POSITIVE, FALSE NEGATIVE etc.

• Confusion matrix

• Evaluation Metrics and their formulae


UNIT RDBMS: Data Manipulation using SQL
Introduction to AI
Define the term Machine Learning. Also give 2 applications
of Machine Learning in our daily lives.

Ans:

• Machine Learning: It is a subset of Artificial Intelligence which
enables machines to improve at tasks with experience (data).

• The intention of Machine Learning is to enable machines to learn
by themselves using the provided data and make accurate
predictions/decisions.

• Machine Learning is used in Snapchat filters and the NETFLIX
recommendation system.
Introduction to AI
How does a machine become Artificially Intelligent?

Ans:

• A machine becomes intelligent by training with data and
algorithms.

• AI machines keep updating their knowledge to optimize
their output.
Introduction to AI
Differentiate between what is AI and what is not AI with the
help of an example?
Introduction to AI
What do you understand by Deep Learning?

Ans:

Deep learning is a subset of machine learning where artificial neural
networks, algorithms inspired by the human brain, learn from large
amounts of data.
Introduction to AI
What are the three domains of AI? Give examples of each

Ans:

• Data Science: Price Comparison Websites / Website
Recommendations

• Computer Vision: Self-Driving Cars / Facial Recognition

• Natural Language Processing (NLP): Email Filters / Smart
Assistants / Sentiment Analysis
Project Cycle
What is a problem statement template and what is its
significance?

Ans:

• The problem statement template gives a clear idea about the
basic framework required to achieve the goal.

• It is the 4Ws canvas, which segregates four questions: what is the
problem, where does it arise, who is affected, and why is it a
problem? It takes us straight to the goal.
Project Cycle
Draw the graphical representation of Classification AI model.
Explain in brief.

Classification: In classification, data is categorized under
different labels according to some parameters given in the input,
and then the labels are predicted for the data.
Project Cycle
Draw the graphical representation of Regression AI model.
Explain in brief.

Regression: These models work on continuous data to predict
the output based on patterns.

If you wish to predict your next salary, then you would put in
the data of your previous salaries and train your model.
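The salary example above can be sketched as a tiny regression: fitting a straight line to past salary data with ordinary least squares. The numbers below are hypothetical, and a real project would use a library such as scikit-learn; this is only an illustration of the idea.

```python
# Minimal sketch: fit a line y = a*x + b to past salary data and
# use it to predict the next salary. All numbers are hypothetical.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope a = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

years = [1, 2, 3, 4]                      # years of experience
salaries = [30000, 34000, 38000, 42000]   # past salaries
a, b = fit_line(years, salaries)
print(a * 5 + b)  # predicted salary for year 5 → 46000.0
```

Because the output is a continuous number (a salary), this is regression; predicting a discrete label instead would be classification.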
Project Cycle
What is an Artificial Neural Network? Explain the layers in an artificial
neural network.

Artificial Neural Network: Modelled on the human brain, a neural
network is built to mimic the functionality of the human brain.

A neural network consists of three important layers:

• Input Layer: This layer accepts all the inputs provided by the
programmer.

• Hidden Layer: The computations which produce the output are
performed here. There can be any number of hidden layers.

• Output Layer: The final results are delivered via this layer.
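The three layers can be sketched as a toy forward pass. The weights and inputs below are hypothetical; in a real network they are learned from data during training.

```python
# Toy forward pass through input, hidden and output layers.
# Weights/biases are hypothetical, not learned.
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of inputs plus bias, then sigmoid
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

inputs = [0.5, 0.9]                                            # input layer
hidden = layer(inputs, [[0.4, 0.6], [0.7, 0.2]], [0.1, 0.1])   # hidden layer
output = layer(hidden, [[0.5, 0.5]], [0.0])                    # output layer
print(output)  # a single value between 0 and 1
```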
Project Cycle
Differentiate between Classification and Regression.

Ans:

• Classification works on discrete, labelled data: the model
predicts a label (category) for the input.

• Regression works on continuous data: the model predicts a
continuous numerical output based on patterns.
Project Cycle
Natural Language Processing
Give 2 points of difference between a script-bot and a smart-bot

Ans:

• A script-bot works around a fixed script with limited
functionality, while a smart-bot is flexible and has wider
functionality.

• A script-bot needs little or no language processing, while a
smart-bot uses AI and gets smarter with more data.
Natural Language Processing
Explain the term Text Normalization in Data Processing.

Ans:

• The first step in data processing is Text Normalisation.

• Text Normalisation helps in cleaning up the textual data in such
a way that its complexity comes down to a level lower than that of
the actual data.

• In Text Normalisation we apply several steps to normalise the
text; the whole textual data from all the documents taken together
is known as the corpus.
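The normalisation steps can be sketched in a few lines: lowercasing, tokenisation, and stopword removal. The stopword list below is a tiny illustrative subset, not a standard one.

```python
# Minimal text-normalisation sketch: lowercase, tokenise, remove
# stopwords. The stopword set is a small hypothetical sample.
import re

STOPWORDS = {"is", "the", "of", "a", "and", "to", "in"}

def normalize(text):
    tokens = re.findall(r"[a-z]+", text.lower())      # lowercase + tokenise
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords

print(normalize("The cat is sitting in the garden."))
# → ['cat', 'sitting', 'garden']
```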
Natural Language Processing
Differentiate between stemming and lemmatization. Explain with
the help of an example

Ans:

• Stemming is the process in which the affixes of words are
removed and the words are converted to their base form. The
result may not be a meaningful word (e.g. "studies" becomes "studi").

• In lemmatization, the word we get after affix removal (known as
the lemma) is a meaningful word (e.g. "studies" becomes "study").
Lemmatization takes longer to execute than stemming.
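The contrast can be illustrated with a toy sketch. This is not a real stemmer or lemmatizer (libraries such as NLTK provide those); the suffix rules and the lemma lookup table are hypothetical simplifications.

```python
# Toy illustration only: a crude rule-based "stemmer" that chops
# suffixes mechanically, vs. a "lemmatizer" that looks up real
# dictionary forms. Both are hypothetical simplifications.
def crude_stem(word):
    if word.endswith("ies"):
        return word[:-3] + "i"   # e.g. "studies" → "studi" (not a word)
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

LEMMAS = {"studies": "study", "better": "good"}  # tiny assumed lookup

def lemmatize(word):
    # Returns a meaningful dictionary word (the lemma)
    return LEMMAS.get(word, word)

print(crude_stem("studies"))  # → "studi"
print(lemmatize("studies"))   # → "study"
```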
Natural Language Processing
Write the applications of NLP (Natural Language Processing). (Any
four)

• Automatic Summarization: Used for summarizing the meaning of documents
and information, and also to understand the emotional meanings within the
information.

• Sentiment Analysis: The goal of sentiment analysis is to identify sentiment
among social media posts as positive, negative or neutral, even where
emotion is not explicitly expressed.

• Text Classification: Used to assign predefined categories to a document and
organize it to help you find the information you need.

• Virtual Assistants: With the help of speech recognition, these assistants can
not only detect our speech but can help with everyday tasks like setting an
alarm.
Natural Language Processing
Name any 2 applications of Natural Language Processing which are
used in the real-life scenario

Ans:

• Automatic Summarization

• Sentiment Analysis

• Text classification

• Virtual Assistants
Natural Language Processing
What will be the output of the word “studies” if we do the
following:

a. Lemmatization
b. Stemming

Ans:

The output of the word after lemmatization will be study.

The output of the word after stemming will be studi.


Natural Language Processing
How many tokens are there in the sentence given below?

Traffic Jams have become a common part of our lives nowadays.
Living in an urban area means you have to face traffic each and
every time you get out on the road. Mostly, school students opt
for buses to go to school.

Ans:

There are 46 tokens in the given text (each word and each
punctuation mark counts as one token).
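The count can be checked with a simple regex tokenizer that treats each word and each punctuation mark as one token (a sketch; real tokenizers differ in details).

```python
# Count tokens: every word and every punctuation mark is one token.
import re

text = ("Traffic Jams have become a common part of our lives nowadays. "
        "Living in an urban area means you have to face traffic each and "
        "every time you get out on the road. Mostly, school students opt "
        "for buses to go to school.")

tokens = re.findall(r"\w+|[^\w\s]", text)  # words OR single punctuation marks
print(len(tokens))  # → 46
```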


Natural Language Processing
What is a corpus?

Ans:
The whole textual data from all the documents taken together is
known as the corpus.
Natural Language Processing
Identify any 2 stopwords in the given sentence:

Pollution is the introduction of contaminants into the natural
environment that cause adverse change. The three types of
pollution are air pollution, water pollution and land pollution.

Ans:

Stopwords in the given sentence include: is, the, of, that, into,
are, and (any two may be written).
Natural Language Processing
“Automatic summarization is used in NLP applications”. Is the
given statement correct? Justify your answer with an example.

Ans:

• Yes, the given statement is correct.

• Automatic summarization is relevant not only for summarizing
the meaning of documents and information, but also to
understand the emotional meanings within the information.

• Automatic summarization is especially relevant when used to
provide an overview of a news item or blog post.
Natural Language Processing
Write any two applications of TFIDF

Ans:

1. Document Classification: Helps in classifying the type and genre
of a document.

2. Topic Modelling: Helps in predicting the topic for a corpus.

3. Information Retrieval System: Used to extract the important
information out of a corpus.

4. Stop Word Filtering: Helps in removing unnecessary words from
a text body.
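TFIDF can be sketched with the common formula tfidf(w, d) = tf(w, d) × log(N / df(w)), where tf is the word's count in the document, N the number of documents, and df the number of documents containing the word. The documents below are hypothetical, and real libraries (e.g. scikit-learn) use smoothed variants of this formula.

```python
# Minimal TF-IDF sketch: frequent-everywhere words (like stopwords)
# score 0, rarer words score higher. Documents are hypothetical.
import math

docs = [["cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"]]

def tfidf(word, doc, docs):
    tf = doc.count(word)                    # term frequency in this document
    df = sum(1 for d in docs if word in d)  # documents containing the word
    return tf * math.log(len(docs) / df)   # rare words score higher

print(tfidf("the", docs[0], docs))  # → 0.0 ("the" appears in every document)
print(round(tfidf("cat", docs[0], docs), 3))
```

This also shows why TFIDF works for stop word filtering: words occurring in every document get a score of zero.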
Natural Language Processing
Write down the steps to implement bag of words algorithm

Ans:

The steps to implement the bag of words algorithm are as follows:

1. Text Normalization: Collect data and pre-process it.

2. Create Dictionary: Make a list of all the unique words occurring
in the corpus (the vocabulary).

3. Create Document Vectors: For each document in the corpus, find
out how many times each word from the unique list of words has
occurred.
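The three steps above can be sketched in plain Python. The two documents are hypothetical and assumed to be already text-normalised (step 1).

```python
# Bag of words sketch: vocabulary in, one count-vector per document out.
docs = [["the", "cat", "sat"],
        ["the", "dog", "sat", "the"]]   # hypothetical normalised documents

# Step 2: create the dictionary (sorted vocabulary of unique words)
vocab = sorted({word for doc in docs for word in doc})

# Step 3: create document vectors (count of each vocabulary word per doc)
vectors = [[doc.count(word) for word in vocab] for doc in docs]

print(vocab)    # → ['cat', 'dog', 'sat', 'the']
print(vectors)  # → [[1, 0, 1, 1], [0, 1, 1, 2]]
```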
Natural Language Processing
Explain from the given graph, how the value and occurrence of a word are
related in a corpus?

Ans:
Occurrence and value of a word are inversely proportional.

• The words which occur the most (like stop words) have negligible
value.

• The words which occur the least add the most value to the corpus.
Natural Language Processing
Classify each of the images according to how well the model’s output matches
the data samples:

1. The model’s output does not match the true function at all. Hence the model
is said to be underfitting, and its accuracy is lower.

2. The model tries to cover all the data samples, even those that are out of
alignment. This model is said to be overfitting and also has lower accuracy.

3. The model’s performance matches well with the true function; this is called a
perfect fit.
Evaluation
What is F1 Score in Evaluation?

Ans:
F1 score can be defined as the measure of balance between
precision and recall. It is calculated as:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Evaluation
Give an example of a situation wherein false positive would have a
high cost associated with it.

Ans:

• Let us consider a model that predicts whether a mail is spam or not.

• If the model always predicts that a mail is spam, people would
not look at it and eventually might lose important information.

• Here the False Positive condition (predicting the mail as spam
while the mail is not spam) would have a high cost.
Evaluation
Why should we avoid using the training data for evaluation?

Ans:

This is because our model will simply remember the whole training
set, and will therefore always predict the correct label for any point
in the training set.
Evaluation
Which evaluation metric would be crucial in the following cases?
Justify your answer.

a. Mail Spamming

• If the model always predicts that a mail is spam, people would
not look at it and eventually might lose important information.

• The False Positive condition (predicting the mail as spam while
the mail is not spam) would have a high cost. Hence, Precision is
the crucial metric here.
Evaluation
Which evaluation metric would be crucial in the following cases?
Justify your answer.

b. Gold Mining

• A model says that there exists treasure at a point and you keep
on digging there, but it turns out to be a false alarm.

• The False Positive case (predicting there is treasure when there
is none) is very costly. Hence, Precision is the crucial metric here.
Evaluation
Which evaluation metric would be crucial in the following cases?
Justify your answer.

c. Viral Outbreak

• A deadly virus has started spreading, and the model which is
supposed to predict a viral outbreak does not detect it.

• The virus might spread widely and infect a lot of people. Hence,
False Negatives are dangerous, and Recall is the crucial metric
here.
Evaluation
What are the possible reasons for an AI model not being efficient? Explain.

a. Lack of Training Data: If the data is not sufficient for developing an AI model,
or if data is missed while training the model, the model will not be efficient.

b. Unauthenticated / Wrong Data: If the data is not authenticated and correct,
the model will not give good results.

c. Inefficient Coding / Wrong Algorithms: If the written algorithms are not
correct and relevant, the model will not give the desired output.

d. Not Tested: If the model is not tested properly, it will not be efficient.
Evaluation
Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion
Matrix on Heart Attack Risk. Also suggest which metric would not be a good
evaluation parameter here and why?

Precision: the percentage of true positive cases versus all the cases where the
prediction is positive.

Recall: the fraction of positive cases that are correctly identified.

False Positive (impacts Precision): A person is predicted as high risk but does
not have a heart attack.

False Negative (impacts Recall): A person is predicted as low risk but has a
heart attack.

False Negatives miss actual heart patients, so the Recall metric needs the most
attention here: False Negatives are more dangerous than False Positives.
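Since the confusion matrix itself is not reproduced here, the four metrics can be shown with hypothetical counts, using the standard formulae.

```python
# Metrics from hypothetical confusion-matrix counts (the matrix from
# the original question is not reproduced, so these numbers are assumed).
TP, FP, FN, TN = 50, 10, 5, 35

accuracy  = (TP + TN) / (TP + TN + FP + FN)       # all correct / all cases
precision = TP / (TP + FP)                        # hurt by False Positives
recall    = TP / (TP + FN)                        # hurt by False Negatives
f1 = 2 * precision * recall / (precision + recall)  # balance of both

print(accuracy, round(precision, 3), round(recall, 3), round(f1, 3))
# → 0.85 0.833 0.909 0.87
```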
Evaluation
Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion
Matrix on SPAM FILTERING: Also suggest which metric would not be a good
evaluation parameter here and why?

Precision: the percentage of true positive cases versus all the cases where the
prediction is positive.

Recall: the fraction of positive cases that are correctly identified.

False Positive (impacts Precision): A mail is predicted as spam but it is not.

False Negative (impacts Recall): A mail is predicted as not spam but is spam.

Too many False Negatives will make the spam filter ineffective, but False
Positives may cause important mails to be missed. Hence, Precision is more
important to improve.
Evaluation
Thank you!
