Q.1. What is data?
Ans. Information is processed data which has been placed in a meaningful and useful
context for an end user; that is, it is placed in a proper context for a human user.
1. Timeliness, 2. Accuracy,
3. Appropriateness, 4. Frequency,
5. Conciseness, 6. Relevancy,
Ans. Neural networks are computing systems modeled after the brain's mesh-like network
of interconnected processing elements called neurons. These elements are very simple in
architecture.
Ans. The basic objective of AI (also called heuristic programming, machine intelligence, or
the simulation of cognitive behaviour) is to enable computers to perform such intellectual
tasks as decision making, problem solving, perception, understanding human
communication (in any language, and translation among them), and the like. Proof of this
objective is the blind test suggested by Alan Turing in 1950: if an observer who
cannot see the actors (computer and human) cannot tell the difference between them, the
objective is satisfied.
Ans. A genetic algorithm is a search heuristic inspired by Charles Darwin's theory of
natural evolution. It is a search-based algorithm used for solving optimization problems
in machine learning, and it is important because it can solve difficult problems that
would otherwise take a very long time to solve.
Ans. Machine learning is a method of data analysis that automates analytical model
building. It is a branch of artificial intelligence based on the idea that systems can learn
from data, identify patterns and make decisions with minimal human intervention.
Ans. Information: Processed data which is used to trigger certain actions or gain better
understanding of what the data implies is called information.
‘Information is data that has been processed into a form which is meaningful to the
recipient and is of real or perceived value in the current or the prospective actions or
decisions of the recipient.' – Davis and Olson
This recognizes both the value of information in taking a particular decision and in affecting
the decisions or actions which are to be taken in the future. The resources of information
are reusable and they do not lose their value even after the information has been
retrieved and used. Information can be processed and even can be used to draw
generalized conclusions or knowledge. It can be said as a type of organized data.
S.No. | Data | Information
1. | Data refers to detailed facts about any event. | Information refers to only those events which are concerned with an entity.
Objects
Events
Performance
Facts
Meta-Knowledge
Knowledge-base
Many techniques have been developed to elicit knowledge from an expert. They are
termed as follows:
c) Hierarchy-Generation Techniques
d) Protocol Analysis Techniques
f) Sorting Techniques
Neural networks are flexible and can be used for both regression and classification
problems. Any data which can be made numeric can be used in the model, as a
neural network is a mathematical model with approximation functions.
Neural networks are good for modelling nonlinear data with a large number of inputs;
for example, images. They are a reliable approach for tasks involving many features. They
work by splitting the problem of classification into a layered network of simpler
elements.
Neural networks can be trained with any number of inputs and layers.
Neural networks are black boxes, meaning we cannot know how much each
independent variable is influencing the dependent variables.
Neural networks depend a lot on training data. This leads to the problems of over-
fitting and generalization. The model relies heavily on the training data and may be
over-tuned to it.
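As an illustration of the "layered network of simpler elements" idea, here is a minimal sketch of a single forward pass through a two-layer network. The weights are random placeholders, not trained values, and the layer sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# 4 numeric input features -> 3 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)      # a layer of simple elements
    return sigmoid(hidden @ W2 + b2)   # combined into a single output

x = np.array([0.5, -1.2, 3.0, 0.0])   # any data that can be made numeric
y = forward(x)
print(y)  # a value strictly between 0 and 1
```

Training would adjust W1, b1, W2, b2 to fit data; this sketch only shows how the layered structure turns numeric inputs into an output.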
Ans. Machine learning is a method of data analysis that automates analytical model
building. It is a branch of artificial intelligence based on the idea that systems can learn
from data, identify patterns and make decisions with minimal human intervention.
What's required to create good machine learning systems?
o Data preparation capabilities.
o Algorithms – basic and advanced.
o Automation and iterative processes.
o Scalability.
o Ensemble modeling.
Did you know?
o In machine learning, a target is called a label.
o In statistics, a target is called a dependent variable.
o A variable in statistics is called a feature in machine learning.
o A transformation in statistics is called feature creation in machine learning.
A decision tree helps to decide whether the net gain from a decision is worthwhile.
Let's look at an example of how a decision tree is constructed. We'll use the following data:
A decision tree starts with a decision to be made and the options that can be taken. Don't
forget that there is always an option to decide to do nothing!
The first task is to add possible outcomes to the tree (note: circles represent uncertain
outcomes)
Next we add in the associated costs, outcome probabilities and financial results for each
outcome.
Probability: if all the outcomes of an event are considered, the total probability must add up to 1.
Finally we complete the maths in the model by calculating:
Expected value:
Net gain:
Net gain is calculated by adding together the expected value of each outcome and
deducting the costs associated with the decision.
Ans. Decision trees are useful tools, particularly for situations where financial data and
probability of outcomes are relatively reliable. They are used to compare the costs and
likely values of decision pathways that a business might take. They often include decision
alternatives that lead to multiple possible outcomes, with the likelihood of each outcome
being measured numerically.
A decision tree is a branched flowchart showing multiple pathways for potential decisions
and outcomes. The tree starts with what is called a decision node, which signifies that a
decision must be made.
From the decision node, a branch is created for each of the alternative choices under
consideration. The initial decision might lead to another decision, in which case a new
decision node is created and new branches are added to show each alternative pathway
for the new decision. The result is a series of decision pathways. The flowchart might
include only one or two decisions with only one or two alternatives, or it can become a
complex sequence of many decisions with many alternatives at each node.
Along the decision pathway, there is usually some point at which a decision leads to an
uncertain outcome. That is, a decision could result in multiple possible outcomes, so an
uncertainty node is added to the tree at that point. Branches come from that uncertainty
node showing the different possible outcomes.
Eventually, each pathway reaches a final outcome. The decision tree, then, is a
combination of decision nodes, uncertainty nodes, branches coming from each of these
nodes, and final outcomes as the result of the pathways.
Even in only this simple form, a decision tree is useful to show the possibilities for a
decision. However, a decision tree becomes especially useful when numerical data is
added.
First, each decision usually involves costs. If a company decides to produce a product,
engage in market research, advertise, or any other number of activities, the predicted
costs for those decisions are written on the appropriate branch of the decision tree. Also,
each pathway eventually leads to an outcome that usually results in income. The predicted
amount of income provided by each outcome is added to that branch of the decision tree.
The other numerical data that needs to be provided is the probability of each outcome from
the uncertainty nodes. If an uncertainty node has two branches that are both equally likely,
each should be labelled with a 50 percent, or 0.5, probability. Alternatively, an uncertainty
node might have three branches with respective probabilities of 60 percent, 30 percent,
and 10 percent. In each case, the total of the percentages at each uncertainty node will be
100 percent, representing all possibilities for that node.
With this numerical data, decision makers can calculate the likely return value for each
decision pathway. The value of each final outcome must be multiplied by the probability
that the outcome occurs. The total of the possibilities along each branch represents the
total predicted value for that decision pathway. The costs involved in that decision pathway
must be subtracted to see the final profit that pathway represents.
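The expected-value and net-gain arithmetic described above can be sketched as a small helper. The probabilities, income figures and cost below are hypothetical examples, not data from the text:

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs from one uncertainty node."""
    total_p = sum(p for p, _ in outcomes)
    # All possibilities at a node must be covered
    assert abs(total_p - 1.0) < 1e-9, "probabilities at a node must sum to 1"
    return sum(p * payoff for p, payoff in outcomes)

def net_gain(outcomes, cost):
    """Expected value of a decision pathway minus its associated cost."""
    return expected_value(outcomes) - cost

# Example: 60% chance of 100k income, 30% of 30k, 10% of nothing,
# with 40k of predicted costs on the decision branch.
branch = [(0.6, 100_000), (0.3, 30_000), (0.1, 0)]
ev = expected_value(branch)
print(ev)                      # roughly 69,000
print(net_gain(branch, 40_000))  # roughly 29,000
```

Comparing the net gain of each pathway this way identifies the most profitable decision.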
Long Answer
Q.1. What is Artificial Intelligence all about? Why is AI important for business?
Ans. Artificial intelligence (AI) makes it possible for machines to learn from experience,
adjust to new inputs and perform human-like tasks. Most AI examples that you hear about
today – from chess-playing computers to self-driving cars – rely heavily on deep learning
and natural language processing. Using these technologies, computers can be trained to
accomplish specific tasks by processing large amounts of data and recognizing patterns in
the data.
The term artificial intelligence was coined in 1956, but AI has become more popular today
thanks to increased data volumes, advanced algorithms, and improvements in computing
power and storage.
Early AI research in the 1950s explored topics like problem solving and symbolic methods.
In the 1960s, the US Department of Defense took interest in this type of work and began
training computers to mimic basic human reasoning. For example, the Defense Advanced
Research Projects Agency (DARPA) completed street mapping projects in the 1970s. And
DARPA produced intelligent personal assistants in 2003, long before Siri, Alexa or Cortana
were household names.
This early work paved the way for the automation and formal reasoning that we see in
computers today, including decision support systems and smart search systems that can
be designed to complement and augment human abilities.
While Hollywood movies and science fiction novels depict AI as human-like robots that
take over the world, the current evolution of AI technologies isn’t that scary – or quite that
smart. Instead, AI has evolved to provide many specific benefits in every industry. Keep
reading for modern examples of artificial intelligence in health care, retail and more.
Importance of AI
AI adds intelligence to existing products. Many products you already use will be
improved with AI capabilities, much like Siri was added as a feature to a new
generation of Apple products. Automation, conversational platforms, bots and smart
machines can be combined with large amounts of data to improve many
technologies. Upgrades at home and in the workplace range from security
intelligence and smart cams to investment analysis.
AI analyzes more and deeper data using neural networks that have many hidden
layers. Building a fraud detection system with five hidden layers used to be
impossible. All that has changed with incredible computer power and big data. You
need lots of data to train deep learning models because they learn directly from the
data.
AI gets the most out of data. When algorithms are self-learning, the data itself is
an asset. The answers are in the data. You just have to apply AI to find them. Since
the role of the data is now more important than ever, it can create a competitive
advantage. If you have the best data in a competitive industry, even if everyone is
applying similar techniques, the best data will win.
Ans. Many companies are currently investing in data and artificial intelligence (AI). Since
the terminology varies, the activities may be called AI, advanced analytics, data science, or
machine learning, but the goals are the same: to increase revenues and efficiency in
current business and to develop new data-enabled offerings. In addition, many companies
see an increasing responsibility to contribute their AI expertise toward humanitarian and
social matters. It is well understood that to stay competitive in the digital economy, the
company’s internal processes and products need to be smart—and smartness comes from
data and AI.
The reality is that there are no shortcuts. Amazon, Google, Apple, and Facebook all used
very different business strategies to gain their current market dominance and global
influence, but their common success is arguably due to their foresight in understanding the
value of data and positioning themselves early. They worked from the inside out, placing
continuous emphasis on human capability building, alongside developing, testing, and
deploying the top technologies internally, so that they could offer the best to their
customers. For established, non-digital companies the road is even rockier. Old
companies have established ways of working, digitally immature people, and legacy
infrastructure.
Setting the Data and AI Vision
The premise for successful data and AI strategy is to know your business goals. What are
your must-win battles? Where do you need to succeed in the future? Access to data will
help in the definition of business priorities, but it is important to remember that data and AI
will not solve your issues in business models, products, and services. Proper uses of data
and AI will help you make more informed decisions, obtain information faster, automate
processes, and enable delivery faster than a human mind—but they will not construct or
replace the lack of business vision and ideas.
Once you have a solid understanding of the data and AI use cases that help your current
business, new data-driven business opportunities should be investigated. These include
data as a business (e.g., selling data) and data partnerships (where new offerings are
created by pooling data from several organizations). Neither topic is easy, but the
opportunities are worth looking into.
The availability of high-quality data is the foundation for successful, productized AI. Data
can be called an asset if it is structured according to the FAIR principles (Findable–
Accessible–Interoperable–Reusable) as suggested by the European Commission (2018).
Data that resides in various systems, in different formats and ontologies, or misses key
attributes (such as unique identifiers), is not an asset. If the data asset is not reusable,
every data science/AI activity will be a separate, possibly large IT exercise. The principle of
‘build once—use many’ is pivotal for maximizing the value of data assets. For example, for
the personalization of an online service, you might want to use behavioral data from the
online and mobile channels, Customer Relationship Management (CRM) data, and
consumer online and offline transactions—not only data from the online service itself. The
goal of a productized data asset is to support all use cases.
Solution Architecture and Technology
Solution architecture and technology refer to the technical side of the data asset
management. Apart from digital native companies, existing companies typically have
plenty of legacy infrastructure. After defining the business & AI vision and conducting data
due diligence, the next step is to have an experienced data and solution architect take a
critical look into the current technical architecture and define the target architecture and its
development roadmap. This task, too, should follow the end-to-end use case logic
accounting for data collection from operating systems (e.g., CRM, Enterprise Resource
Planning (ERP)), data warehouses, cloud environments, analytical environments, and
business-interfacing systems.
Data protection and privacy is of key interest to consumers and those with access to
consumer data. Data protection relates to data collection, processing, and utilization.
According to the General Data Protection Regulation (GDPR) of the European Union
(European Parliament and the Council, 2016), the legitimate interest of data processing
must be defined, and the user informed about the collection, processing, and combination
of their data. The user must be offered mechanisms to opt out and object to data
processing. The level of user identification across data flows between different
data-processing systems must be defined.
Human Skills
The data and AI journey requires new roles in an organization. While the exact role
terminology varies, data and AI roles are needed for four different levels of business
processes:
The optimal data and AI organization structure depends on the overall company size and
organization, culture, the level of AI maturity, and the type of data/AI tasks.
To get things going, establishing a center of excellence (CoE) generally helps to bring
focus to the topic. Depending on where the CoE sits in a company, it will be responsible
for different areas. The CoE may consist of data science and business intelligence teams
only, while the technical teams (data engineering, platforms) reside in IT. Alternatively, the
CoE may cover the technology side, while the data scientists sit in business units. The
optimal setup needs to be carefully analyzed. In our experience, most companies will
benefit from a common technical infrastructure and data asset management, as well as
some form of centralized data science team, which solves the most difficult use cases and
creates a scalable AI portfolio for the use of all business units and functions. The AI
strategists should optimally sit within business units to drive the AI use cases forward, but
in the beginning, they can also reside with the data science teams and help business from
there.
Operating Model
A closely related topic to data and AI organization is the operating model between different
business units. Prioritized business use cases should drive the development of specific
data and AI capabilities as identified within the initial strategic assessment. In order to
have the data experts work on the most important use cases, business leaders should
establish an AI steering group or include the data and AI development into the existing
leadership team meetings.
The head of the CoE (chief data and AI officer) should drive the agenda in the meetings. In
addition to a cross-unit steering group, individual use-case areas should have their own,
operational steering groups.
For the first years of the CoE, we have seen that it often makes sense to centralize
budgets. Budgets drive prioritization, and without a centralized budget, data and AI
activities will not scale up. Typically, individual business units and functions do not want to
carry the costs for companywide capability building (e.g., common data models,
infrastructure, application programming interfaces) even if it would be optimal for the whole
company.
Like the data asset, algorithms can also be treated as an asset. That means
that over time, the portfolio of machine-learning/AI algorithms will become FAIR. Every
new analytical modeling exercise does not need to start from scratch, but builds on top of
tested code. This will make the data science team more efficient over time. Like software
coding teams, it requires the data science team to use common code repositories and
standards.
It is also important to establish maintenance processes for the data and algorithm assets.
If maintenance processes remain undeployed, development teams remain in a state of
stagnation as their efforts go into keeping the current assets in production. By applying
maintenance processes to data and algorithm portfolios, new solutions can be discovered
and developed.
This article will take the reader through the basics of this algorithm and explain how it
works. It also explains how it has been applied in various fields and highlights some of its
limitations.
The following are some of the basic terminologies that can help us to understand genetic
algorithms:
Population: This is a subset of all the probable solutions that can solve the given
problem.
A genetic algorithm (GA) is a heuristic search algorithm used to solve search and
optimization problems. This algorithm is a subset of evolutionary algorithms, which are
used in computation. Genetic algorithms employ the concept of genetics and natural
selection to provide solutions to problems.
GAs are also based on the behavior of chromosomes and their genetic structure. Every
chromosome plays the role of providing a possible solution. The fitness function helps in
evaluating the characteristics of all individuals within the population. The higher the
fitness value, the better the solution.
Genetic algorithms follow the following phases to solve complex optimization problems:
Initialization
The genetic algorithm starts by generating an initial population. This initial population
consists of all the probable solutions to the given problem. The most popular technique for
initialization is the use of random binary strings.
Fitness assignment
The fitness function helps in establishing the fitness of all individuals in the population. It
assigns a fitness score to every individual, which further determines the probability of
being chosen for reproduction. The higher the fitness score, the higher the chances of
being chosen for reproduction.
Selection
In this phase, individuals are selected for the reproduction of offspring. The selected
individuals are then arranged in pairs of two to enhance reproduction. These individuals
pass on their genes to the next generation.
The main objective of this phase is to establish the region with high chances of generating
the best solution to the problem (better than the previous generation). The genetic
algorithm uses the fitness proportionate selection technique to ensure that useful solutions
are used for recombination.
Reproduction
This phase involves the creation of a child population. The algorithm employs variation
operators that are applied to the parent population. The two main operators in this phase
include crossover and mutation.
Replacement
Generational replacement takes place in this phase, which is a replacement of the old
population with the new child population. The new population has higher fitness
scores than the old population, which is an indication that an improved solution has been
generated.
Termination
After replacement has been done, a stopping criterion is used to provide the basis for
termination. The algorithm will terminate after the threshold fitness solution has been
attained. It will identify this solution as the best solution in the population.
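The phases above can be sketched on a toy "one-max" problem, where fitness simply counts the 1-bits in a binary string. The population size, mutation rate, generation limit and seed are illustrative choices, not prescribed values:

```python
import random

random.seed(42)
GENES, POP_SIZE, MAX_GENERATIONS = 20, 30, 300

def fitness(chromosome):
    # Fitness assignment: one-max counts the 1-bits
    return sum(chromosome)

def select(population):
    # Fitness-proportionate selection of two parents
    weights = [fitness(c) + 1 for c in population]  # +1 avoids all-zero weights
    return random.choices(population, weights=weights, k=2)

def crossover(a, b):
    # Single-point crossover
    point = random.randrange(1, GENES)
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.01):
    # Bit-flip mutation
    return [g ^ 1 if random.random() < rate else g for g in chromosome]

# Initialization: a population of random binary strings
population = [[random.randint(0, 1) for _ in range(GENES)]
              for _ in range(POP_SIZE)]

for generation in range(MAX_GENERATIONS):
    if max(fitness(c) for c in population) == GENES:
        break  # termination: threshold fitness attained
    # Reproduction, followed by generational replacement
    population = [mutate(crossover(*select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best))
```

Real problems replace the bit-count fitness with a domain-specific objective, but the loop structure stays the same.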
Q.4. Explain the relationship between various branches of AI. How does machine
learning work?
Ans. The terms data science and machine learning are often used in conjunction;
however, if you are planning to build a career in one of these, it is important to know the
differences between machine learning and data science.
Before doing so, we need to understand a few important terms that are related but
different.
Machine learning – Think of ML as a subset of AI. In the same way as humans learn with
experience, machines can learn with data (experience) rather than just following simple
instructions. This is called machine learning. Machine learning uses three types of
algorithms – supervised, unsupervised and reinforcement.
Deep learning – Deep learning is a part of machine learning which is based on artificial
neural networks (think of neural networks as similar to our own human brain). Unlike
classical machine learning, deep learning stacks multiple layers of algorithms so that an
artificial neural network is created that learns and makes decisions on its own!
Big Data – Humongous sets of data that can be computationally analysed to understand
and process trends, patterns and human behaviour.
Data Science – How is all the big data analysed? Fine, the machine learns on its own
through machine learning algorithms – but how? Who gives the necessary inputs to a
machine for creating algorithms and models? No points for guessing that it is data science.
Data Science uses different methods, algorithms, processes, and systems to extract,
analyse and gain insights from data.
If we were to see the relationship between all the above in a simple diagram, this is how it
would look –
As we see above, Data science and machine learning are closely related and provide
useful insights and generate the necessary trends or ‘experience’. In both, we use
supervised methods of learning i.e. learning from huge data sets.
There are different types of machine learning algorithms, the most common being
clustering, matrix factorization, content-based, recommendations, collaborative filtering
and so on. Machine learning involves the 5 basic steps –
The huge set of data that we receive in the first step is split into a training set and a testing
set, and the model is built and tested using the training set. A significant portion of the data
is used for training purposes so that different conditions of input and output can be covered
and the model built is closest to the required result (recommendation, human behaviour,
trends, etc.).
Once built, the model is tested for efficiency and accuracy using the test data so that it can
be cross-validated.
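The split-then-train-then-test cycle described above can be sketched without any libraries. The dataset, the trivial threshold "model", and the 80/20 split ratio are all illustrative assumptions:

```python
import random

random.seed(0)

# Hypothetical labeled data: feature x, label 1 when x > 50
data = [(x, 1 if x > 50 else 0) for x in range(100)]
random.shuffle(data)

# A significant portion of the data goes to training, the rest to testing
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Training": fit a threshold halfway between the two classes seen in train
positives = [x for x, y in train if y == 1]
negatives = [x for x, y in train if y == 0]
threshold = (min(positives) + max(negatives)) / 2

def predict(x):
    return 1 if x > threshold else 0

# "Testing": check the model on data it has never seen
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)
```

Keeping the test set unseen during training is what makes the accuracy figure an honest estimate rather than a memorization score.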
As we can see, Machine Learning comes into picture only during the data modelling phase
of the Data Science lifecycle. Data Science thus contains machine learning.
Data Science | Machine Learning
Data science can work with manual methods as well, though they are not as efficient as machine learning algorithms. | Machine learning cannot exist without data science, as data has to be first prepared to create, train and test the model.
Data science helps define new problems that can be solved using machine learning techniques and statistical analysis. | The problem is already known, and tools and techniques are used to find an intelligent solution.
Well, you cannot choose one. Both Data Science and Machine learning go hand in hand.
Machines cannot learn without data and Data Science is better done with machine
learning as we have discussed above. In the future, data scientists will need at least a
basic understanding of machine learning to model and interpret big data that is generated
every single day.
Q.6. What are support vector machines used for? List Various applications of SVM.
Ans. Support vector machines (SVMs) are a set of supervised learning methods used
for classification, regression and outlier detection.
If the number of features is much greater than the number of samples, avoiding over-
fitting through the choice of kernel functions and regularization term is crucial.
SVMs do not directly provide probability estimates; these are calculated using an
expensive five-fold cross-validation.
Face detection – SVMs classify parts of the image as face and non-face and create a
square boundary around the face.
Text and hypertext categorization – SVMs allow text and hypertext categorization for
both inductive and transductive models. They use training data to classify documents
into different categories, categorizing on the basis of the score generated and then
comparing it with the threshold value.
Classification of images – Use of SVMs provides better search accuracy for image
classification. It provides better accuracy in comparison to the traditional query-based
searching techniques.
Protein fold and remote homology detection – Apply SVM algorithms for protein
remote homology detection.
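A minimal classification sketch, assuming scikit-learn is available. The toy 2-D points stand in for real feature vectors (e.g. document scores in text categorization) and are invented for illustration:

```python
# Assumes scikit-learn is installed; SVC is its support vector classifier.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [3, 3], [3, 4], [4, 3], [4, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # two linearly separable categories

clf = SVC(kernel="linear")  # kernel choice is where over-fitting is controlled
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))
```

For real tasks such as image or text classification, the rows of X would be pixel or term features rather than hand-written pairs.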
Correct option is D
A. Geoffrey Chaucer
B. Geoffrey Hill
Correct option is C
Correct option is C
4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
Correct option is D
5. Which of the following is not a factor that affects the performance of the learner
system?
C. Training scenario
D. Type of feedback
Correct option is A
Correct option is D
7. Successful applications of ML
Correct option is E
A. Analogy
B. Introduction
C. Memorization
D. Deduction
Correct option is B
A. Empirical
B. Logical
C. Phonological
D. Syntactic
Correct option is A
Correct option is E
A. Decimal
B. Hexadecimal
C. Boolean
A. Naive Bayesian
B. PCA
C. Linear Regression
Correct option is B
Artificial Intelligence
Deep Learning
Data Statistics
A. Only (i)
C. All
D. None
Correct option is B
14. What kind of learning algorithm is used for “facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
Correct option is B
A. Unsupervised Learning
B. Supervised Learning
C. Semi-unsupervised Learning
D. Reinforcement Learning
Correct option is C
16. Real-Time Decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following?
B. Reinforcement Learning
Correct option is B
D. Reinforcement Learning
Correct option is B
D. Reinforcement Learning
Correct option is B
19. Which of the following is not a symbolic function in the various function
representations of Machine Learning?
D. Decision Trees
Correct option is B
20. Which of the following is not a numerical function in the various function
representations of Machine Learning?
A. Neural Network
C. Case-based
D. Linear Regression
Correct option is C
21. FIND-S Algorithm starts from the most specific hypothesis and generalize it
by considering only
A. Negative
B. Positive
C. Negative or Positive
Correct option is B
A. Negative
B. Positive
C. Both
Correct option is A
A. Solution Space
B. Version Space
C. Elimination Space
Correct option is B
24. Inductive learning is based on the knowledge that if something happens a lot
it is likely to be generally
A. True
B. False
Correct option is A
25. Inductive learning takes examples and generalizes rather than starting
with
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B
26. A drawback of the FIND-S algorithm is that it assumes consistency within the
training set
A. True
B. False
Correct option is A
Pruning
A. All
D. None
Correct option is B
28. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification
Correct option is B
29. To find the minimum or the maximum of a function, we set the gradient to
zero because of which of the following?
D. None of these
Correct option is B
C. Factor analysis
Correct option is A
Correct option is A
A. All
B. Only (ii)
Correct option is C
Unit 2
Ans. Supervised learning is typically done in the context of classification, when we want to
map input to output labels, or regression, when we want to map input to a continuous
output.
Neural Networks.
Ans. As the name implies, multivariate regression is a technique that estimates a single
regression model with more than one outcome variable. When there is more than one
predictor variable in a multivariate regression model, the model is a multivariate multiple
regression.
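A sketch of a multivariate multiple regression (several predictors, several outcome variables) via ordinary least squares. The data is synthetic, the coefficient values are arbitrary, and the intercept is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(100, 2))              # two predictor variables
B_true = np.array([[2.0, -1.0],
                   [0.5,  3.0]])           # (predictors x outcomes)
Y = X @ B_true + 0.01 * rng.normal(size=(100, 2))  # two outcome variables

# One least-squares fit estimates coefficients for both outcomes at once
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(B_hat, 2))
```

Because Y has two columns, the fitted coefficient matrix has one column per outcome variable, which is exactly what makes the regression "multivariate".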
Ans. Decision trees help you to evaluate your options. Decision Trees are excellent
tools for helping you to choose between several courses of action. They provide a highly
effective structure within which you can lay out options and investigate the possible
outcomes of choosing those options.
Ans. As the goal of a decision tree is to make the optimal choice at the end of each
node, it needs an algorithm that is capable of doing just that. That algorithm is known as
Hunt's algorithm, which is both greedy and recursive: greedy meaning that at each step it
makes the most optimal decision, and recursive meaning it splits the larger question into
smaller questions and resolves them the same way.
Ans. The decision to split at each node is made according to a metric called purity. A
node is 100% impure when its data is split evenly 50/50 between classes and 100% pure
when all of its data belongs to a single class.
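The text does not name a specific purity metric, so as one common choice, here is a sketch of Gini impurity, which is 0 for a perfectly pure node and maximal on an even 50/50 split:

```python
def gini_impurity(labels):
    """0.0 when all labels belong to one class; 0.5 on an even two-class split."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # 1 minus the sum of squared class proportions
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["a"] * 10))             # 0.0 -> 100% pure
print(gini_impurity(["a"] * 5 + ["b"] * 5))  # 0.5 -> evenly split, most impure
```

Hunt's algorithm would evaluate candidate splits and pick the one that reduces this impurity the most.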
Q9. What is the difference between supervised and unsupervised machine learning?
Ans. Supervised learning requires labelled training data. For example, in order to perform
classification (a supervised learning task), you first need to label the data you will use to
train the model to classify data into your labelled groups. Unsupervised learning, in
contrast, does not require labelling data explicitly.
Q1. What are the different types of Learning/ Training models in ML?
k-NN is a supervised algorithm used for classification. What this means is that we have
some labeled data upfront which we provide to the model for it to understand the dynamics
within that data i.e. train. It then uses those learnings to make inferences on the unseen
data i.e. test. In the case of classification this labeled data is discrete in nature.
Steps
2. Split the original labeled dataset into training and test data.
5. Run k-NN a few times, changing k and checking the evaluation measure.
6. In each iteration, k neighbors vote, majority vote wins and becomes the
ultimate prediction
8. Once you’ve chosen k, use the same training set and now create a new
test set with the people’s ages and incomes that you have no labels for,
and want to predict.
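The steps above can be sketched in minimal pure-Python form. The toy labeled data below is assumed for illustration; a real workflow would also hold out a test set and compare evaluation measures across several values of k:

```python
import math
from collections import Counter

# Minimal k-NN classifier sketch: store labeled points, then classify a
# query by majority vote among the k nearest neighbors (Euclidean distance).
def knn_predict(train, query, k):
    # train: list of ((features...), label) pairs
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy labeled set: two well-separated groups.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2), k=3))   # "A" - the query sits in the first group
```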
Ans. Regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need future predictions, such as weather conditions, sales figures, marketing trends, etc. For such cases we need a technique that can make predictions accurately: Regression analysis, a statistical method used in machine learning and data science. Below are some other reasons for using Regression analysis:
o Regression estimates the relationship between the target and the independent variables.
o By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the others.
Types of Regression
There are various types of regressions which are used in data science and machine
learning. Each type has its own importance on different scenarios, but at the core, all the
regression methods analyze the effect of the independent variable on dependent
variables.
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Ridge Regression
o Lasso Regression
When we provide the input values (data) to the function, it gives the S-curve as follows:
o Binary (0/1, pass/fail)
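The S-curve comes from the logistic (sigmoid) function, which squashes any real-valued input into the open interval (0, 1) so the output can be read as the probability of the positive class (e.g. pass vs. fail):

```python
import math

# The logistic (sigmoid) function that produces the S-curve described above.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))     # 0.5 - the midpoint of the S-curve
print(sigmoid(10))    # close to 1 - confident positive class
print(sigmoid(-10))   # close to 0 - confident negative class
```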
Q1. What is supervised learning? Explain is types, also explain some of the
important use cases of Supervised learning.
Regression Model
Regression identifies the patterns in the sample data and predicts continuous outcomes.
This algorithm understands the numbers, values, correlations, and groupings. This model
is best for the prediction of products and stocks.
In linear regression, the algorithm assumes that there is a linear relationship between two variables, input (X) and output (Y). The input variable is an independent variable, whereas the output variable is a dependent variable. The algorithm uses this relationship to calculate and map the input to a continuous output value.
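A minimal least-squares fit makes this concrete. The toy data below is assumed for illustration, generated from the relationship y = 2x + 1:

```python
# Simple linear regression sketch: fit y = a + b*x by least squares,
# then the fitted line maps any input to a continuous output value.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # exactly y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)                    # recovers intercept 1.0 and slope 2.0
```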
In logistic regression, the algorithms predict the discrete values for the set of
independent variables that it has on the list. The algorithm predicts the probability of the
new data so that the output ranges between 0 and 1.
Classification Model
In the classification technique, the input data is labeled based on historical data. These algorithms are specially trained to identify particular types of objects, which makes tasks such as weather forecasting and picture identification straightforward once the labeled sample data has been processed and analyzed.
Some of the popular classification models are – Decision Trees, Naive Bayes
Classifiers, and Random Forests.
In Naive Bayes Classifiers, the algorithms assume that all the features are independent of each other. They work on large datasets and use a Directed Acyclic Graph (DAG) for classification purposes. Naive Bayes is suitable for solving multi-class prediction models. It is quick and easy, which saves a lot of time and handles complex data well.
In Random Forests, the algorithm creates decision trees on data samples, gets a prediction from each, and then selects the best solution by combining them. It is an advanced version of decision trees because it reduces the overfitting drawback of decision trees by averaging the results.
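A toy sketch of the ensemble idea described above. The "trees" here are simplified to one-feature threshold stumps (an assumption for brevity, not full decision trees); each is trained on a bootstrap sample and the ensemble takes a majority vote:

```python
import random
from collections import Counter

random.seed(0)

def train_stump(sample):
    # Hypothetical weak learner: threshold midway between the class means
    # of the single feature.
    xs_a = [x for x, y in sample if y == "A"]
    xs_b = [x for x, y in sample if y == "B"]
    return (sum(xs_a) / len(xs_a) + sum(xs_b) / len(xs_b)) / 2

def forest_predict(data, query, n_trees=25):
    votes = []
    for _ in range(n_trees):
        sample = [random.choice(data) for _ in data]   # bootstrap sample
        if len({y for _, y in sample}) < 2:
            continue                                   # skip degenerate samples
        t = train_stump(sample)
        votes.append("A" if query < t else "B")
    return Counter(votes).most_common(1)[0][0]         # majority vote

data = [(1, "A"), (2, "A"), (3, "A"), (7, "B"), (8, "B"), (9, "B")]
print(forest_predict(data, 2.5))   # "A"
```

Averaging over many bootstrap-trained learners is what damps the variance (overfitting) of any single tree.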
In Neural Networks, the algorithms are designed to cluster raw input and recognize patterns. Neural networks require advanced computational resources and become hard to interpret as the number of parameters and observations grows, which is why data scientists call them ‘black-box’ algorithms.
Supervised learning has many applications across industries and is one of the best approaches for finding accurate results. Here is a list of well-known applications of supervised learning.
Object Recognition – one of the popular applications is reCAPTCHA (proving you are not a robot), where you have to choose multiple images as per the instructions to confirm that you are human. You can only gain access if you identify the images correctly; otherwise you have to keep trying until you do.
Ans. Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning. The goal of the SVM
algorithm is to create the best line or decision boundary that can segregate n-dimensional
space into classes so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram in which there are two different categories that are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data: if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data: if a dataset cannot be classified by a straight line, a kernel function maps it into a higher-dimensional space where a separating hyperplane can be found.
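A minimal sketch of how a learned hyperplane classifies points. The weights w and bias b below are assumed fixed for illustration; choosing them to maximize the margin from the support vectors is what SVM training itself does:

```python
# A 2-D hyperplane w.x + b = 0 splits the plane into two half-spaces;
# a point's class is the sign of the decision function.
def decision(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return "cat" if decision(w, b, x) >= 0 else "dog"

w, b = (1.0, 1.0), -10.0        # assumed hyperplane: x1 + x2 = 10
print(classify(w, b, (2, 3)))   # "dog" - since 2 + 3 - 10 < 0
print(classify(w, b, (8, 7)))   # "cat" - since 8 + 7 - 10 > 0
```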
Ans. Supervised learning uses a training set to teach models to yield the desired output.
This training dataset includes inputs and correct outputs, which allow the model to learn
over time. The algorithm measures its accuracy through the loss function, adjusting until
the error has been sufficiently minimized.
Supervised learning can be separated into two types of problems when data mining—
classification and regression:
Although supervised learning can offer businesses advantages, such as deep data
insights and improved automation, there are some challenges when building sustainable
supervised learning models. The following are some of these challenges:
Supervised learning models can be used to build and advance a number of business
applications, including the following:
Case Study 1
Solution:
Interestingly, the groups used by Tech Emergence provide only a vague understanding of
how use cases are distributed among different machine learning tasks. For example, Big
Data can be applied to any of the mentioned groups, given that the algorithms process
large and poorly structured datasets, regardless of the industry and operations field this
data comes from. Also, sales tasks usually intersect marketing ones when it comes to
analytics. That’s why we suggest a slightly different breakdown of the most common use
cases.
Digital marketing and online-driven sales are the first application fields that you may think
of for machine learning adoption. People interact with the web and leave a detailed
footprint to be analyzed. While there are tangible results in unsupervised learning
techniques for marketing and sales, the largest value impact is in the supervised learning
field. Let’s have a look.
Lifetime Value. A customer lifetime value that we mentioned before is usually measured
in the net profit this customer brings to a company in the longer run. If you’ve been
tracking most of your customers and accurately documenting their in-funnel and further
purchase behavior, you have enough data to make predictions about most budding
customers early and target sales effort toward them.
Churn. The churn rate defines the number of customers who cease to complete target
actions (e.g. add to cart, leave a comment, checkout, etc.) during a given period. Similar to
lifetime value predictions, sorting “likely-to-churn-soon” from engaged customers will allow
you to 1) analyze the reasons for such behavior, 2) refocus and personalize offerings for
different groups of churning customers.
Sentiment analysis. Skimming through thousands of feedback posts in social media and
comments sections is painstaking work, especially in B2C after a new product or feature
rollout. Sentiment analysis backed by natural language processing allows for aggregating
and yielding analytics on customer feedback. You may play with sentiment analysis
using Google Cloud Natural Language API to understand how this works and what kinds
of analytics may be available.
Here’s how the API analyzes an angry comment by a person who purchased HTC Vive, a
virtual reality headset, on Amazon. Score defines sentiment itself, ranging from very
negative to very positive. Magnitude shows the strength of a sentiment regardless of its
score.
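The score/magnitude distinction can be illustrated with a toy aggregation over hypothetical per-sentence scores. This is a sketch of the idea only, not the Google Cloud API:

```python
# Given per-sentence sentiment scores in [-1, 1], compute a document
# "score" (average: direction of feeling) and "magnitude" (sum of
# absolute values: overall emotional strength, regardless of direction).
def document_sentiment(sentence_scores):
    if not sentence_scores:
        return 0.0, 0.0
    score = sum(sentence_scores) / len(sentence_scores)
    magnitude = sum(abs(s) for s in sentence_scores)
    return score, magnitude

# A mixed review: strong positives and negatives cancel in score,
# but all of them contribute to magnitude.
score, magnitude = document_sentiment([0.8, -0.8, 0.9, -0.9])
print(score, magnitude)   # score near 0, magnitude large
```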
People analytics
Tracking internal operations to get insights is also a powerful task for machine learning.
Most digitalized companies today have enough employee tracking software and historic
data to make predictions on employee performance, retention, and other fundamental
problems of human resource management.
Sales performance. Is there a way to understand why one middle-level sales executive brings in twice as many lead conversions as another middle-level exec sitting in the same
office? Technically, they both send emails, set calls, and participate in conferences, which
somehow result in conversions or lack thereof. Any time we talk about what drives
salespeople performance, we make assumptions prone to bias. A good example of ML
use here is People.ai, a startup which tries to address the problem by tracking all the sales
data, including emails, calls, and CRM interactions to use this data as a supervised
learning set and predict which kinds of actions bring better results. Basically, the algorithm
aids in developing a playbook for sales reps based on successful cases.
Retention. Similar tracking techniques, the use of text sentiment and other metadata
analysis (from emails and social media posts) can be applied to detect possible job-
hopping behavior among candidates.
Human resource allocation. You can use historic data from HR software – sick days,
vacations, holidays, etc. – to make broader predictions on your
workforce. Deloitte disclosed that a number of automotive companies are learning from the
patterns of unscheduled absences to forecast the periods when people are likely to take a
day off and reserve more workforce.
Currently, time-series data can be applied both for internal use to have better planning
capabilities and for customer-facing applications as well. For instance, eCommerce
websites may be interested in tracking time-series data related to Black Friday to better set
discount campaigns and drive more sales. As for the example of customer-facing use,
AltexSoft helped Fareboom.com, an airfare provider, build a price-prediction feature that
allows Fareboom customers to choose the best time to purchase their tickets.
Source: Fareboom.com
Security
Spam filtering. According to Statista, 56.87 percent of all emails were spam in March
2017. This number actually keeps dropping – in April 2014 the share of spam was 71.1
percent – as increasingly more email services have adopted spam-filtering algorithms
backed by ML models. The abundance of spam examples provides enough textual and metadata features to sort out this type of correspondence.
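A minimal sketch of such a spam filter, using a Naive Bayes classifier over word counts on a tiny hypothetical corpus (real filters use far larger corpora and richer features):

```python
import math
from collections import Counter

# Naive Bayes: P(class | words) is proportional to
# P(class) * product of P(word | class), with Laplace (+1) smoothing.
spam = ["win money now", "free money offer", "win free prize"]
ham = ["meeting at noon", "project status report", "lunch at noon"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

def score(msg, docs, prior):
    counts = word_counts(docs)
    total = sum(counts.values())
    vocab = len(set(word_counts(spam)) | set(word_counts(ham)))
    s = math.log(prior)
    for w in msg.split():
        s += math.log((counts[w] + 1) / (total + vocab))  # Laplace smoothing
    return s

def classify(msg):
    p_spam = len(spam) / (len(spam) + len(ham))
    return "spam" if score(msg, spam, p_spam) > score(msg, ham, 1 - p_spam) else "ham"

print(classify("free money"))      # "spam"
print(classify("status meeting"))  # "ham"
```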
Malicious emails and links. Detecting phishing attacks becomes critical for all IT
departments in organizations, considering the recent case of the Petya virus, which was
distributed among corporate infrastructures through email attachments. Currently, there
are many public datasets that provide labeled records of malware or even URLs that can
be used directly to build classifying models to protect your organization.
Fraud detection. As fraudulent actions are very domain-specific, they mostly rely on
private datasets that organizations have. For example, many banks that have fraud cases
in their data use supervised fraud detection techniques to block potentially fraudulent
money transactions accounting for such variables as transaction time, location, money
amounts, etc.
Entertainment
Last but not least in the group of supervised machine learning use cases is the entertainment field, where users directly interact with algorithms. These can run the gamut from face recognition and various visual alterations to turning camera pictures into artwork-style images.
This path usually belongs to AI startups that plan acquisition and ship software that can be
embedded in other large market products. That’s exactly what happened to MSQRD, a
video filter app, that was acquired by Facebook. MSQRD was developed in three months.
MCQ’s
1. What is classification?
a) when the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) when the output variable is a real value, such as “dollars” or “weight”.
Ans: Solution A
2. What is regression?
a) When the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) When the output variable is a real value, such as “dollars” or “weight”.
Ans: Solution B
11. Supervised learning and unsupervised clustering both require at least one
a) hidden attribute.
b) output attribute.
c) input attribute.
d) categorical attribute.
Ans : Solution C
12. Supervised learning differs from unsupervised clustering in that supervised learning
requires
a) at least one input attribute.
b) input attributes to be categorical.
c) at least one output attribute.
d) output attributes to be categorical.
Ans : Solution C
13. A regression model in which more than one independent variable is used to predict the
dependent variable is called
a) a simple linear regression model
b) a multiple regression models
c) an independent model
d) none of the above
Ans : Solution B
14. A term used to describe the case when the independent variables in a multiple
regression model
are correlated is
a) Regression
b) correlation
c) multicollinearity
d) none of the above
Ans : Solution C
15. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit
(holding x2 constant), y will
a) increase by 3 units
b) decrease by 3 units
c) increase by 4 units
d) decrease by 4 units
Ans : Solution A
17. A measure of goodness of fit for the estimated regression equation is the
a) multiple coefficient of determination
b) mean square due to error
c) mean square due to regression
d) none of the above
Ans : Solution A
20. For a multiple regression model, SST = 200 and SSE = 50. The multiple coefficient of
determination is
a) 0.25
b) 4.00
c) 0.75
d) none of the above
Ans : Solution C (R2 = 1 - SSE/SST = 1 - 50/200 = 0.75)
26. Which statement is true about neural network and linear regression models?
a) Both models require input attributes to be numeric.
b) Both models require numeric attributes to range between 0 and 1.
c) The output of both models is a categorical attribute value.
d) Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.
Ans : Solution A
28. The average positive difference between computed and desired outcome values.
a) root mean squared error
b) mean squared error
c) mean absolute error
d) mean positive error
Ans : Solution C
29. Selecting data so as to assure that each class is properly represented in both the
training and
test set.
a) cross validation
b) stratification
c) verification
d) bootstrapping
Ans : Solution B
30. The standard error is defined as the square root of this computation.
a) The sample variance divided by the total number of sample instances.
b) The population variance divided by the total number of sample instances.
c) The sample variance divided by the sample mean.
d) The population variance divided by the sample mean.
Ans : Solution A
31. Data used to optimize the parameter settings of a supervised learner model.
a) Training
b) Test
c) Verification
d) Validation
Ans : Solution D
Unit-3
Ans. Some use cases for unsupervised learning — more specifically, clustering — include:
Customer segmentation, or understanding different customer groups around which to build
marketing or other business strategies. Genetics, for example clustering DNA patterns to
analyze evolutionary biology.
Ans. K-means is a clustering algorithm that tries to partition a set of points into K sets
(clusters) such that the points in each cluster tend to be near each other. It is
unsupervised because the points have no external classification.
Short Answers.
Because I watched ‘Shameless’, Netflix recommends several other similar shows to watch.
But where is Netflix gathering those recommendations from? Considering it is trying to
predict the future with what show I am going to watch next, Netflix has nothing to base the
predictions or recommendations on (no clear definitive objective). Instead, Netflix looks at
other users who have also watched ‘Shameless’ in the past, and looks at what those users
watched in addition to ‘Shameless’. By doing so, Netflix is clustering its users together
based on similarity of interests. This is exactly how unsupervised learning works: simply clustering observations together based on similarity, hoping to draw accurate conclusions from the clusters.
Back to DBSCAN. DBSCAN is a clustering method that is used in machine learning to
separate clusters of high density from clusters of low density. Given that DBSCAN is
a density based clustering algorithm, it does a great job of seeking areas in the data that
have a high density of observations, versus areas of the data that are not very dense with
observations. DBSCAN can sort data into clusters of varying shapes as well, another strong
advantage. DBSCAN works as follows: it draws a radius (epsilon) around a data point and counts how many other data points fall within that radius. If enough points fall inside, the point anchors a dense region and a cluster is formed; the cluster is then expanded by going through each individual point within the cluster and counting the number of other data points nearby in the same way.
Ans. Clusters and Productivity. Being part of a cluster allows companies to operate more
productively in sourcing inputs; accessing information, technology, and needed
institutions; coordinating with related companies; and measuring and motivating
improvement.
Ans. In Data Science and Machine Learning, KMeans and DBScan are two of the most popular clustering (unsupervised) algorithms. Both are simple to implement, DBScan somewhat more so. Having used both of them, I honestly found DBScan more powerful and interesting in both aspects, implementation and performance.
No single algorithm is the best for all purposes: there are situations where DBSCAN is very performant, while in other situations its performance is very bad. Density clustering (for example DBSCAN) seems to correspond more closely to human intuitions of clustering than distance from a central cluster point (for example KMeans) does.
Density clustering algorithms use the concept of reachability, i.e. how many neighbors a point has within a radius. DBScan is also more convenient because it does not need the parameter k (the number of clusters we are trying to find), which KMeans requires. When you don’t know the number of clusters hidden in the dataset and there’s no way to visualize it, using DBScan is a good decision. DBSCAN produces a varying number of clusters, based on the input data.
As a side point, If DBScan fails and you need a clustering algorithm that
automatically detects the number of clusters in your dataset you can try MeanShift
algorithm.
Ans. Cluster analysis has been widely used in many applications such as business
intelligence, image pattern recognition, Web search, biology, and security. In business
intelligence, clustering can be used to organize a large number of customers into groups,
where customers within a group share strongly similar characteristics. This facilitates the development of business strategies for enhanced customer relationship management.
Moreover, consider a consultant company with a large number of projects. To improve
project management, clustering can be applied to partition projects into categories based
on similarity so that project auditing and diagnosis (to improve project delivery and
outcomes) can be conducted effectively.
Q1. Define clustering. What are the different types of clustering explain in detail?
Clustering is a task of dividing the data sets into a certain number of clusters in such a
manner that the data points belonging to a cluster have similar characteristics. Clusters
are nothing but the grouping of data points such that the distance between the data points
within the clusters is minimal.
In other words, the clusters are regions where the density of similar data points is high. It is
generally used for the analysis of the data set, to find insightful data among huge data sets
and draw inferences from it. Generally, the clusters are seen in a spherical shape, but it is
not necessary as the clusters can be of any shape.
Clustering itself can be categorized into two types viz. Hard Clustering and Soft Clustering.
In hard clustering, one data point can belong to one cluster only. But in soft clustering, the
output provided is a probability likelihood of a data point belonging to each of the pre-
defined numbers of clusters.
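The hard/soft distinction can be sketched with two assumed cluster centers. The inverse-distance memberships below are a simplification of what fuzzy c-means or mixture models actually compute:

```python
import math

# Hard clustering: a point belongs to exactly one (nearest) center.
# Soft clustering: a point gets a probability-like membership per center.
centers = [(0.0, 0.0), (10.0, 0.0)]

def hard_assign(p):
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def soft_assign(p):
    inv = [1.0 / (math.dist(p, c) + 1e-9) for c in centers]
    total = sum(inv)
    return [v / total for v in inv]            # memberships sum to 1

print(hard_assign((2.0, 0.0)))   # 0 - the nearest center wins outright
print(soft_assign((2.0, 0.0)))   # roughly [0.8, 0.2]
```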
Density-Based Clustering
In this method, the clusters are created based upon the density of the data points which
are represented in the data space. The regions that become dense due to the huge
number of data points residing in that region are considered as clusters.
The data points in the sparse region (the region where the data points are very less) are
considered as noise or outliers. The clusters created in these methods can be of arbitrary
shape. Following are the examples of Density-based clustering algorithms:
DBSCAN groups data points together based on a distance metric and a criterion for a minimum number of data points. It takes two parameters, eps and minimum points: eps indicates how close data points must be to each other to count as neighbors, and the minimum-points criterion must be met for a region to be considered dense.
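The two parameters can be seen at work in a minimal, unoptimized DBSCAN sketch on assumed toy data:

```python
import math

# Minimal DBSCAN: eps is the neighborhood radius, min_pts the density
# criterion. Points with >= min_pts neighbors (self included) are core
# points; clusters grow outward from them; unreachable points are noise (-1).
def dbscan(points, eps, min_pts):
    labels = [None] * len(points)
    neighbors = lambda i: [j for j in range(len(points))
                           if math.dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:
                queue.extend(neighbors(j))  # core point: keep growing
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))   # [0, 0, 0, 1, 1, -1]
```

Two dense groups become clusters 0 and 1; the isolated point at (50, 50) fails the minimum-points criterion and is labeled noise.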
OPTICS is similar in process to DBSCAN, but it addresses one of the drawbacks of the former algorithm, i.e. its inability to form clusters from data of varying density. It considers two more parameters: core distance and reachability distance. Core distance indicates whether the data point being considered is a core point by setting a minimum value for it. Reachability distance is the maximum of the core distance and the value of the distance metric used for calculating the distance between two data points. One thing to note about reachability distance is that its value remains undefined if the reference data point is not a core point.
Hierarchical Clustering
Hierarchical clustering comes in two forms: Agglomerative, which starts with each data point as its own cluster and merges the closest ones, and Divisive, its opposite, which starts with all the points in one cluster and divides them to create more clusters. These algorithms create a distance matrix of all the existing clusters and perform the linkage between the clusters depending on the criteria of the linkage. The clustering of the data points is represented by using a dendrogram. There are different types of linkages: –
o Single Linkage: – In single linkage the distance between the two clusters is the
shortest distance between points in those two clusters.
o Complete Linkage: – In complete linkage, the distance between the two clusters is the
farthest distance between points in those two clusters.
o Average Linkage: – In average linkage the distance between the two clusters is the
average distance of every point in the cluster with every point in another cluster.
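The three linkage criteria above can be computed directly for two small assumed clusters:

```python
import math

# Single: closest pair across clusters; Complete: farthest pair;
# Average: mean over all cross-cluster pairs.
def single_linkage(a, b):
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    return max(math.dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

a = [(0, 0), (0, 2)]
b = [(4, 0), (6, 0)]
print(single_linkage(a, b))    # 4.0 - the closest pair (0,0)-(4,0)
print(complete_linkage(a, b))  # about 6.32 - the farthest pair (0,2)-(6,0)
```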
Fuzzy Clustering
In fuzzy clustering, the assignment of the data points in any of the clusters is not decisive.
Here, one data point can belong to more than one cluster. It provides the outcome as the
probability of the data point belonging to each of the clusters. One of the algorithms used
in fuzzy clustering is Fuzzy c-means clustering.
This algorithm is similar in process to the K-Means clustering and it differs in the
parameters that are involved in the computation like fuzzifier and membership values.
Partitioning Clustering
This method is one of the most popular choices for analysts to create clusters. In
partitioning clustering, the clusters are partitioned based upon the characteristics of the
data points. We need to specify the number of clusters to be created for this clustering
method. These clustering algorithms follow an iterative process to reassign the data points
between clusters based upon the distance. The algorithms that fall into this category are
as follows: –
o K-Means Clustering: – K-Means clustering is one of the most widely used algorithms.
It partitions the data points into k clusters based upon the distance metric used for the
clustering. The value of ‘k’ is to be defined by the user. The distance is calculated between
the data points and the centroids of the clusters.
The data point which is closest to the centroid of the cluster gets assigned to that cluster.
After an iteration, it computes the centroids of those clusters again and the process
continues until a pre-defined number of iterations are completed or when the centroids of
the clusters do not change after an iteration.
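The assign-then-recompute loop just described can be sketched in a few lines (toy data and starting centroids assumed):

```python
import math

# Minimal k-means: assign each point to its nearest centroid, recompute
# centroids as cluster means, repeat until centroids stop changing or an
# iteration cap is reached.
def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        new = [tuple(sum(d) / len(d) for d in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:        # converged: no centroid moved
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(pts, centroids=[(0, 0), (10, 10)])
print(centroids)   # each centroid settles at the mean of its dense group
```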
o PAM (Partitioning Around Medoids): – This algorithm is also called the k-medoid algorithm. It is similar in process to the K-means clustering algorithm, with the difference being in the assignment of the center of the cluster. In PAM, the medoid of the cluster has to be an input data point, while this is not true for K-means clustering, as the average of all the data points in a cluster may not be an input data point.
Grid-Based Clustering
In grid-based clustering, the data set is represented as a grid structure which comprises grids (also called cells). The overall approach in the algorithms of this method differs from the rest of the algorithms.
They are more concerned with the value space surrounding the data points than with the data points themselves. One of the greatest advantages of these algorithms is their reduced computational complexity, which makes them appropriate for dealing with very large data sets.
After partitioning the data sets into cells, it computes the density of the cells which helps in
identifying the clusters. A few algorithms based on grid-based clustering are as follows: –
o STING (Statistical Information Grid Approach): – In STING, the data set is divided
recursively in a hierarchical manner. Each cell is further sub-divided into a different
number of cells. It captures the statistical measures of the cells which helps in answering
the queries in a small amount of time.
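The cell-density idea can be sketched in miniature (cell size and threshold are assumed parameters; real grid algorithms such as STING add hierarchy and per-cell statistics):

```python
from collections import Counter

# Map each point to a fixed-size grid cell, then treat cells whose point
# count meets a density threshold as (parts of) clusters. The algorithm
# reasons about cells, not individual points.
def dense_cells(points, cell_size, threshold):
    counts = Counter((int(x // cell_size), int(y // cell_size))
                     for x, y in points)
    return {cell for cell, n in counts.items() if n >= threshold}

pts = [(0.1, 0.2), (0.4, 0.9), (0.7, 0.3),   # three points in cell (0, 0)
       (5.1, 5.2),                            # lone point in cell (5, 5)
       (9.1, 9.2), (9.3, 9.4)]                # two points in cell (9, 9)
print(sorted(dense_cells(pts, cell_size=1.0, threshold=2)))   # [(0, 0), (9, 9)]
```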
To understand how this works, let’s continue with the fruit example given above. With
unsupervised learning, you’ll provide the model with the input dataset (the pictures of
the fruits and their characteristics), but you will not provide the output (the names of
the fruits).
The model will use a suitable algorithm to train itself to divide the fruits into different
groups according to the most similar features between them. This kind of
unsupervised learning, called clustering, is the most common.
For example, if you were trying to segment potential consumers into groups for
marketing purposes, an unsupervised clustering method would be a great starting
point.
Because of this, factory workers documented errors in plain text (either in English or
their local language). The company wished to know the causes of common
manufacturing problems, but without a categorization of the errors it was impossible
to perform statistical analysis on the data.
MCQ’s
Unit-3
1. _____ terms are required for building a bayes model.
(A) 1
(B) 2
(C) 3
(D) 4
Answer
Correct option is C
2. Which of the following is the consequence between a node and its predecessors
while creating bayesian network?
(A) Conditionally independent
(B) Functionally dependent
(C) Both Conditionally dependant & Dependant
(D) Dependent
Answer
Correct option is A
3. What is needed to make probabilistic systems feasible in the world?
(A) Feasibility
(B) Reliability
(C) Crucial robustness
(D) None of the above
Answer
Correct option is C
4. Bayes rule can be used for:-
(A) Solving queries
(B) Increasing complexity
(C) Answering probabilistic query
(D) Decreasing complexity
Answer
Correct option is C
5. _____ provides way and means of weighing up the desirability of goals and the
likelihood of achieving them.
(A) Utility theory
(B) Decision theory
(C) Bayesian networks
(D) Probability theory
Answer
Correct option is A
12. The compactness of the bayesian network can be described by
(A) Fully structured
(B) Locally structured
(C) Partially structured
(D) All of the above
Answer
Correct option is B
13. The Expectation Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
(A) True
(B) False
Answer
Correct option is B
14. Which of the following is correct about the Naive Bayes?
(A) Assumes that all the features in a dataset are independent
(B) Assumes that all the features in a dataset are equally important
(C) Both
(D) All of the above
Answer
Correct option is C
15. Which of the following is false regarding EM Algorithm?
(A) The alignment provides an estimate of the base or amino acid composition of each
column in the site
(B) The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the sequences
(C) The row-by-column composition of the site already available is used to estimate
the probability
(D) None of the above
Answer
Correct option is C
16. Naïve Bayes Algorithm is a ________ learning algorithm.
(A) Supervised
(B) Reinforcement
(C) Unsupervised
(D) None of these
Answer
Correct option is A
17. EM algorithm includes two repeated steps, here the step 2 is ______.
(A) The normalization
(B) The maximization step
(C) The minimization step
(D) None of the above
Answer
Correct option is B
18. Examples of Naïve Bayes Algorithm is/are
(A) Spam filtration
(B) Sentimental analysis
(C) Classifying articles
(D) All of the above
Answer
Correct option is D
19. In the intermediate steps of "EM Algorithm", the number of each base in each
column is determined and then converted to fractions.
(A) True
(B) False
Answer
Correct option is A
20. Naïve Bayes algorithm is based on _______ and used for solving classification
problems.
(A) Bayes Theorem
(B) Candidate elimination algorithm
(C) EM algorithm
(D) None of the above
Answer
Correct option is A
21. Types of Naïve Bayes Model:
(A) Gaussian
(B) Multinomial
(C) Bernoulli
(D) All of the above
Answer
Correct option is D
22. Disadvantages of Naïve Bayes Classifier:
(A) Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features.
(B) It performs well in Multi-class predictions as compared to the other Algorithms.
(C) Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
datasets.
(D) It is the most popular choice for text classification problems.
Answer
Correct option is A
23. The benefit of Naïve Bayes:-
(A) Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
datasets.
(B) It is the most popular choice for text classification problems.
(C) It can be used for Binary as well as Multi-class Classifications.
(D) All of the above
Answer
Correct option is D
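The Bayes-theorem basis and text-classification use mentioned above can be sketched as a tiny word-count classifier. This is a pure-Python illustration with hypothetical toy data (the documents, labels, and smoothing choice are made up for the example), not a production implementation:

```python
from collections import Counter

# hypothetical toy corpus: label -> documents
docs = {"spam": ["win money now", "free money"],
        "ham":  ["meeting at noon", "lunch at noon now"]}

counts = {c: Counter(w for d in ds for w in d.split()) for c, ds in docs.items()}
totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
priors = {c: len(ds) / sum(len(x) for x in docs.values()) for c, ds in docs.items()}
vocab = {w for cnt in counts.values() for w in cnt}

def predict(text):
    # P(class | words) is proportional to P(class) * product of P(word | class),
    # treating words as independent (the "naive" assumption), with Laplace smoothing
    scores = {}
    for c in docs:
        p = priors[c]
        for w in text.split():
            p *= (counts[c][w] + 1) / (totals[c] + len(vocab))
        scores[c] = p
    return max(scores, key=scores.get)

print(predict("free money now"))   # -> spam
```

The independence assumption is what makes the product of per-word probabilities valid, and it is also the limitation listed as the classifier's main disadvantage above.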
24. In which of the following types of sampling the information is carried out under
the opinion of an expert?
(A) Convenience sampling
(B) Judgement sampling
(C) Quota sampling
(D) Purposive sampling
Answer
Correct option is B
25. Full form of MDL.
(A) Minimum Description Length
(B) Maximum Description Length
(C) Minimum Domain Length
(D) None of these
Answer
Correct option is A
Unit-4
Very Short Answers
Q1. One of the major sources of data for many major companies is the device
which all of us have in our hands all the time (Smartphone/ Mobile Phones)
Q2. The world of Artificial Intelligence revolves around (Data)
True/False:
Ans. Deep Learning is the most advanced form of Artificial Intelligence. In Deep
Learning, the machine is trained with huge amounts of data, which helps it train
itself on that data. Such machines are intelligent enough to develop algorithms for
themselves.
OR
Deep learning is an artificial intelligence (AI) function that imitates the workings of the
human brain in processing data and creating patterns for use in decision making.
OR
Deep learning is a subset of machine learning where artificial neural networks,
algorithms inspired by the human brain, learn from large amounts of data.
12. The compactness of the bayesian network can be described by
(A) Fully structured
(B) Locally structured
(C) Partially structured
(D) All of the above
Answer
Correct option is B
13. The Expectation Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
(A) True
(B) False
Answer
Correct option is B
Ans. Virtual Personal Assistants, Recommendation systems like Netflix, Face Apps,
Online Fraud Detection
Q11. Where do we collect data from?
Ans. Data can be collected from various sources like –
Surveys
Sensors
Observations
Web scraping (Internet)
Interviews
Documents and records.
Oral histories
Short Answers
Q1. What is the difference between AI, Machine Learning, and Deep Learning?
Ans. A neural network is a system of programs and data structures that approximates the
functioning of the human brain. A neural network usually involves a large number of
processors operating in parallel, each having its own small sphere of knowledge and
access to data in its local memory.
A neural network is initially “trained” or fed with large amounts of data and rules about
relationships (e.g. “a grandparent is older than a person’s father”). A program can then tell
the network how to behave in response to an external stimulus (e.g. input from a computer
user interacting with the network), or it can initiate activity itself, within the limits of its
access to the external world.
Deep learning uses neural networks to learn useful representations of features directly
from data. For example, you can use a pre-trained neural network to identify and remove
artifacts such as noise from images.
Q3. What is the idea behind the GANs?
Ans. The Generative Adversarial Network (GAN) is a very popular candidate in the field of
machine learning that has showcased its potential to create realistic-looking images and
videos. GANs consist of two networks (D and G), where:
D = the “discriminator” network
G = the “generator” network
The goal is to create data: images, for example, that cannot be distinguished from actual
images. Suppose we want to create an adversarial example of a cat. Network G will
generate images. Network D will classify the images according to whether it is a cat or not.
The cost function of G will be constructed in such a way that it tries to “trick” D into always
classifying its output as a cat.
Q4. What is the difference between Stochastic Gradient Descent (SGD) and Batch
Gradient Descent (BGD)?
Ans. Gradient Descent and Stochastic Gradient Descent are algorithms used in linear
regression to find the set of parameters that minimize a loss function.
Batch Gradient Descent – BGD computes the gradient over the full training set
at each step. It is a slower and more expensive process if we have very large training data;
however, it works well for convex or relatively smooth error manifolds.
Stochastic Gradient Descent – SGD updates the parameters using a single training
example (or a small mini-batch) at a time. Each step is much cheaper, so SGD scales to
large datasets, but the updates are noisy and the loss fluctuates rather than decreasing
smoothly.
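The difference can be seen in a minimal sketch for a toy 1-D linear regression (the dataset, learning rate, and epoch count here are arbitrary illustration values): BGD makes one update per pass over the whole dataset, SGD makes one update per example.

```python
# Toy 1-D linear regression: the true relationship is y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def batch_gd(w=0.0, lr=0.05, epochs=100):
    # one update per epoch, using the gradient averaged over the FULL dataset
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def sgd(w=0.0, lr=0.05, epochs=100):
    # one update per EXAMPLE: cheaper steps, noisier path
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w -= lr * 2 * (w * x - y) * x
    return w

print(batch_gd(), sgd())  # both estimates approach the true slope 2.0
```

On this noiseless toy data both converge to the same answer; on large, noisy datasets SGD's cheaper per-step cost is what makes it the practical choice.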
Ans. Data mining is the process of analyzing large data sets and extracting useful
information from them. It is used by companies to turn raw data into useful
information. It is an interdisciplinary subfield of computer science and statistics
with the overall goal of extracting information from data and transforming it into an
understandable structure for further use.
OR
Data mining is an automatic or semi-automatic technical process that analyses
large amounts of scattered information to make sense of it and turn it into
knowledge. It looks for anomalies, patterns or correlations among millions of
records to predict results, as indicated by the SAS institute, a world leader in
business analytics.
Example:
Price comparison websites – they collect data about a product from different sites,
analyze trends out of it, and show the most appropriate results.
Q7. What do you understand by Data Privacy? Discuss in detail with some
examples.
Ans. Data privacy, sometimes also referred to as information privacy, is an area of
data protection that concerns the proper handling of sensitive data
including, notably, personal data but also other confidential data, such as
certain financial data and intellectual property data, to meet regulatory requirements
as well as protecting the confidentiality and immutability of the data. It focuses on
how to collect, process, share, archive, and delete the data in accordance with the
law.
Privacy, in the broadest sense, is the right of individuals, groups, or
organizations to control who can access, observe, or use something they own, such
as their bodies, property, ideas, data, or information.
Control is established through physical, social, or informational boundaries that help
prevent unwanted access, observation, or use. For example:
A physical boundary, such as a locked front door, helps prevent others from
entering a building without explicit permission in the form of a key to unlock the
door or a person inside opening the door.
A social boundary, such as a members-only club, only allows members to access
and use club resources.
An informational boundary, such as a non-disclosure agreement, restricts what
information can be disclosed to others.
Privacy of information is extremely important in this digital age where everything is
interconnected and can be accessed and used easily. The possibilities of our private
information being extremely vulnerable are very real, which is why we require data
privacy.
Q8. AI and robotics have raised some questions regarding liability. Take for
example the scenario of an ‘autonomous’ or AI-driven robot moving through a
factory. Another robot surprisingly crosses its way and our robot draws aside
to prevent collision. However, by this manoeuvre the robot injures a person.
a) Who can be held liable for damages caused by autonomous systems?
It is actually very difficult to blame anyone in such a scenario. This is where
AI ethics come into the picture. Here, the choices might differ from person to
person, and one must understand that nobody is wrong in this case. Every person
has a different perspective, and hence they make decisions according to their own
moral values.
Still, if someone is to be held liable, it should be the programmer who designed
the algorithm of the autonomous system, as he/she should have considered all the
exceptional conditions that could arise.
Ans. There are many ways to solve the gradient explosion problem. Some of the best
experimental methods are –
Redesign the network model – In deep neural networks, the gradient explosion can be
solved by redesigning the network with fewer layers. Using a smaller batch size is also
good for network training. In recurrent neural networks, updating over fewer previous time
steps during training (truncated backpropagation through time) can alleviate the gradient
explosion problem.
Use the ReLU activation function – In deep multilayer perceptron neural networks, gradient
explosion can occur due to activation functions such as the previously popular Sigmoid
and Tanh functions. Using the ReLU activation function can reduce gradient explosion;
adopting ReLU is one of the most popular practices for hidden layers.
Use long short-term memory networks – In recurrent neural networks, the
gradient explosion may be due to instability in the training of the network. For
example, backpropagation through time essentially converts the recurrent network into a
deep multilayer perceptron network. Using long short-term memory (LSTM) units
and related gated neural structures can reduce the gradient explosion problem. LSTM
units are the current best practice for sequence prediction with recurrent neural
networks.
Use gradient clipping – Gradient explosions can still occur in very deep multilayer
perceptron networks with large batch sizes and in LSTMs with long input sequences. In
that case, you can check and limit the size of the gradient during the training process. This
process is called gradient clipping. There is a simple and effective solution for dealing
with exploding gradients: if the gradients exceed a threshold, clip them.
Specifically, it checks whether the value of the error gradient exceeds the threshold, and if
it exceeds it, the gradient is truncated and the gradient is set as the threshold. Gradient
truncation can alleviate the gradient burst problem to some extent (gradient truncation, i.e.
the gradient is set as a threshold before the gradient descent step).
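Clipping by the gradient's L2 norm can be sketched in a few lines of pure Python (the threshold value here is an arbitrary example; deep learning frameworks expose the same idea through built-in optimizer options such as Keras's clipnorm):

```python
import math

def clip_by_norm(grads, threshold):
    # scale the whole gradient vector down if its L2 norm exceeds the threshold,
    # preserving the gradient's direction while capping its magnitude
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > threshold:
        return [g * threshold / norm for g in grads]
    return grads

print(clip_by_norm([3.0, 4.0], 1.0))   # norm 5.0 -> scaled to [0.6, 0.8]
print(clip_by_norm([0.3, 0.4], 1.0))   # norm 0.5 -> returned unchanged
```

Scaling the whole vector (rather than clipping each component separately) keeps the update direction intact, which is why clipping by norm is usually preferred.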
Use weight regularization – If the gradient explosion still exists, you can try another
method, which is to check the size of the network weights and penalize the loss function
that produces a larger weight value. This process is called weight regularization and
generally uses either the L1 penalty (the absolute value of the weight) or the L2 penalty
(the square of the weight). Using L1 or L2 penalty terms for loop weights can help alleviate
gradient bursts.
Q2. What is a neural network? Explain various parts of Neural Network. How do they
learn?
Neural networks have to be “taught” in order to get started functioning and learning on
their own. They then can learn from the outputs they have put out and the information they
get in, but it has to start somewhere. There are a few processes that can be used to help
neural networks get started learning.
Training. Neural networks that are trained are given random numbers or weights to begin.
They are either supervised or unsupervised for training. Supervised training involves a
mechanism that gives the network a grade or corrections. Unsupervised training makes
the network work to figure out the inputs without outside help. Most neural networks use
supervised training to help it learn more quickly.
Transfer learning. Transfer learning is a technique that involves giving a neural network a
similar problem that can then be reused in full or in part to accelerate the training and
improve the performance on the problem of interest.
Feature extraction. Feature extraction is taking all of the data to be fed to an input,
removing any redundant data, and bundling it into more manageable segments. This cuts
down on the memory and computation power needed to run a problem through a neural
network, by only giving the network the absolutely necessary information.
ANN’s outputs aren't limited entirely by inputs and results given to them initially by an
expert system. This ability comes in handy for robotics and pattern recognition systems.
This network has the potential for high fault tolerance and is capable of debugging or
diagnosing a network on its own. ANN can go through thousands of log files from a
company and sort them out. It is presently a tedious task done by administrators.
With an enormous number of applications implementations every day, now is the most
appropriate time to know about the applications of neural networks, machine learning, and
artificial intelligence. Some of them are discussed below:
Handwriting Recognition
Neural networks are used to convert handwritten characters into digital characters that a
machine can recognize.
Stock-exchange prediction
The stock exchange is difficult to track and difficult to understand. Many factors affect the
stock market. A neural network can examine a lot of factors and predict the prices daily,
which would help stockbrokers.
Right now, it's still in an initial phase. You should know that there are over three terabytes
of data a day just from the US stock exchange. That's a lot of data to dig through, and you
have to sort it out before you start focusing on even one stock.
Travelling salesman problem
This refers to finding an optimal path to travel between cities in a particular area.
Neural networks help solve the problem of providing higher revenue at minimal costs.
Logistical considerations are enormous, and here we have to find optimal travel paths for
sales professionals moving from town to town.
Image compression
The idea behind the data compression neural network is to store, encrypt, and recreate the
actual image again. We can optimize the size of our data using image compression neural
networks. It is the ideal application to save memory and optimize it.
Future of neural networks
With the way AI and machine learning is being adopted by companies today, we could see
more advancements in the applications of neural networks. There will be personalized
choices for users all over the world. All mobile and web applications try to give you an
enhanced customized experience based on your search history.
Hyper-intelligent virtual assistants will make life easier. If you have ever used Google
assistant, Siri, or any of those assistants, you can comprehend how they're slowly
evolving. They may even predict your email response in the future.
Neural networks will be a lot faster in the future, and neural network tools can get
embedded in every design surface. We already have a little mini neural network that plugs
into an inexpensive processing board, or even into your laptop. Focusing on the hardware,
instead of the software, would make devices even faster.
Neural networks will find use in the fields of medicine, agriculture, physics,
scientific discovery, and everything else you can imagine. Neural networks are also used in
shared data systems.
Q4. How Do Convolutional Layers Work in Deep Learning Neural Networks?
Ans. Convolutional layers are the major building blocks used in convolutional neural
networks.
The convolutional neural network, or CNN for short, is a specialized type of neural network
model designed for working with two-dimensional image data, although they can be used
with one-dimensional and three-dimensional data.
Central to the convolutional neural network is the convolutional layer that gives the
network its name. This layer performs an operation called a “convolution“.
In the context of a convolutional neural network, a convolution is a linear operation that
involves the multiplication of a set of weights with the input, much like a traditional neural
network. Given that the technique was designed for two-dimensional input, the
multiplication is performed between an array of input data and a two-dimensional array of
weights, called a filter or a kernel.
The filter is smaller than the input data and the type of multiplication applied between a
filter-sized patch of the input and the filter is a dot product. A dot product is the element-
wise multiplication between the filter-sized patch of the input and filter, which is then
summed, always resulting in a single value. Because it results in a single value, the
operation is often referred to as the “scalar product“.
Using a filter smaller than the input is intentional as it allows the same filter (set of weights)
to be multiplied by the input array multiple times at different points on the input.
Specifically, the filter is applied systematically to each overlapping part or filter-sized patch
of the input data, left to right, top to bottom.
We can better understand the convolution operation by looking at some worked examples
with contrived data and handcrafted filters.
In this section, we’ll look at both a one-dimensional convolutional layer and a two-
dimensional convolutional layer example to both make the convolution operation concrete
and provide a worked example of using the Keras layers.
What is a 1D convolution?
Convolution operates on two signals (in 1D) or two images (in 2D): you can think of one
as the “input” signal (or image), and the other (called the kernel) as a “filter” on the input
image, producing an output image (so convolution takes two images as input and
produces a third as output).
We can define a one-dimensional input that has eight elements all with the value of 0.0,
with a two element bump in the middle with the values 1.0.
[0, 0, 0, 1, 1, 0, 0, 0]
The input to Keras must be three dimensional for a 1D convolutional layer.
The first dimension refers to each input sample; in this case, we only have one sample.
The second dimension refers to the length of each sample; in this case, the length is eight.
The third dimension refers to the number of channels in each sample; in this case, we only
have a single channel.
from numpy import asarray
data = asarray([0, 0, 0, 1, 1, 0, 0, 0])
data = data.reshape(1, 8, 1)
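To make the operation itself concrete without any framework, here is a pure-Python sketch of a 1D convolution (stride 1, no padding); the filter [0, 1, 0] is a hypothetical handcrafted example that simply passes the centre of each patch through:

```python
def conv1d(signal, kernel):
    # slide the kernel over the signal left to right and take the
    # dot product with each filter-sized patch (stride 1, no padding)
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]

data = [0, 0, 0, 1, 1, 0, 0, 0]     # the bump input from the text
print(conv1d(data, [0, 1, 0]))      # -> [0, 0, 1, 1, 0, 0]
```

This is exactly what a Keras Conv1D layer computes, except that the layer learns its kernel values from data instead of having them handcrafted.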
Example of 2D Convolutional Layer
We can expand the bump detection example in the previous section to a vertical line
detector in a two-dimensional image.
Again, we can constrain the input, in this case to a square 8×8 pixel input image with a
single channel (e.g. grayscale) with a single vertical line in the middle.
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
The input to a Conv2D layer must be four-dimensional.
The first dimension defines the samples; in this case, there is only a single sample. The
second dimension defines the number of rows; in this case, eight. The third dimension
defines the number of columns, again eight in this case, and finally the number of
channels, which is one in this case.
Therefore, the input must have the four-dimensional shape [samples, rows, columns,
channels] or [1, 8, 8, 1] in this case.
from numpy import asarray
data = [[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
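The same sliding dot product extends to 2D. This pure-Python sketch uses a handcrafted 3×3 vertical-line detector (a hypothetical filter, analogous to what a Conv2D layer would learn) on the 8×8 input above:

```python
def conv2d(image, kernel):
    # dot product of the kernel with every k x k patch,
    # applied left to right, top to bottom (stride 1, no padding)
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]

image = [[0, 0, 0, 1, 1, 0, 0, 0] for _ in range(8)]   # the 8x8 input above
detector = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]           # vertical-line detector
print(conv2d(image, detector)[0])   # -> [0, 0, 3, 3, 0, 0] in every row
```

The output activates strongly (value 3) exactly where the vertical line sits, which is the sense in which a convolutional filter "detects" a feature wherever it appears in the input.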
MCQs
A. 2
B. 3
C. 4
D. 5
Ans :
A
Explanation: There are two Artificial Neural Network topologies : FeedForward and
Feedback.
A. FeedForward ANN
B. FeedBack ANN
C. Both A and B
D. None of the Above
Ans : B
Explanation: FeedBack ANN loops are allowed. They are used in content
addressable memories.
A. Bayesian Networks
B. Belief Networks
C. Bayes Nets
D. All of the above
Ans : D
Explanation: The full form BN is Bayesian networks and Bayesian networks are also
called Belief Networks or Bayes Nets.
5. What is the name of node which take binary values TRUE (T) and FALSE (F)?
A. Dual Node
B. Binary Node
C. Two-way Node
D. Ordered Node
Ans : B
A. Linear Functions
B. Nonlinear Functions
C. Discrete Functions
D. Exponential Functions
Ans : A
Explanation: Neural networks are complex linear functions with many parameters.
A. node value
B. Weight
C. neurons
D. axons
Ans : A
Explanation: The output at each node is called its activation or node value.
A. unidirectional
B. bidirectional
C. multidirectional
D. All of the above
Ans : A
A. Unsupervised Learning
B. Reinforcement Learning
C. Supreme Learning
D. Supervised Learning
Ans : C
A. Automotive
B. Aerospace
C. Electronics
D. All of the above
Ans : D
16. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with
the constant of proportionality being equal to 2. The inputs are 4, 3, 2 and 1
respectively. What will be the output?
A. 30
B. 40
C. 50
D. 60
Ans : B
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*3 + 3*2 + 4*1) = 40.
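The computation in question 16 can be checked directly with a short sketch of a neuron whose transfer function is linear with proportionality constant 2:

```python
weights = [1, 2, 3, 4]
inputs = [4, 3, 2, 1]
k = 2  # constant of proportionality of the linear transfer function
# weighted sum of the inputs, passed through the linear transfer function
output = k * sum(w * x for w, x in zip(weights, inputs))
print(output)  # -> 40
```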
17. What is back propagation?
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.
18. The network that involves backward links from output to the input and hidden
layers is called
Ans : C
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.
A. 1957
B. 1958
C. 1959
D. 1960
Ans : B
1. Calculate error between the actual value and the predicted value
2. Reiterate until you find the best weights of network
3. Pass an input through the network and get values from output layer
4. Initialize random weight and bias
5. Go to each neurons which contributes to the error and change its
respective values to reduce the error
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 3, 2, 1, 5, 4
D. 4, 3, 1, 5, 2
Solution: (D)
A. True
B. False
Solution: (B)
A. Bagging
B. Boosting
C. Stacking
D. None of these
Solution: (A)
In training a neural network, you notice that the loss does not decrease in the
first few epochs.
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. Any of these
Solution: (D)
Unit 5
(a) [1 point] We can get multiple local optimum solutions if we solve a linear
regression problem by minimizing the sum of squared errors using gradient descent.
True False
Solution:
False
(b) [1 point] When a decision tree is grown to full depth, it is more likely to fit
the noise in the data.
True False
Solution:
True
(c) [1 point] When the hypothesis space is richer, over fitting is more likely.
True False
Solution:
True
(d) [1 point] When the feature space is larger, over fitting is more likely.
True False
Solution:
True
(e) [1 point] We can use gradient descent to learn a Gaussian Mixture Model.
True False
Solution:
True
Ques. What is reinforcement?
Ans.
Reinforcement Learning is a part of machine learning. Here, agents are self-trained
on reward and punishment mechanisms. It is about taking the best possible action
or path to gain maximum reward and minimum punishment through observations in
a specific situation. The reward acts as a signal for positive and negative behaviours.
Ans. Beyond the agent and the environment, one can identify four main subelements
of a reinforcement learning system: a policy, a reward function, a value function,
and, optionally, a model of the environment. A policy defines the learning agent's
way of behaving at a given time.
Short Answers
Ques. Explain Reinforcement Learning. Explain with the help of an example
Ans.
Reinforcement learning – Example: Chess game
Supervised learning – Example: Object recognition
Ques5. List Various Applications of Reinforcement Learning
Ans. 1. Rocket engineering – Explore how reinforcement learning is used in the
field of rocket engine development. You’ll find a lot of valuable information on the use
of machine learning in manufacturing industries. See why reinforcement learning is
favored over other machine learning algorithms when it comes to manufacturing
rocket engines.
2. Traffic Light Control – This site provides multiple research papers and project
examples that highlight the use of core reinforcement learning and deep
reinforcement learning in traffic light control. It has tutorials, datasets, and relevant
example papers that use RL as a backbone so that you can make a new finding of
your own.
3. Marketing and advertising – See how to make an AI system learn from a pre-
existing dataset which may be infeasible or unavailable, and how to make AI learn in
real-time by creating advertising content. This is where they have made use of
reinforcement learning.
4. NLP – This article shows the use of reinforcement learning in combination with
Natural Language Processing to beat a question and answer adventure game. This
example might be an inspiration for learners engaged in Natural Language
Processing and gaming solutions.
Long Answers
Ans. Let’s say that a robot has to cross a maze and reach the end point. There
are mines, and the robot can only move one tile at a time. If the robot steps onto a
mine, the robot is dead. The robot has to reach the end point in the shortest time
possible.
The scoring/reward system is as below:
1. The robot loses 1 point at each step. This is done so that the robot takes the
shortest path and reaches the goal as fast as possible.
2. If the robot steps on a mine, the point loss is 100 and the game ends.
3. If the robot reaches the end goal, the robot gets 100 points.
Now, the obvious question is: How do we train a robot to reach the end goal with
the shortest path without stepping on a mine?
In the Q-Table, the columns are the actions and the rows are the states.
Each Q-table score will be the maximum expected future reward that the robot will
get if it takes that action at that state. This is an iterative process, as we need to
improve the Q-Table at each iteration.
Using the Bellman equation, we get the values of Q for the cells in the table.
When we start, all the values in the Q-table are zeros.
In our robot example, we have four actions (a=4) and five states (s=5). So we will
build a table with four columns and five rows.
So now the concept of exploration and exploitation trade-off comes into play. This
article has more details.
We’ll use something called the epsilon greedy strategy.
In the beginning, the epsilon rates will be higher. The robot will explore the
environment and randomly choose actions. The logic behind this is that the robot
does not know anything about the environment.
As the robot explores the environment, the epsilon rate decreases and the robot
starts to exploit the environment.
During the process of exploration, the robot progressively becomes more confident
in estimating the Q-values.
For the robot example, there are four actions to choose from: up, down, left, and
right. We are starting the training now — our robot knows nothing about the
environment. So the robot chooses a random action, say right.
We can now update the Q-values for being at the start and moving right using the
Bellman equation.
power = +1
mine = -100
end = +100
We will repeat this again and again until the learning is stopped. In this way the Q-
Table will be updated.
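The loop described above can be sketched in a few lines. This is a simplified, hypothetical 5-state corridor rather than the full maze, and the learning rate, discount factor, and epsilon are made-up illustration values (a fixed epsilon is used instead of the decaying schedule described above):

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
N_STATES, GOAL = 5, 4          # states 0..4, goal at the right end
MOVES = [-1, +1]               # action 0 = left, action 1 = right

# Q-Table: one row per state, one column per action, initialised to zeros
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    reward = 100 if nxt == GOAL else -1   # -1 per move, +100 at the goal
    return nxt, reward, nxt == GOAL

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore at random sometimes, exploit the Q-Table otherwise
        a = random.randrange(2) if random.random() < EPSILON else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = [Q[s].index(max(Q[s])) for s in range(N_STATES)]
print(policy)   # the greedy policy should move right (action 1) toward the goal
```

After training, reading the best action out of each row of the Q-Table gives the learned policy, which is exactly how the robot in the maze example would pick its path.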
Ques2. What is Deep Reinforcement Learning and Autoencoder Architecture.
Explain Face recognition Application.
Autoencoder Architecture
You’re used to unlocking your door with a key, but maybe not with your face. As
strange as it sounds, our physical appearances can now verify payments, grant
access and improve existing security systems. Protecting physical and digital
possessions is a universal concern which benefits everyone, unless you’re a
cybercriminal or a kleptomaniac of course. Facial biometrics are gradually being
applied to more industries, disrupting design, manufacturing, construction, law
enforcement and healthcare. How is facial recognition software affecting these
different sectors, and who are the companies and organisations behind its
development?
1. Payments
It doesn’t take a genius to work out why businesses want payments to be easy.
Online shopping and contactless cards are just two examples that demonstrate the
seamlessness of postmodern purchases. With FaceTech, however, customers
wouldn’t even need their cards. In 2016, MasterCard launched a new selfie pay app
called MasterCard Identity Check. Customers open the app to confirm a payment
using their camera, and that’s that. Facial recognition is already used in store and at
ATMs, but the next step is to do the same for online payments. Chinese ecommerce
firm Alibaba and affiliate payment software Alipay are planning to apply the software
to purchases made over the Internet.
Ans. Machines learn differently than people. For instance, you probably didn’t learn
the difference between a positive and a negative movie review by analyzing tens of
thousands of labeled examples of each. There is, however, a specific subfield of
machine learning that bears a striking resemblance to aspects of how we learn.
Reinforcement learning (RL) is a field that’s been around for a few decades. Lately,
it’s been picking up steam thanks to its integration of deep neural networks (deep
reinforcement learning) and the newsworthy successes it’s accumulated as a result.
At its core though, RL is concerned with how to go about making decisions and taking
sequential actions in a specific environment to maximize a reward. Or, to put a more
personal spin on it, what steps should you take to get promoted at your job, or to
improve your fitness, or to save money to buy a house? We tend to figure out an
optimal approach to accomplish goals like these through some degree of trial and
error, evolving our strategies based on feedback from our environment.
At a basic level, RL works in much the same way. Of course, backed by computing
power, it can explore different strategies (or “policies” in the RL literature) much faster
than we can, often with pretty impressive results (especially for simple environments).
On the other hand, lacking the prior knowledge that humans bring to new situations
and environments, RL approaches also tend to need to explore many more policies
than a human would before finding an optimal one.
As reinforcement learning is a broad field, let’s focus on one specific aspect: model-
based reinforcement learning. As we’ll see, model-based RL attempts to overcome
the issue of a lack of prior knowledge by enabling the agent — whether this agent
happens to be a robot in the real world, an avatar in a virtual one, or just a piece of
software that takes actions — to construct a functional representation of its
environment.
While model-based reinforcement learning may not have clear commercial
applications at this stage, its potential impact is enormous. After all, as AI becomes
more complex and adaptive — extending beyond a focus on classification and
representation toward more human-centered capabilities — model-based RL will
almost certainly play an essential role in shaping these frontiers.
“The next big step forward in AI will be systems that actually understand their worlds.
The world is only accessed through the lens of experience, so to understand the
world means to be able to predict and control your experience, your sense data, with
some accuracy and flexibility. In other words, understanding means forming a
predictive model of the world and using it to get what you want. This is model-based
reinforcement learning.”
“Model” is one of those terms that gets thrown around a lot in machine learning (and
in scientific disciplines more generally), often with a relatively vague explanation of
what we mean. Fortunately, in reinforcement learning, a model has a very specific
meaning: it refers to the different dynamic states of an environment and how these
states lead to a reward.
Imagine you’re visiting a city that you’ve never been to before and for whatever
reason you don’t have access to a map. You know the general direction from your
hotel to the area where most of the sights of interest are, but there are quite a number
of different possible routes, some of which lead you through a slightly dangerous
neighborhood.
One navigational option is to keep track of all the routes you’ve taken (and the
different streets and landmarks that make up these routes) to begin to create a map
of the area. This map would be incomplete (it would only rely on where you’d already
walked), but would at least allow you to plan a course ahead of time to avoid that
neighborhood while still optimizing for the most direct route. You could even spend
time back in your hotel room drawing out the different possible itineraries on a sheet
of paper and trying to gauge which one seems like the best overall option. You can
think of this as a model-based approach.
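The map-building idea can be made concrete in a small sketch. The grid, rewards, and parameters below are all invented for illustration: an agent wanders a toy 3x3 "city", records every transition it experiences into a learned model (its map), and then plans over that model with value iteration without touching the real environment again.

```python
import random

# Toy model-based RL: explore, record a "map" of transitions, then plan.
# All states, rewards, and hyperparameters here are made up for demonstration.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
GOAL, BAD = (2, 2), (1, 1)  # destination and "dangerous neighborhood"

def true_step(state, action):
    """The real environment dynamics, unknown to the agent."""
    nxt = (min(2, max(0, state[0] + action[0])),
           min(2, max(0, state[1] + action[1])))
    reward = 10.0 if nxt == GOAL else (-5.0 if nxt == BAD else -1.0)
    return nxt, reward

# 1. Explore randomly and record what happened: this is the "map".
random.seed(0)
model = {}  # (state, action) -> (next_state, reward)
state = (0, 0)
for _ in range(2000):
    action = random.choice(ACTIONS)
    nxt, reward = true_step(state, action)
    model[(state, action)] = (nxt, reward)
    state = (0, 0) if nxt == GOAL else nxt  # restart after reaching the goal

# 2. Plan entirely inside the learned model (value iteration).
gamma = 0.9  # discount factor for future rewards
states = {s for (s, _) in model}
V = {s: 0.0 for s in states}
for _ in range(100):
    for s in states:
        V[s] = max(r + gamma * V.get(nxt, 0.0)
                   for (ss, _), (nxt, r) in model.items() if ss == s)

def best_action(s):
    """Greedy action under the learned model and planned values."""
    best, best_val = None, float("-inf")
    for a in ACTIONS:
        if (s, a) in model:
            nxt, r = model[(s, a)]
            val = r + gamma * V.get(nxt, 0.0)
            if val > best_val:
                best, best_val = a, val
    return best

print("planned first move from (0, 0):", best_action((0, 0)))
```

The planning step corresponds to sitting in the hotel room with the hand-drawn map: no new walking is needed to compare itineraries.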
Another option — especially if you’re the type of person who’s not big on planning —
would simply be to keep track of the different locations you’d visited (intersections,
parks, and squares for instance) and the actions you took (which way you turned), but
ignore the details of the routes themselves. In this case, whenever you found yourself
in a location you’d already visited, you could favor the directional choice that led to a
good outcome (avoiding the dangerous neighborhood and arriving at your destination
more efficiently) over the directions that led to a negative outcome. You wouldn’t
specifically know the next location you’d arrive at with each decision, but you would at
least have learned a simple procedure for what action to take given a specific
location. This is essentially the approach that model-free RL takes.
As it relates to specific RL terms and concepts, we can say that you, the urban
navigator, are the agent; that the different locations at which you need to make a
directional decision are the states; and that the direction you choose to take from
these states are the actions. The rewards (the feedback based on the agent’s
actions) would most likely be positive anytime an action both got you closer to your
destination and avoided the dangerous neighborhood, zero if you avoided the
neighborhood but failed to get closer to your destination, and negative anytime you
failed to avoid the neighborhood. The policy is whatever strategy you use to
determine what action/direction to take based on your current state/location. Finally,
the value is the expected long-term return (the sum of all your current and future
rewards) based on your current state and policy.
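These terms map directly onto a minimal model-free sketch: tabular Q-learning on an invented 3x3 "city grid". The agent starts at the hotel (0, 0); the states are intersections; the actions are the four directions; reaching the sights at (2, 2) pays +10, entering the dangerous neighborhood at (1, 1) costs -5, and every other block costs -1. All of these numbers are made up for illustration.

```python
import random

# Model-free RL: learn Q(state, action), an estimate of long-term value,
# purely from trial and error -- no map of the environment is ever built.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
GOAL, BAD = (2, 2), (1, 1)

def step(state, action):
    """Apply an action, clamp to the grid, and return (next_state, reward)."""
    nxt = (min(2, max(0, state[0] + action[0])),
           min(2, max(0, state[1] + action[1])))
    if nxt == GOAL:
        return nxt, 10.0   # reached the destination
    if nxt == BAD:
        return nxt, -5.0   # wandered into the dangerous neighborhood
    return nxt, -1.0       # each extra block costs a little time

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1):
    q = {}  # the learned action-value table
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(50):  # cap the episode length
            if random.random() < epsilon:
                action = random.choice(ACTIONS)  # explore
            else:                                # exploit the best-known action
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if state == GOAL:
                break
    return q

random.seed(0)
q = train()
print("best first move from (0, 0):",
      max(ACTIONS, key=lambda a: q.get(((0, 0), a), 0.0)))
```

The epsilon-greedy rule inside the loop is the policy, and each Q entry approximates the value of a state-action pair under that policy.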
A) Supervised learning
B) unsupervised learning
C) Reinforcement learning
D) None of these
A) Supervised learning
B) unsupervised learning
C) Reinforcement learning
D) None of these
3. In _________ output depends on the state of the current input and the next input
depends on the output of the previous input.
A) Supervised learning
B) unsupervised learning
C) Reinforcement learning
D) None of these
A) negative
B) positive
C) neutral
D) None of these
ANSWER= B) positive
ANSWER= B) 2
A) Maximizes Performance
B) Sustain Change for a long period of time
C) Too much Reinforcement can lead to overload of states which can
diminish the results
D) None of these
ANSWER= C) Too much Reinforcement can lead to overload of states which can
diminish the results
A) 5
B) 4
C) 2
D) 3
ANSWER= D) 3
A) Supervised learning
B) Unsupervised learning
C) Reinforcement Learning
D) None of the above
A) clustering
B) reinforcement learning
C) semi supervised
D) reinforcement
A) regression
B) reinforcement learning
C) semi supervised
D) classification
A) Unsupervised learning
B) reinforcement learning
C) semi supervised
D) classification
_____ processes all the training examples for each iteration of gradient descent.
ANSWER= B) 3
_____ is a type of gradient descent which processes 1 training example per iteration.
Which gradient descent variant works for larger training sets, and with a smaller
number of iterations?
A) global maximum
B) global minimum
C) local minimum
D) local maximum
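The gradient descent variants referenced in these questions can be contrasted in a small sketch. This fits a single slope w for y = w * x by least squares; the data, learning rates, and epoch counts are invented for demonstration. Batch GD uses every training example for each update, while stochastic GD (SGD) updates on one example at a time; mini-batch GD, updating on small chunks, sits between the two.

```python
import numpy as np

# Synthetic data: y is roughly 3 * x plus a little noise.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + rng.normal(0, 0.05, size=100)

def batch_gd(epochs=100, lr=0.5):
    """Batch GD: every update uses ALL training examples."""
    w = 0.0
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)  # gradient over the full set
        w -= lr * grad
    return w

def stochastic_gd(epochs=100, lr=0.1):
    """Stochastic GD: every update uses exactly ONE training example."""
    w = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            grad = 2 * (w * x[i] - y[i]) * x[i]  # gradient on one sample
            w -= lr * grad
    return w

# Both estimates should land near the true slope of 3.
print("batch GD estimate:     ", round(batch_gd(), 3))
print("stochastic GD estimate:", round(stochastic_gd(), 3))
```

Note the trade-off: batch GD takes stable, expensive steps, while SGD takes many cheap, noisy steps per pass over the data.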
A) Information Gain
B) bagging
C) Entropy
D) none of these
ANSWER= B) bagging
A) Machine Learning
B) bagging
C) Entropy
D) Ensemble learning
A) hard voting
B) soft voting
C) both A and B
D) None of these
In ________ the predicted output class is the class with the highest majority of votes.
A) hard voting
B) soft voting
C) both A and B
D) None of these
A) hard voting
B) soft voting
C) both A and B
D) None of these
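The hard/soft voting distinction from these questions can be shown on one sample. The three "classifiers" and their probabilities below are invented; in practice they would be trained models in an ensemble.

```python
from collections import Counter

# Each hypothetical classifier reports [P(class 0), P(class 1)] for one sample.
probs = [
    [0.45, 0.55],   # classifier 1 leans toward class 1
    [0.40, 0.60],   # classifier 2 leans toward class 1
    [0.90, 0.10],   # classifier 3 is very confident in class 0
]

# Hard voting: each model casts one vote for its own predicted class;
# the class with the highest majority of votes wins.
votes = [max(range(2), key=lambda c: p[c]) for p in probs]
hard_pred = Counter(votes).most_common(1)[0][0]

# Soft voting: average the predicted probabilities across models,
# then pick the class with the highest mean probability.
avg = [sum(p[c] for p in probs) / len(probs) for c in range(2)]
soft_pred = max(range(2), key=lambda c: avg[c])

print("hard voting picks class:", hard_pred)  # 1 (two votes to one)
print("soft voting picks class:", soft_pred)  # 0 (classifier 3's confidence wins)
```

The example also shows why the two schemes can disagree: soft voting lets one very confident model outweigh two weakly confident ones.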
of Macros
Problem
The support staff found it difficult to search or discover these macros for offering
timely customer help, which they believed to be negatively affecting their customer
satisfaction scores on responses to questions about standardized tests. Part of this
searchability problem was the enormous number of macros which took a lot of time
to search through and manage.
Actions Taken
DigitalGenius then automatically suggested the most relevant macros for new
customer inquiries so the support team member does not spend time searching for
macros or manually composing new responses to common customer inquiries.
DigitalGenius claims that its AI Platform achieves this automatic macro suggestion
by using deep learning models to extract the meaning and context of incoming
inquiries and predicting the expected response. In addition, the platform has a
historical response search feature, which the support staff can access.
When asked about this historical response search feature, … of DigitalGenius told
us:
“The historical responses feature looked for historical tickets in which customers
asked similar questions to the one agents were working on. We built the search
ourselves, using our own search algorithms. And beyond that, we have a different
UI to the Zendesk search, including the ability for historical response searches to
take place in the app sidebar, so agents don’t have to navigate away from the
page.
The coolest feature is that we prioritized historical tickets that had the highest
CSAT. When agents searched from historical responses, we displayed to them
whether that ticket got a high or low CSAT rating….so we think our feature
promoted the best answers.”
The platform also reportedly predicts the relevant metadata about the case, such as
tags, inquiry type, priority and other case details. With this information, it is able to
analyze and route cases to the appropriate team. For example, if the incoming query
is an account inquiry, the platform routes the request to the community support team,
and for in-depth educational queries, it routes the requests to remote tutors –
eliminating the need for a “human filter” to handle all tickets.
A screenshot from Zendesk’s “apps support” page for DigitalGenius. The full set of
screenshots and integration details can be found
at: https://www.zendesk.com/apps/support/digitalgenius/
Results
According to DigitalGenius, about 83% of all customer tickets are supported by the
DigitalGenius platform integrated with Magoosh’s Zendesk. The company also
claims 92% accuracy in case tag predictions (tags are used within Zendesk for
case categorization – for example, “refunds” might be a tag for that particular kind of
customer issue). This improvement happened over an initial 6-month period with
Magoosh, which Juan describes as a “learning segment” – stating that new and
updated projects are underway with Magoosh now.
Asked for clarification on what it means to have 83% of messages “supported” by
DigitalGenius, Juan replied:
“Supported in this context means that DigitalGenius AI has assisted Magoosh with
83% of their tickets – whether this is classifying them, suggesting the right macro
or automating a response. The remaining 17% escaped our current AI capabilities,
and had to be dealt with manually in order to provide the best possible answer and
avoid a potential wrong answer to a customer.”
Case Study
The client is a US health center. Since the COVID-19 outbreak, the center has been
bursting past capacity. To combat the virus, they decided to go for biometric face
recognition time attendance software.
The COVID-19 pandemic has amplified deficiencies of the client’s health center. To
combat the virus, they have enhanced precautions and provided the front-line care
team with personal protective equipment (PPE). Cleaning and disinfecting touched
surfaces lowered the chance of the virus spreading but didn’t solve the problem.
To get a consultation on facial recognition time clock software, they contacted InData
Labs, an AI and facial recognition service provider.
Solution: a face recognition time and attendance management system against COVID-
19
The client emphasized that they needed a real-time smart attendance system using
face recognition techniques. The key focus should be on masked face recognition
because the healthcare team at the center is required to wear masks.
We then collected 800+ images of health center employees, and sorted and labelled them
with names. The camera at the center’s entrance captured face data and sent it to
the server for image processing – detection, encoding, and recognition.
Our team researched the latest studies and decided to use AI and ML for face mask
detection and recognition.
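The recognition stage of such a pipeline can be sketched at a high level. The idea is to compare a face "encoding" (an embedding vector produced by some face-encoding model) against a gallery of labelled employee encodings using cosine similarity. The vectors, names, and threshold below are all invented for illustration; real encodings would come from a trained network, and the real InData Labs system is not described at this level of detail.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe, gallery, threshold=0.8):
    """Return the best-matching name, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, enc in gallery.items():
        score = cosine_similarity(probe, enc)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical gallery: one 128-dimensional encoding per labelled employee.
rng = np.random.default_rng(7)
gallery = {name: rng.normal(size=128) for name in ["alice", "bob", "carol"]}

# A probe that is a slightly perturbed copy of Bob's encoding should
# match "bob"; an unrelated random vector should match nobody.
probe = gallery["bob"] + rng.normal(scale=0.1, size=128)
print(identify(probe, gallery))                 # bob
print(identify(rng.normal(size=128), gallery))  # None
```

In a deployed system the detection and encoding steps would run first (locating the masked face in the camera frame and mapping it to a vector); the threshold controls the trade-off between false accepts and false rejects.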