
Unit 1

Very Short Answers

Q.1. What is data?

Ans. Data is a single piece of information. It may consist of numbers, words, measurements or observations of a set of variables. Data is the lowest level of abstraction from which information and knowledge are derived.

Q.2. What do you mean by information?

Ans. Information is a processed data which has been placed in a meaningful and useful
context for an end user. It is placed in a proper context for a human user.

Q.3. What are the features of information?

Ans. The features of information include:

1. Timeliness
2. Accuracy
3. Appropriateness
4. Frequency
5. Conciseness
6. Relevancy
7. Understandability
8. Completeness
9. Economical

Q.4. Define the term-information system. 

Ans. An information system is a set of people, procedures and resources that collects, transforms and disseminates information in an organization. It is a set of organized procedures that, when executed, provide information for decision-making and control of the organization.

Q. What do you mean by artificial intelligence?

Ans. Artificial intelligence is a field of science and technology based on disciplines such as computer science, biology, psychology, linguistics, mathematics and engineering.

Q.5. What are neural networks?

Ans. Neural networks are computing systems modeled after the brain's mesh-like network of interconnected processing elements called neurons. These artificial networks are, of course, much simpler in architecture than the biological brain.

Q.6. What do you mean by fuzzy logical control system?

Ans. A fuzzy logic control system is a way of reaching conclusions based on ambiguous or vague information. It is a mathematical method of handling imprecise or subjective information.

Q.7. What are the main goals of AI?

Ans. The basic objective of AI (also called heuristic programming, machine intelligence, or
the simulation of cognitive behaviour) is to enable computers to perform such intellectual
tasks as decision making, problem solving, perception, understanding human
communication (in any language, and translate among them), and the like. Proof of this
objective is the blind test suggested by Alan Turing in 1950: if an observer who
cannot see the actors (computer and human) cannot tell the difference between them, the
objective is satisfied.

Q.8. What is genetic algorithm?

Ans. A genetic algorithm is a search heuristic that is inspired by Charles Darwin's
theory of natural evolution. A genetic algorithm is a search-based algorithm used for
solving optimization problems in machine learning. This algorithm is important
because it solves difficult problems that would take a long time to solve.

Q.9. What is artificial neural network?

Ans. An artificial neural network is designed by programming computers to behave simply like interconnected brain cells. There are around 100 billion neurons in the human brain.

Q.10. What exactly is machine learning?

Ans. Machine learning is a method of data analysis that automates analytical model
building. It is a branch of artificial intelligence based on the idea that systems can learn
from data, identify patterns and make decisions with minimal human intervention.

Q.11. How does Netflix use machine learning?

Ans. Netflix uses an ML technology called a “recommendation engine” to suggest


shows and movies to you and other users. As the name suggests, a recommendation
system recommends products and services to users based on available data.

Q.12. What is meant by KDD?

Ans. Knowledge discovery in databases (KDD) is the process of discovering useful


knowledge from a collection of data. This widely used data mining technique is a process
that includes data preparation and selection, data cleansing, incorporating prior knowledge
on data sets and interpreting accurate solutions from the observed results.
Short Answers

Q.1. What do you understand by Information?

Ans. Information: Processed data which is used to trigger certain actions or gain better
understanding of what the data implies is called information.

‘Information is data that has been processed into a form which is meaningful to the recipient and is of real or perceived value in current or prospective actions or decisions of the recipient.’ – Davis and Olson

This recognizes both the value of information in taking a particular decision and its role in affecting the decisions or actions which are to be taken in the future. The resources of information are reusable and they do not lose their value even after the information has been retrieved and used. Information can be processed and can even be used to draw generalized conclusions or knowledge. It can be regarded as a type of organized data.

Q.2. What is the difference between data and Information?

Ans. Difference between Data and Information

S.No. | Data | Information

1. | Data refers to detailed facts about any event. | Information refers only to those events which are concerned with the entity.

2. | Data is generally disorganized and disintegrated in form. | Information is properly arranged, classified and organized.

3. | Data is raw in form. | Information is in finished form.

4. | Data cannot be understood by users. | Information can be understood by users.

5. | Data does not depend upon information. | Information is based upon data.

Q.3. What is the difference between knowledge representation and knowledge


acquisition?

Ans. Knowledge Representation in AI describes the representation of knowledge.


Basically, it is a study of how the beliefs, intentions, and judgments of an intelligent agent
can be expressed suitably for automated reasoning. One of the primary purposes of
Knowledge Representation includes modeling intelligent behavior for an agent.
Knowledge Representation and Reasoning (KR, KRR) represents information from the real
world for a computer to understand and then utilize this knowledge to solve complex real-
life problems like communicating with human beings in natural language. Knowledge
representation in AI is not just about storing data in a database, it allows a machine to
learn from that knowledge and behave intelligently like a human being.

The different kinds of knowledge that need to be represented in AI include:

Objects

Events

Performance

Facts

Meta-Knowledge

Knowledge-base

Knowledge acquisition is the process of gathering or collecting knowledge from various sources. It is the process of adding new knowledge to a knowledge base and refining or improving knowledge that was previously acquired. Acquisition is the process of expanding the capabilities of a system or improving its performance at some specified task, so it is the goal-oriented creation and refinement of knowledge. Acquired knowledge may consist of facts, rules, concepts, procedures, heuristics, formulas, relationships, statistics or any other useful information. Sources of this knowledge may be experts in the domain of interest, textbooks, technical papers, database reports, journals and the environment. Knowledge acquisition is a continuous process spread over the entire lifetime of a system. An example of knowledge acquisition is machine learning, which may be a process of autonomous knowledge creation or refinement through the use of computer programs.

Knowledge Acquisition Techniques

Many techniques have been developed to elicit knowledge from an expert. They are termed knowledge acquisition techniques. They are:

a) Diagram Based Techniques

b) Matrix Based Techniques

c) Hierarchy-Generation Techniques
d) Protocol Analysis Techniques

e) Protocol Generation Techniques

f) Sorting Techniques

Q.4. Discuss some advantages and disadvantages of neural networks.

Ans. The following are some of the advantages of neural networks:

 Neural networks are flexible and can be used for both regression and classification
problems. Any data which can be made numeric can be used in the model, as
neural network is a mathematical model with approximation functions.

 Neural networks are good for modelling nonlinear data with a large number of inputs, for example images. They are reliable for tasks involving many features, and work by splitting the problem of classification into a layered network of simpler elements.

 Once trained, the predictions are pretty fast.

 Neural networks can be trained with any number of inputs and layers.

 Neural networks work best with more data points.

Let us take a look at some of the cons of neural networks:

 Neural networks are black boxes, meaning we cannot know how much each
independent variable is influencing the dependent variables.

 It is computationally very expensive and time consuming to train with traditional


CPUs.

 Neural networks depend a lot on training data. This leads to problems of over-fitting and poor generalization: the model relies heavily on the training data and may be over-tuned to it.

Q.5. What exactly is machine learning?

Ans. Machine learning is a method of data analysis that automates analytical model
building. It is a branch of artificial intelligence based on the idea that systems can learn
from data, identify patterns and make decisions with minimal human intervention.
What's required to create good machine learning systems?

 Data preparation capabilities.

 Algorithms – basic and advanced.

 Automation and iterative processes.

 Scalability.

 Ensemble modeling.

Did you know?

 In machine learning, a target is called a label.

 In statistics, a target is called a dependent variable.

 A variable in statistics is called a feature in machine learning.

 A transformation in statistics is called feature creation in machine learning.

Q.6. How are decision trees used in business?

Ans. A decision tree is a mathematical model used to help managers make decisions.

 A decision tree uses estimates and probabilities to calculate likely outcomes.

 A decision tree helps to decide whether the net gain from a decision is worthwhile.

Let's look at an example of how a decision tree is constructed.

A decision tree starts with a decision to be made and the options that can be taken. Don't
forget that there is always an option to decide to do nothing!
The first task is to add possible outcomes to the tree (note: circles represent uncertain
outcomes)

Next we add in the associated costs, outcome probabilities and financial results for each
outcome.

These probabilities are particularly important to the outcome of a decision tree.

Probability is

 The percentage chance or possibility that an event will occur

 Ranges between 0 and 1 (100%)

 If all the outcomes of an event are considered, the total probability must add up to 1
Finally we complete the maths in the model by calculating:

Expected value:

The financial value of an outcome calculated by multiplying the estimated financial effect


by its probability

Net gain:

The value to be gained from taking a decision.

Net gain is calculated by adding together the expected value of each outcome and
deducting the costs associated with the decision.
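
As a concrete illustration of this arithmetic (the figures below are invented for the example, not taken from the text), the expected value and net gain of a single decision with two uncertain outcomes could be computed like this:

```python
# Expected value and net gain for one decision with two uncertain outcomes.
# All figures are hypothetical, chosen only to illustrate the calculation.
outcomes = [
    {"income": 100_000, "probability": 0.6},  # favourable outcome
    {"income": 20_000,  "probability": 0.4},  # unfavourable outcome
]
cost_of_decision = 50_000

# Expected value: estimated financial effect of each outcome times its probability.
expected_value = sum(o["income"] * o["probability"] for o in outcomes)

# Net gain: sum of expected values minus the costs associated with the decision.
net_gain = expected_value - cost_of_decision

print(f"Expected value: {expected_value:,.0f}")  # 68,000
print(f"Net gain:       {net_gain:,.0f}")        # 18,000
```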

BENEFITS OF USING DECISION TREES

 Choices are set out in a logical way

 Potential options & choices are considered at the same time

 Use of probabilities enables the “risk” of the options to be addressed

 Likely costs are considered as well as potential benefits

 Easy to understand & tangible results

DRAWBACKS OF USING DECISION TREES

 Probabilities are just estimates – always prone to error

 Uses quantitative data only – ignores qualitative aspects of decisions

 Assignment of probabilities and expected values prone to bias

 Decision-making technique doesn’t necessarily reduce the amount of risk


Q.7. What is decision tree in management?

Ans. Decision trees are useful tools, particularly for situations where financial data and
probability of outcomes are relatively reliable. They are used to compare the costs and
likely values of decision pathways that a business might take. They often include decision
alternatives that lead to multiple possible outcomes, with the likelihood of each outcome
being measured numerically.

A decision tree is a branched flowchart showing multiple pathways for potential decisions
and outcomes. The tree starts with what is called a decision node, which signifies that a
decision must be made.

An example of a decision tree

From the decision node, a branch is created for each of the alternative choices under
consideration. The initial decision might lead to another decision, in which case a new
decision node is created and new branches are added to show each alternative pathway
for the new decision. The result is a series of decision pathways. The flowchart might
include only one or two decisions with only one or two alternatives, or it can become a
complex sequence of many decisions with many alternatives at each node.

Along the decision pathway, there is usually some point at which a decision leads to an
uncertain outcome. That is, a decision could result in multiple possible outcomes, so an
uncertainty node is added to the tree at that point. Branches come from that uncertainty
node showing the different possible outcomes.

Eventually, each pathway reaches a final outcome. The decision tree, then, is a
combination of decision nodes, uncertainty nodes, branches coming from each of these
nodes, and final outcomes as the result of the pathways.

How to Make Calculations with a Decision Tree

Even in only this simple form, a decision tree is useful to show the possibilities for a
decision. However, a decision tree becomes especially useful when numerical data is
added.

First, each decision usually involves costs. If a company decides to produce a product,
engage in market research, advertise, or any other number of activities, the predicted
costs for those decisions are written on the appropriate branch of the decision tree. Also,
each pathway eventually leads to an outcome that usually results in income. The predicted
amount of income provided by each outcome is added to that branch of the decision tree.

The other numerical data that needs to be provided is the probability of each outcome from
the uncertainty nodes. If an uncertainty node has two branches that are both equally likely,
each should be labelled with a 50 percent, or 0.5, probability. Alternatively, an uncertainty
node might have three branches with respective probabilities of 60 percent, 30 percent,
and 10 percent. In each case, the total of the percentages at each uncertainty node will be
100 percent, representing all possibilities for that node.

With this numerical data, decision makers can calculate the likely return value for each
decision pathway. The value of each final outcome must be multiplied by the probability
that the outcome occurs. The total of the possibilities along each branch represents the
total predicted value for that decision pathway. The costs involved in that decision pathway
must be subtracted to see the final profit that pathway represents.
Long Answer

Q.1. What is Artificial Intelligence all about? Why AI is important for business?

Ans. Artificial intelligence (AI) makes it possible for machines to learn from experience,
adjust to new inputs and perform human-like tasks. Most AI examples that you hear about
today – from chess-playing computers to self-driving cars – rely heavily on deep learning
and natural language processing. Using these technologies, computers can be trained to
accomplish specific tasks by processing large amounts of data and recognizing patterns in
the data.

Artificial Intelligence History

The term artificial intelligence was coined in 1956, but AI has become more popular today
thanks to increased data volumes, advanced algorithms, and improvements in computing
power and storage.

Early AI research in the 1950s explored topics like problem solving and symbolic methods.
In the 1960s, the US Department of Defense took interest in this type of work and began
training computers to mimic basic human reasoning. For example, the Defense Advanced
Research Projects Agency (DARPA) completed street mapping projects in the 1970s. And
DARPA produced intelligent personal assistants in 2003, long before Siri, Alexa or Cortana
were household names.

This early work paved the way for the automation and formal reasoning that we see in
computers today, including decision support systems and smart search systems that can
be designed to complement and augment human abilities.

While Hollywood movies and science fiction novels depict AI as human-like robots that
take over the world, the current evolution of AI technologies isn’t that scary – or quite that
smart. Instead, AI has evolved to provide many specific benefits in every industry. Keep
reading for modern examples of artificial intelligence in health care, retail and more.

1950s–1970s: Neural Networks – early work with neural networks stirs excitement for “thinking machines.”

1980s–2010s: Machine Learning – machine learning becomes popular.

Present Day: Deep Learning – deep learning breakthroughs drive the AI boom.

Importance of AI

 AI automates repetitive learning and discovery through data. Instead of


automating manual tasks, AI performs frequent, high-volume, computerized tasks.
And it does so reliably and without fatigue. Of course, humans are still essential to
set up the system and ask the right questions.

 AI adds intelligence to existing products. Many products you already use will be
improved with AI capabilities, much like Siri was added as a feature to a new
generation of Apple products. Automation, conversational platforms, bots and smart
machines can be combined with large amounts of data to improve many
technologies. Upgrades at home and in the workplace range from security intelligence and smart cams to investment analysis.

 AI adapts through progressive learning algorithms to let the data do the


programming. AI finds structure and regularities in data so that algorithms can
acquire skills. Just as an algorithm can teach itself to play chess, it can teach itself
what product to recommend next online. And the models adapt when given new
data. 

 AI analyzes more and deeper data using neural networks that have many hidden
layers. Building a fraud detection system with five hidden layers used to be
impossible. All that has changed with incredible computer power and big data. You
need lots of data to train deep learning models because they learn directly from the
data. 

 AI achieves incredible accuracy through deep neural networks. For example,


your interactions with Alexa and Google are all based on deep learning. And these
products keep getting more accurate the more you use them. In the medical field, AI
techniques from deep learning and object recognition can now be used to pinpoint
cancer on medical images with improved accuracy.

 AI gets the most out of data. When algorithms are self-learning, the data itself is
an asset. The answers are in the data. You just have to apply AI to find them. Since
the role of the data is now more important than ever, it can create a competitive
advantage. If you have the best data in a competitive industry, even if everyone is
applying similar techniques, the best data will win.

Q.2. How to Define and Execute Your Data and AI Strategy

Ans. Many companies are currently investing in data and artificial intelligence (AI). Since
the terminology varies, the activities may be called AI, advanced analytics, data science, or
machine learning, but the goals are the same: to increase revenues and efficiency in
current business and to develop new data-enabled offerings. In addition, many companies
see an increasing responsibility to contribute their AI expertise toward humanitarian and
social matters. It is well understood that to stay competitive in the digital economy, the
company’s internal processes and products need to be smart—and smartness comes from
data and AI.

As a result of increased data and AI awareness, many established companies have


commenced targeted data and AI programs with big expectations to turn around the
business and attract star talent. However, a couple of years into the programs, many show
signs of fatigue and unmet expectations, with senior managers and leaders unhappy about
the speed of progress. According to a new study, 70% of companies globally are currently
working on getting the first AI deployment operational (Schmetzer, 2020). Pilots have been
made in selected areas and even data-enabled products may have been launched, but the
desired large-scale business transformation has not taken place.

The reality is that there are no shortcuts. Amazon, Google, Apple, and Facebook all used
very different business strategies to gain their current market dominance and global
influence, but their common success is arguably due to their foresight in understanding the
value of data and positioning themselves early. They worked from the inside out, placing
continuous emphasis on human capability building, alongside developing, testing, and
deploying the top technologies internally, so that they could offer the best to their
customers. For established, non-digital companies the road is even rockier. Old
companies have established ways of working, digitally immature people, and legacy
infrastructure.
Setting the Data and AI Vision

Economic benefits of AI expected by various industries and countries are assumed to be


high.

The premise for successful data and AI strategy is to know your business goals. What are
your must-win battles? Where do you need to succeed in the future? Access to data will
help in the definition of business priorities, but it is important to remember that data and AI
will not solve your issues in business models, products, and services. Proper uses of data
and AI will help you make more informed decisions, obtain information faster, automate
processes, and enable delivery faster than a human mind—but they will not construct or
replace the lack of business vision and ideas.

Once you have a solid understanding of the data and AI use cases that help your current
business, new data-driven business opportunities should be investigated. These include
data as a business (e.g., selling data) and data partnerships (where new offerings are
created by pooling data from several organizations). Neither topic is easy, but the
opportunities are worth looking into.

Data Management and Data Governance

The availability of high-quality data is the foundation for successful, productized AI. Data
can be called an asset if it is structured according to the FAIR principles (Findable–
Accessible–Interoperable–Reusable) as suggested by the European Commission (2018).
Data that resides in various systems, in different formats and ontologies, or misses key
attributes (such as unique identifiers), is not an asset. If the data asset is not reusable,
every data science/AI activity will be a separate, possibly large IT exercise. The principle of
‘build once—use many’ is pivotal for maximizing the value of data assets. For example, for
the personalization of an online service, you might want to use behavioral data from the
online and mobile channels, Customer Relationship Management (CRM) data, and
consumer online and offline transactions—not only data from the online service itself. The
goal of a productized data asset is to support all use cases.
Solution Architecture and Technology

Solution architecture and technology refer to the technical side of the data asset
management. Apart from digital native companies, existing companies typically have
plenty of legacy infrastructure. After defining the business & AI vision and conducting data
due diligence, the next step is to have an experienced data and solution architect take a
critical look into the current technical architecture and define the target architecture and its
development roadmap. This task, too, should follow the end-to-end use case logic
accounting for data collection from operating systems (e.g., CRM, Enterprise Resource
Planning (ERP)), data warehouses, cloud environments, analytical environments, and
business-interfacing systems.

Data and AI Protection, Privacy, and Regulation

Data protection and privacy is of key interest to consumers and those with access to
consumer data. Data protection relates to data collection, processing, and utilization.
According to the General Data Protection Regulation (GDPR) of the European Union
(European Parliament and the Council, 2016), the legitimate interest of data processing
must be defined, and the user informed about the collection, processing, and combination
of their data. The user must be offered mechanisms to opt out and object to data
processing. The level of user identification in data flows between different data-processing systems must be defined.

Human Skills

The data and AI journey requires new roles in an organization. While the exact role
terminology varies, data and AI roles are needed for four different levels of business
processes:

1. Business units and business functions (e.g. sales, marketing, finance)


2. Data science (and business intelligence)
3. Data management
4. Data platforms and technical solutions

Data and AI Organization

The optimal data and AI organization structure depends on the overall company size and
organization, culture, the level of AI maturity, and the type of data/AI tasks.

To get things going, establishing a center of excellence (CoE) generally helps to bring
focus to the topic. Depending on where the CoE sits in a company, it will be responsible
for different areas. The CoE may consist of data science and business intelligence teams
only, while the technical teams (data engineering, platforms) reside in IT. Alternatively, the
CoE may cover the technology side, while the data scientists sit in business units. The
optimal setup needs to be carefully analyzed. In our experience, most companies will
benefit from a common technical infrastructure and data asset management, as well as
some form of centralized data science team, which solves the most difficult use cases and
creates a scalable AI portfolio for the use of all business units and functions. The AI
strategists should optimally sit within business units to drive the AI use cases forward, but
in the beginning, they can also reside with the data science teams and help business from
there.

Operating Model

A closely related topic to data and AI organization is the operating model between different
business units. Prioritized business use cases should drive the development of specific
data and AI capabilities as identified within the initial strategic assessment. In order to
have the data experts work on the most important use cases, business leaders should
establish an AI steering group or include the data and AI development into the existing
leadership team meetings.

The head of the CoE (chief data and AI officer) should drive the agenda in the meetings. In
addition to a cross-unit steering group, individual use-case areas should have their own,
operational steering groups.

For the first years of the CoE, we have seen that it often makes sense to centralize
budgets. Budgets drive prioritization, and without a centralized budget, data and AI
activities will not scale up. Typically, individual business units and functions do not want to
carry the costs for companywide capability building (e.g., common data models,
infrastructure, application programming interfaces) even if it would be optimal for the whole
company.

Data Science and Machine Learning/AI Algorithms

Like the data asset, algorithms can also be treated as the algorithm asset. That means
that over time, the portfolio of machine-learning/AI algorithms will become FAIR. Every
new analytical modeling exercise does not need to start from scratch, but builds on top of
tested code. This will make the data science team more efficient over time. Like software
coding teams, it requires the data science team to use common code repositories and
standards.

It is also important to establish maintenance processes for the data and algorithm assets.
If maintenance processes remain undeployed, development teams remain in a state of
stagnation as their efforts go into keeping the current assets in production. By applying
maintenance processes to data and algorithm portfolios, new solutions can be discovered
and developed.

Q3 What is Genetic algorithm? Explain the working of Genetic Algorithm.

Ans: A genetic algorithm is a search-based algorithm used for solving optimization


problems in machine learning. This algorithm is important because it solves difficult
problems that would take a long time to solve. It has been used in various real-life
applications such as data centers, electronic circuit design, code-breaking, image
processing, and artificial creativity.

This answer takes the reader through the basics of the algorithm, explains how it works, describes how it has been applied in various fields, and highlights some of its limitations.

Genetic algorithm (GA) explained

The following are some of the basic terminologies that can help us to understand genetic
algorithms:

 Population: This is a subset of all the probable solutions that can solve the given
problem.

 Chromosomes: A chromosome is one of the solutions in the population.

 Gene: This is an element in a chromosome.

 Allele: This is the value given to a gene in a specific chromosome.

 Fitness function: This is a function that uses a specific input to produce an


improved output. The solution is used as the input while the output is in the form of
solution suitability.
 Genetic operators: In genetic algorithms, the best individuals mate to reproduce
an offspring that is better than the parents. Genetic operators are used for changing
the genetic composition of this next generation.

A genetic algorithm (GA) is a heuristic search algorithm used to solve search and
optimization problems. This algorithm is a subset of evolutionary algorithms, which are
used in computation. Genetic algorithms employ the concept of genetics and natural
selection to provide solutions to problems.

These algorithms have better intelligence than random search algorithms because they


use historical data to take the search to the best performing region within the solution
space.

GAs are also based on the behavior of chromosomes and their genetic structure. Every
chromosome plays the role of providing a possible solution. The fitness function helps in
providing the characteristics of all individuals within the population. The greater the fitness value, the better the solution.

Advantages of genetic algorithm

 It has excellent parallel capabilities.

 It can optimize various problems such as discrete functions, multi-objective


problems, and continuous functions.

 It provides answers that improve over time.

 A genetic algorithm does not need derivative information.

How genetic algorithms work

Genetic algorithms use the evolutionary generational cycle to produce high-quality


solutions. They use various operations that increase or replace the population to provide
an improved fit solution.

Genetic algorithms follow the following phases to solve complex optimization problems:

Initialization

The genetic algorithm starts by generating an initial population. This initial population
consists of all the probable solutions to the given problem. The most popular technique for
initialization is the use of random binary strings.

Fitness assignment

The fitness function helps in establishing the fitness of all individuals in the population. It
assigns a fitness score to every individual, which further determines the probability of
being chosen for reproduction. The higher the fitness score, the higher the chances of
being chosen for reproduction.
Selection

In this phase, individuals are selected for the reproduction of offspring. The selected
individuals are then arranged in pairs of two to enhance reproduction. These individuals
pass on their genes to the next generation.

The main objective of this phase is to establish the region with high chances of generating
the best solution to the problem (better than the previous generation). The genetic
algorithm uses the fitness proportionate selection technique to ensure that useful solutions
are used for recombination.

Reproduction

This phase involves the creation of a child population. The algorithm employs variation
operators that are applied to the parent population. The two main operators in this phase
include crossover and mutation.

1. Crossover: This operator swaps the genetic information of two parents to


reproduce an offspring. It is performed on parent pairs that are selected randomly to
generate a child population of equal size as the parent population.
2. Mutation: This operator adds new genetic information to the new child population. This is achieved by flipping some bits in the chromosome. Mutation solves the problem of getting stuck in a local minimum and enhances diversification.

Replacement

Generational replacement takes place in this phase, which is a replacement of the old
population with the new child population. The new population consists of higher fitness
scores than the old population, which is an indication that an improved solution has been
generated.

Termination

After replacement has been done, a stopping criterion is used to provide the basis for
termination. The algorithm will terminate after the threshold fitness solution has been
attained. It will identify this solution as the best solution in the population.
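
The phases above can be illustrated with a short, self-contained sketch. This is not a production implementation; it solves a toy "OneMax" problem (evolve a binary string with as many 1s as possible), and the population size, mutation rate and selection scheme are arbitrary choices for the example:

```python
# Toy genetic algorithm for the "OneMax" problem, following the phases above:
# initialization, fitness assignment, selection, crossover, mutation,
# replacement and termination. Parameters are illustrative only.
import random

random.seed(0)
CHROMOSOME_LENGTH = 20
POPULATION_SIZE = 30
MUTATION_RATE = 0.02
GENERATIONS = 100

def random_chromosome():
    # Initialization: random binary strings
    return [random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]

def fitness(chromosome):
    # Fitness assignment: number of 1s (higher is better)
    return sum(chromosome)

def select(population):
    # Fitness-proportionate (roulette-wheel) selection of two parents;
    # add-one smoothing keeps every weight positive.
    weights = [fitness(c) + 1 for c in population]
    return random.choices(population, weights=weights, k=2)

def crossover(parent_a, parent_b):
    # Single-point crossover swaps genetic information between the parents
    point = random.randint(1, CHROMOSOME_LENGTH - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome):
    # Mutation flips some bits, adding new genetic information
    return [1 - gene if random.random() < MUTATION_RATE else gene
            for gene in chromosome]

population = [random_chromosome() for _ in range(POPULATION_SIZE)]
for generation in range(GENERATIONS):
    if max(fitness(c) for c in population) == CHROMOSOME_LENGTH:
        break  # Termination: threshold fitness has been attained
    # Reproduction and replacement: a new child population of equal size
    population = [mutate(crossover(*select(population)))
                  for _ in range(POPULATION_SIZE)]

best = max(population, key=fitness)
print("Best solution:", best, "fitness:", fitness(best))
```
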
Q.4. Explain the relationship between various branches of AI. How does machine
learning work?

Ans. The words data science and machine learning are often used in conjunction,
however, if you are planning to build a career in one of these, it is important to know the
differences between machine learning and data science.

Before doing so, we need to understand a few important terms that are related but
different.

AI (Artificial intelligence) – AI or machine intelligence refers to the intelligent decisions


made by machines at par with their human counterparts. It is a study where we enable
machines to learn through experience and make them intelligent enough to perform human-like tasks.

Machine learning – Think of ML as a subset of AI. In the same way that humans learn from experience, machines can learn from data (experience) rather than just following simple instructions. This is called machine learning. Machine learning uses three types of algorithms – supervised, unsupervised and reinforcement.

Deep learning – Deep learning is a part of Machine learning, which is based on artificial
neural networks (think of neural networks similar to our own human brain). Unlike machine
learning, deep learning uses multiple layers and structures algorithms such that an artificial
neural network is created that learns and makes decisions on its own!

Big Data – Humongous sets of data that can be computationally analysed to understand
and process trends, patterns and human behaviour.

Data Science – How is all the big data analysed? Fine, the machine learns on its own
through machine learning algorithms – but how? Who gives the necessary inputs to a
machine for creating algorithms and models? No points for guessing that it is data science.
Data science uses different methods, algorithms, processes, and systems to extract, analyse and get insights from data.

If we were to show the relationship between all the above in a simple diagram, Artificial Intelligence (AI) would be the outermost set: it includes both machine learning and data science, which are correlated. Thus, data science is also a part (the most popular and most important one) of AI.

As we see above, Data science and machine learning are closely related and provide
useful insights and generate the necessary trends or ‘experience’. In both, we use
supervised methods of learning i.e. learning from huge data sets.

Working of Machine Learning

There are different types of machine learning algorithms, the most common being
clustering, matrix factorization, content-based, recommendations, collaborative filtering
and so on. Machine learning involves the 5 basic steps –

The huge set of data that we receive in the first step is split into a training set and a testing set, and the model is built using the training set. A significant portion of data is
used for training purposes so that different conditions of input and output can be achieved
and the model built is closest to the required result (recommendation, human behaviour,
trends, etc…).

Once built, the model is tested for efficiency and accuracy using the test data so that it can
be cross-validated.

As we can see, Machine Learning comes into picture only during the data modelling phase
of the Data Science lifecycle. Data Science thus contains machine learning.
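
A minimal sketch of that workflow (assuming scikit-learn and using its built-in Iris dataset purely for illustration) looks like this: the data is split, a model is built on the training portion, and accuracy is then checked on the held-out test portion and cross-validated:

```python
# Train/test split, model building and evaluation, as described above.
# Assumes scikit-learn; the Iris dataset and random forest are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)

# A significant portion of the data (here 80%) is used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)                       # data modelling phase

print("Test accuracy:", model.score(X_test, y_test))
# Cross-validation gives a more robust estimate of generalization.
print("CV accuracy:  ", cross_val_score(model, X, y, cv=5).mean())
```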

Q.5. Which is better data science or machine learning?


Ans. Here is a side by side comparison for easy reference and a quick recap of all that we
have discussed and derived so far –

Data Science | Machine Learning

It is an interdisciplinary field where unstructured data is cleaned, filtered, analyzed and business innovations are churned out of the result. | It is a part of data science where tools and techniques are used to create algorithms so that the machine can learn from data via experience.

It has a vast scope. | It comes only in the data modeling stage of data science.

Data science can work with manual methods as well, though they are not as efficient as machine learning algorithms. | Machine learning cannot exist without data science, as data has to be first prepared to create, train and test the model.

Data science helps define new problems that can be solved using machine learning techniques and statistical analysis. | The problem is already known and tools and techniques are used to find an intelligent solution.

Knowledge of SQL is necessary to perform operations on data. | Knowledge of SQL is not necessary. Programs are written in languages like R, Python, Java, Lisp etc.

Data science is a complete process. | Machine learning is a single step in data science that uses the other steps of data science to create the best suitable algorithm for predictive analysis.

Data science is not a subset of AI. | Machine learning is a subset of AI and also a connection between AI and data science, since it evolves as more and more data is processed.

How to choose between Data Science and Machine learning?

Well, you cannot choose one. Both Data Science and Machine learning go hand in hand.
Machines cannot learn without data and Data Science is better done with machine
learning as we have discussed above. In the future, data scientists will need at least a
basic understanding of machine learning to model and interpret big data that is generated
every single day.

Q.6. What are support vector machines used for? List Various applications of SVM.
Ans. Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.

The advantages of support vector machines are:

 Effective in high dimensional spaces.


 Still effective in cases where number of dimensions is greater than the number of
samples.
 Uses a subset of training points in the decision function (called support vectors), so it
is also memory efficient.
 Versatile: different Kernel functions can be specified for the decision function.
Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of support vector machines include:

 If the number of features is much greater than the number of samples, avoiding over-fitting when choosing kernel functions and the regularization term is crucial.
 SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.

SVMs are supervised learning algorithms. The aim of using an SVM is to correctly classify unseen data. SVMs have a number of applications in several fields. Some common applications of SVM are:

 Face detection – SVMs classify parts of the image as face and non-face and create a square boundary around the face.

 Text and hypertext categorization – SVMs allow text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, categorizing on the basis of the score generated and comparing it with a threshold value.

 Classification of images – Use of SVMs provides better search accuracy for image
classification. It provides better accuracy in comparison to the traditional query-based
searching techniques.

 Bioinformatics – It includes protein classification and cancer classification. We use


SVM for identifying the classification of genes, patients on the basis of genes and other
biological problems.

 Protein fold and remote homology detection – Apply SVM algorithms for protein
remote homology detection.

 Handwriting recognition – SVMs are widely used to recognize handwritten characters.

 Generalized predictive control(GPC) – Use SVM based GPC to control chaotic


dynamics with useful parameters.
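
As a small illustration of SVM classification (assuming scikit-learn; the dataset, kernel and parameter values are arbitrary choices for the example), the handwritten-digit application mentioned above can be sketched as follows:

```python
# Illustrative SVM classifier for handwritten digits. Assumes scikit-learn;
# kernel choice and parameter values are example settings, not prescriptions.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=10, gamma=0.001)   # kernel function and regularization term
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```
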
MCQ’s

1. What is Machine Learning (ML)?

A. The autonomous acquisition of knowledge through the use of manual


programs

B. The selective acquisition of knowledge through the use of computer


programs

C. The selective acquisition of knowledge through the use of manual programs

D. The autonomous acquisition of knowledge through the use of computer


programs

Correct option is D

2. Father of Machine Learning (ML)

A. Geoffrey Chaucer

B. Geoffrey Hill

C. Geoffrey Everest Hinton

D. None of the above 

Correct option is C

3. Which is FALSE regarding regression?

A. It may be used for interpretation


B. It is used for prediction

C. It discovers causal relationships

D. It relates inputs to outputs

Correct option is C

4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)

A. ML is a set of techniques that turns a dataset into a software

B. AI is a software that can emulate the human mind

C. ML is an alternate way of programming intelligent machines

D. All of the above 

Correct option is D

5. The factors that affect the performance of a learner system do not include:

A. Good data structures

B. Representation scheme used

C. Training scenario

D. Type of feedback

Correct option is A

6. In general, to have a well-defined learning problem, we must identity which of


the following

A. The class of tasks

B. The measure of performance to be improved

C. The source of experience

D. All of the above 

Correct option is D

7. Successful applications of ML

A. Learning to recognize spoken words

B. Learning to drive an autonomous vehicle

C. Learning to classify new astronomical structures


D. Learning to play world-class backgammon

E. All of the above 

Correct option is E

8. Which of the following is not one of the different learning methods?

A. Analogy

B. Introduction

C. Memorization

D. Deduction 

Correct option is B

9. In language understanding, the levels of knowledge do not include:

A. Empirical

B. Logical

C. Phonological

D. Syntactic 

Correct option is A

10. Designing a machine learning approach involves:-

A. Choosing the type of training experience

B. Choosing the target function to be learned

C. Choosing a representation for the target function

D. Choosing a function approximation algorithm

E. All of the above 

Correct option is E

11. Concept learning infers a ______-valued function from training examples of its input and output.

A. Decimal

B. Hexadecimal

C. Boolean

D. All of the above 


Correct option is C

12. Which of the following is not a supervised learning?

A. Naive Bayesian

B. PCA

C. Linear Regression

D. Decision Tree

Correct option is B

13. What is Machine Learning?

 (i) Artificial Intelligence

 (ii) Deep Learning

 (iii) Data Statistics

A. Only (i)

B. (i) and (ii)

C. All

D. None

Correct option is B

14. “Facial identities or facial expressions” is an example of which kind of learning task?

A. Prediction

B. Recognition Patterns

C. Generating Patterns

D. Recognizing Anomalies

Correct option is B

15. Which of the following is not type of learning?

A. Unsupervised Learning

B. Supervised Learning

C. Semi-unsupervised Learning

D. Reinforcement Learning 
Correct option is C

16. Real-time decisions, game AI, learning tasks, skill acquisition, and robot navigation are applications of which of the following?

A. Supervised Learning: Classification

B. Reinforcement Learning

C. Unsupervised Learning: Clustering

D. Unsupervised Learning: Regression 

Correct option is B

17. Targeted marketing, recommender systems, and customer segmentation are applications of which of the following?

A. Supervised Learning: Classification

B. Unsupervised Learning: Clustering

C. Unsupervised Learning: Regression

D. Reinforcement Learning 

Correct option is B

18. Fraud detection, image classification, diagnostics, and customer retention are applications of which of the following?

A. Unsupervised Learning: Regression

B. Supervised Learning: Classification

C. Unsupervised Learning: Clustering

D. Reinforcement Learning 

Correct option is B

19. Which of the following is not a symbolic function representation in machine learning?

A. Rules in propositional logic

B. Hidden-Markov Models (HMM)

C. Rules in first-order predicate logic

D. Decision Trees 

Correct option is B
20. Which of the following is not a numerical function representation in machine learning?

A. Neural Network

B. Support Vector Machines

C. Case-based

D. Linear Regression 

Correct option is C

21. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering only ______ examples.

A. Negative

B. Positive

C. Negative or Positive

D. None of the above 

Correct option is B

22. FIND-S algorithm ignores

A. Negative

B. Positive

C. Both

D. None of the above 

Correct option is A

23. The Candidate-Elimination algorithm represents the ______.

A. Solution Space

B. Version Space

C. Elimination Space

D. All of the above

Correct option is B

24. Inductive learning is based on the knowledge that if something happens a lot, it is likely to be generally true.

A. True
B. False

Correct option is A

25. Inductive learning takes examples and generalizes rather than starting with ______ knowledge.

A. Inductive

B. Existing

C. Deductive

D. None of these 

Correct option is B

26. A drawback of FIND-S is that it assumes consistency within the training set.

A. True

B. False 

Correct option is A

27. What strategies can help reduce overfitting in decision trees?

 (i) Enforce a maximum depth for the tree

 (ii) Enforce a minimum number of samples in leaf nodes

 (iii) Pruning

 (iv) Make sure each leaf node is one pure class

A. All

B. (i), (ii) and (iii)

C. (i), (iii), (iv)

D. None 

Correct option is B

28. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?

A. Decision Tree

B. Random Forest

C. Regression
D. Classification 

Correct option is B

29. To find the minimum or the maximum of a function, we set the gradient to zero because:

A. Depends on the type of problem

B. The value of the gradient at extrema of a function is always zero

C. Both (A) and (B)

D. None of these 

Correct option is B

30. Which of the following is a disadvantage of decision trees?

A. Decision trees are prone to be overfit

B. Decision trees are robust to outliers

C. Factor analysis

D. None of the above

Correct option is A

31. What is perceptron?

A. A single layer feed-forward neural network with pre-processing

B. A neural network that contains feedback

C. A double layer auto-associative neural network

D. An auto-associative neural network

Correct option is A

32. Which of the following is true for neural networks?

 (i) The training time depends on the size of the network.

 (ii) Neural networks can be simulated on a conventional computer.

 (iii) Artificial neurons are identical in operation to biological neurons.

A. All

B. Only (ii)

C. (i) and (ii)


D. None 

Correct option is C
Unit 2

Q.1 What is supervised learning in simple words?

Ans. Supervised learning, also known as supervised machine learning, is a subcategory


of machine learning and artificial intelligence. It is defined by its use of labeled
datasets to train algorithms to classify data or predict outcomes accurately.

Q2. Where is supervised learning used?

Ans. Supervised learning is typically done in the context of classification, when we want to
map input to output labels, or regression, when we want to map input to a continuous
output.

Q3. What are the types of supervised learning?

Ans. Different Types of Supervised Learning

 Regression. In regression, a single output value is produced using training data.

 Classification. It involves grouping the data into classes.

 Naive Bayesian Model.

 Random Forest Model.

 Neural Networks.

 Support Vector Machines.

Q4. What is a multivariate regression model?

Ans. As the name implies, multivariate regression is a technique that estimates a single
regression model with more than one outcome variable. When there is more than one
predictor variable in a multivariate regression model, the model is a multivariate multiple
regression.

Q5. What is model evaluation metrics?


Ans. Model evaluation metrics are used for evaluating the performance of a machine learning model, which is an integral component of any data science project. They aim to estimate the generalization accuracy of a model on future (unseen/out-of-sample) data.

Q6. What is decision tree and example?

Ans. Decision trees help you to evaluate your options. Decision Trees are excellent
tools for helping you to choose between several courses of action. They provide a highly
effective structure within which you can lay out options and investigate the possible
outcomes of choosing those options.

Q7. What is the final objective of decision tree?

Ans. The goal of a decision tree is to make the optimal choice at the end of each node, so it needs an algorithm capable of doing just that. That algorithm is known as Hunt's algorithm, which is both greedy and recursive: greedy meaning that at each step it makes the locally optimal decision, and recursive meaning that it splits the larger question into smaller questions and resolves them in the same way.

Q8. What is purity?

Ans. The decision to split at each node is made according to a metric called purity. A node is 100% impure when its data is split evenly 50/50 between classes, and 100% pure when all of its data belongs to a single class.
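
One common way to measure purity is the Gini impurity; the short sketch below (not part of the original text) shows it returning 0.5 for a 50/50 split and 0.0 for a pure node:

```python
# Gini impurity as an example purity metric (illustrative sketch).
from collections import Counter

def gini_impurity(labels):
    counts = Counter(labels)
    n = len(labels)
    # 0.0 means the node is 100% pure; 0.5 is maximally impure for two classes.
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["A"] * 10))             # 0.0 -> all data in one class
print(gini_impurity(["A"] * 5 + ["B"] * 5))  # 0.5 -> evenly split 50/50
```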

Q9. What is the difference between supervised and unsupervised machine learning?

Ans. Supervised learning requires labelled training data. For example, in order to do classification (a supervised learning task), you first need to label the data you will use to train the model to classify data into your labelled groups. Unsupervised learning, in contrast, does not require labelling the data explicitly.

Q10. List some disadvantages of supervised learning.

Ans. Disadvantages of Supervised Learning. 

 Computation time is vast for supervised learning.

 Unwanted data reduces efficiency.

 Pre-processing of data is no less than a big challenge.

 Always in need of updates.

 Anyone can overfit supervised algorithms easily.


Short Answers

Q1. What are the different types of Learning/ Training models in ML?

Ans. ML algorithms can be primarily classified depending on the presence/absence of


target variables.

A. Supervised learning: [Target is present]


The machine learns using labelled data. The model is trained on an existing data set
before it starts making decisions with the new data.
If the target variable is continuous: Linear Regression, Polynomial Regression, Quadratic Regression.
If the target variable is categorical: Logistic Regression, Naive Bayes, KNN, SVM, Decision Tree, Gradient Boosting, AdaBoost, Bagging, Random Forest, etc.

B. Unsupervised learning: [Target is absent]


The machine is trained on unlabelled data and without any proper guidance. It
automatically infers patterns and relationships in the data by creating clusters. The
model learns through observations and deduced structures in the data.
Principal component Analysis, Factor analysis, Singular Value Decomposition etc.

C.  Reinforcement Learning:


The model learns through a trial and error method. This kind of learning involves an
agent that will interact with the environment to create actions and then discover errors
or rewards of that action.

Q2. What is KNN algorithm in machine learning?

Ans. K-Nearest Neighbors (K-NN)

k-NN is a supervised algorithm used for classification. What this means is that we have
some labeled data upfront which we provide to the model for it to understand the dynamics
within that data i.e. train. It then uses those learnings to make inferences on the unseen
data i.e. test. In the case of classification this labeled data is discrete in nature.
Steps

1. Decide on your similarity or distance metric.

2. Split the original labeled dataset into training and test data.

3. Pick an evaluation metric.

4. Decide upon the value of k. Here k refers to the number of closest


neighbors we will consider while doing the majority voting of target labels.

5. Run k-NN a few times, changing k and checking the evaluation measure.

6. In each iteration, k neighbors vote, majority vote wins and becomes the
ultimate prediction

7. Optimize k by picking the one with the best evaluation measure.

8. Once you’ve chosen k, use the same training set and now create a new
test set with the people’s ages and incomes that you have no labels for,
and want to predict.
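
A minimal sketch of these steps (assuming scikit-learn and using its Iris dataset simply as labelled example data) is shown below; the values of k tried are arbitrary:

```python
# k-NN classification following the steps above: split the labelled data,
# try several values of k, and compare an evaluation metric (accuracy).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for k in (1, 3, 5, 7):                          # candidate numbers of neighbours
    knn = KNeighborsClassifier(n_neighbors=k)   # Euclidean distance by default
    knn.fit(X_train, y_train)
    print("k =", k, "accuracy =", knn.score(X_test, y_test))
```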

Q3. Why do we use Regression Analysis? Name various type of regression

Ans. Regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need future predictions, such as weather conditions, sales, and marketing trends; for such cases we need a technique that can make predictions accurately. Regression analysis is such a statistical method, used in machine learning and data science. Below are some other reasons for using regression analysis:

o Regression estimates the relationship between the target and the independent variable.

o It is used to find the trends in data.

o It helps to predict real/continuous values.

o By performing the regression, we can confidently determine the most important factor,
the least important factor, and how each factor is affecting the other factors.

Types of Regression

There are various types of regressions which are used in data science and machine
learning. Each type has its own importance on different scenarios, but at the core, all the
regression methods analyze the effect of the independent variable on dependent
variables.

Here some important types of regression which are given below:

o Linear Regression

o Logistic Regression

o Polynomial Regression

o Support Vector Regression

o Decision Tree Regression

o Random Forest Regression

o Ridge Regression

o Lasso Regression

Q4. What do you understand by logistic regression?

Ans. Logistic Regression:

o Logistic regression is another supervised learning algorithm which is


used to solve the classification problems. In classification
problems, we have dependent variables in a binary or discrete
format such as 0 or 1.

o Logistic regression algorithm works with the categorical variable such


as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.

o Logistic regression is a type of regression, but it is different from the linear regression algorithm in terms of how it is used.

o Logistic regression uses the sigmoid function or logistic function, which is a complex cost function. This sigmoid function is used to model the data in logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output, between the values 0 and 1.

o x = input to the function.

o e = base of the natural logarithm.

When we provide the input values (data) to the function, it gives an S-shaped curve.

o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
There are three types of logistic regression:

o Binary (0/1, pass/fail)

o Multinomial (cats, dogs, lions)

o Ordinal (low, medium, high)
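A minimal sketch of the sigmoid function and of the thresholding described above is given below; the sample inputs and the 0.5 threshold are illustrative assumptions.

import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)): the output is always between 0 and 1 (the S-curve)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probabilities = sigmoid(x)
labels = (probabilities >= 0.5).astype(int)   # above the threshold -> 1, below -> 0
print(probabilities, labels)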


Long Answers

Q1. What is supervised learning? Explain its types, and also explain some of the important use cases of supervised learning.

Ans. Supervised learning is the process of training an algorithm to map an input to a specific output. In this method, developers select the kind of information to feed into the algorithms to get the desired results. The algorithms get both inputs & outputs. The next step is creating rules that map the inputs to outputs. The training process continues until the highest level of performance is achieved.

Supervised learning is of two types – regression and classification.

Regression Model
Regression identifies the patterns in the sample data and predicts continuous outcomes.
This algorithm understands the numbers, values, correlations, and groupings. This model
is best for the prediction of products and stocks.

Regression models are of two types – Linear and Logistic regressions. 

In linear regression, the algorithms assume that there lies a linear relationship between
two variables, input (X) and output (Y). The input variable is an independent variable,
whereas the output variable is a dependent variable. It uses the function, calculates, and
plots the input to a continuous value for output. 

In logistic regression, the algorithms predict the discrete values for the set of
independent variables that it has on the list. The algorithm predicts the probability of the
new data so that the output ranges between 0 and 1.

Classification Model

In the classification technique, the input data is labeled based on historical data. These algorithms are specially trained to identify particular types of objects. By processing and analyzing the labeled sample data, tasks such as weather forecasting and identifying pictures become simple.

Some of the popular classification models are – Decision Trees, Naive Bayes Classifiers, Random Forests, Neural Networks, and Support Vector Machines.

In Decision Trees, the classifier makes decisions by reference to feature values. It uses a tree-like model of decisions and their consequences, essentially an algorithm that only contains conditional control statements. Every branch in the decision tree symbolizes a feature of the dataset.

In Naive Bayes Classifiers, the algorithms assume that all the features are independent of each other. It works on large datasets and uses a Directed Acyclic Graph (DAG) for classification purposes. Naive Bayes is suitable for solving multi-class prediction problems. It is quick, easy to use, saves a lot of time, and handles complex data.

In Random Forests, the algorithm creates decision trees on data samples, gets a prediction from each tree, and then selects the best solution. It is an advanced version of decision trees because it reduces the overfitting drawback of decision trees by averaging the results.

In Neural Networks, the algorithms are designed to cluster raw input and recognize patterns. Neural networks require advanced computational resources and can become complicated when there are many observations. Because their inner workings are hard to interpret, data scientists often call them ‘black-box’ algorithms.

In the Support Vector Machine (SVM), the algorithm uses separating hyperplanes as discriminative classifiers. SVM is closely related to kernel methods, and its output is an optimal hyperplane, which is best suited for two-group classification problems.
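To make the comparison concrete, here is a hedged sketch that trains several of the classifiers named above on one labeled dataset; the breast-cancer dataset and all hyperparameters are assumptions chosen only for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)                          # learn from labeled examples
    print(name, round(model.score(X_test, y_test), 3))   # accuracy on held-out data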

Supervised Learning Use Cases

Supervised learning has many applications across industries and one of the best
algorithms for finding more accurate results. Here is a list of well-known applications of
supervised learning. 

Spam detection – supervised learning methods are widely used to detect whether a mail is spam or not. Using different keywords and content, the system recognizes and sends a specific email to the relevant categorical tabs or into the spam category.

Bioinformatics – one of the best applications of bioinformatics is the storage of biological information of human beings. That includes fingerprints, iris textures, eye scans, swabs, and so on. All smart devices are capable of storing fingerprints, so that every time you want to unlock your device, it asks you to authenticate either through a fingerprint or facial recognition.

Object Recognition – one of the popular applications is reCAPTCHA (prove you are not a robot), where you have to choose multiple images as per the instruction to confirm that you are a human. You can only gain access if you identify the images correctly, or else you have to keep trying until you get the correct identifications.

Q2. Explain Support Vector Machine. List its types also.

Ans. Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning. The goal of the SVM
algorithm is to create the best line or decision boundary that can segregate n-dimensional
space into classes so that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we have used in the KNN classifier.
Suppose we see a strange cat that also has some features of dogs, so if we want a model
that can accurately identify whether it is a cat or dog, so such a model can be created by
using the SVM algorithm. We will first train our model with lots of images of cats and dogs
so that it can learn about different features of cats and dogs, and then we test it with this
strange creature. Since SVM creates a decision boundary between these two classes (cat and dog) and chooses extreme cases (support vectors), it will look at the extreme cases of cats and dogs. On the basis of the support vectors, it will classify the new example as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.

o Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
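The difference between the two types can be sketched on a toy dataset; the make_moons data and the kernel choices below are illustrative assumptions, not taken from the text.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: no single straight line separates the classes.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)   # Linear SVM
nonlinear_svm = SVC(kernel="rbf").fit(X_train, y_train)   # Non-linear (kernel) SVM

print("Linear SVM accuracy:    ", round(linear_svm.score(X_test, y_test), 3))
print("Non-linear SVM accuracy:", round(nonlinear_svm.score(X_test, y_test), 3))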

Q3. How does supervised learning work? Explain some challenges in supervised learning. List some examples of supervised learning.

Ans. Supervised learning uses a training set to teach models to yield the desired output.
This training dataset includes inputs and correct outputs, which allow the model to learn
over time. The algorithm measures its accuracy through the loss function, adjusting until
the error has been sufficiently minimized.

Supervised learning can be separated into two types of problems when data mining—
classification and regression:

 Classification uses an algorithm to accurately assign test data into specific


categories. It recognizes specific entities within the dataset and attempts to draw
some conclusions on how those entities should be labeled or defined. Common
classification algorithms are linear classifiers, support vector machines (SVM),
decision trees, k-nearest neighbor, and random forest, which are described in
more detail below.

 Regression is used to understand the relationship between dependent and


independent variables. It is commonly used to make projections, such as sales revenue for a given business. Linear regression, logistic regression, and polynomial regression are popular regression algorithms.

Although supervised learning can offer businesses advantages, such as deep data
insights and improved automation, there are some challenges when building sustainable
supervised learning models. The following are some of these challenges:

 Supervised learning models can require certain levels of expertise to structure


accurately.

 Training supervised learning models can be very time intensive.

 Datasets can have a higher likelihood of human error, resulting in algorithms


learning incorrectly.

 Unlike unsupervised learning models, supervised learning cannot cluster or


classify data on its own.
Supervised learning examples

Supervised learning models can be used to build and advance a number of business
applications, including the following:

 Image- and object-recognition: Supervised learning algorithms can be used to


locate, isolate, and categorize objects out of videos or images, making them
useful when applied to various computer vision techniques and imagery analysis.

 Predictive analytics: A widespread use case for supervised learning models is in


creating predictive analytics systems to provide deep insights into various
business data points. This allows enterprises to anticipate certain results based on
a given output variable, helping business leaders justify decisions or pivot for the
benefit of the organization.

 Customer sentiment analysis: Using supervised machine learning algorithms,


organizations can extract and classify important pieces of information from large
volumes of data—including context, emotion, and intent—with very little human
intervention. This can be incredibly useful when gaining a better understanding of
customer interactions and can be used to improve brand engagement efforts.

 Spam detection: Spam detection is another example of a supervised learning


model. Using supervised classification algorithms, organizations can train
databases to recognize patterns or anomalies in new data to organize spam and
non-spam-related correspondences effectively.
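As a rough sketch of the spam-detection idea, a Naive Bayes classifier can be trained on bag-of-words features; the tiny hand-made email dataset below is purely illustrative.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "limited offer click here",
    "meeting agenda attached", "project status update",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Bag-of-words features feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["claim your free offer", "see the attached agenda"]))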

Case Study 1

In November 2016, Tech Emergence published the results of a small survey among


artificial intelligence experts to outline low-hanging-fruit applications in machine learning
for medium and large companies. While there were only 26 respondents who could vote
multiple times, they confirmed what was evident already.
Please note that the survey covered both supervised and unsupervised learning. While
supervised learning covers the lion’s share of ML applications, in data security the
unsupervised style is dominant.

Solution:

Interestingly, the groups used by Tech Emergence provide only a vague understanding of
how use cases are distributed among different machine learning tasks. For example, Big
Data can be applied to any of the mentioned groups, given that the algorithms process
large and poorly structured datasets, regardless of the industry and operations field this
data comes from. Also, sales tasks usually intersect marketing ones when it comes to
analytics. That’s why we suggest a slightly different breakdown of the most common use
cases.

Marketing and Sales

Digital marketing and online-driven sales are the first application fields that you may think
of for machine learning adoption. People interact with the web and leave a detailed
footprint to be analyzed. While there are tangible results in unsupervised learning
techniques for marketing and sales, the largest value impact is in the supervised learning
field. Let’s have a look.

Lifetime Value. A customer lifetime value that we mentioned before is usually measured
in the net profit this customer brings to a company in the longer run. If you’ve been
tracking most of your customers and accurately documenting their in-funnel and further
purchase behavior, you have enough data to make predictions about most budding
customers early and target sales effort toward them.

Churn. The churn rate defines the number of customers who cease to complete target
actions (e.g. add to cart, leave a comment, checkout, etc.) during a given period. Similar to
lifetime value predictions, sorting “likely-to-churn-soon” from engaged customers will allow
you to 1) analyze the reasons for such behavior, 2) refocus and personalize offerings for
different groups of churning customers.
Sentiment analysis. Skimming through thousands of feedback posts in social media and
comments sections is painstaking work, especially in B2C after a new product or feature
rollout. Sentiment analysis backed by natural language processing allows for aggregating
and yielding analytics on customer feedback. You may play with sentiment analysis
using Google Cloud Natural Language API to understand how this works and what kinds
of analytics may be available.

Here’s how the API analyzes an angry comment by a person who purchased HTC Vive, a
virtual reality headset, on Amazon. Score defines sentiment itself, ranging from very
negative to very positive. Magnitude shows the strength of a sentiment regardless of its
score.  

Recommendations. Recommendation sections are something we can’t imagine modern


eCommerce or media without. The common practice is to recommend other popular
products or the ones you want to sell most. It doesn’t require machine learning algorithms
at all. But if you want to engage customers with deep personalization, you can apply
machine learning techniques to define the products that this customer is most likely to buy
next and put them on top of the recommendation list. Also, Netflix, YouTube, and other
video streaming services operate in similar way, tailoring their recommendations to a
viewer’s lifetime behavior.

People analytics

Tracking internal operations to get insights is also a powerful task for machine learning.
Most digitalized companies today have enough employee tracking software and historic
data to make predictions on employee performance, retention, and other fundamental
problems of human resource management.

Sales performance. Is there a way to understand why one middle-level sales executive brings in twice as many lead conversions as another middle-level exec sitting in the same office? Technically, they both send emails, set up calls, and participate in conferences, which
somehow result in conversions or lack thereof. Any time we talk about what drives
salespeople performance, we make assumptions prone to bias. A good example of ML
use here is People.ai, a startup which tries to address the problem by tracking all the sales
data, including emails, calls, and CRM interactions to use this data as a supervised
learning set and predict which kinds of actions bring better results. Basically, the algorithm
aids in developing a playbook for sales reps based on successful cases.

Retention. Similar tracking techniques, the use of text sentiment and other metadata
analysis (from emails and social media posts) can be applied to detect possible job-
hopping behavior among candidates.

Human resource allocation. You can use historic data from HR software – sick days,
vacations, holidays, etc. – to make broader predictions on your
workforce. Deloitte disclosed that a number of automotive companies are learning from the
patterns of unscheduled absences to forecast the periods when people are likely to take a
day off and reserve more workforce.

Time-series market forecasting

Time-series forecasting is a specific branch of machine learning and statistics that


addresses predicting time-dependent events. These may be seasonal or cyclic fluctuations
in any market figures. In the general case, time-series forecasting considers such time-
dependent changes as holidays, seasons, or other events that impact sales, prices, and
customer activities. Check our time-series forecasting story to learn more.
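A very small baseline sketch of time-series forecasting is shown below: a rolling average over a weekly cycle on synthetic sales data. The series, the window length, and the naive forecast are assumptions for illustration only; real systems such as fare prediction use far richer models and features.

import numpy as np
import pandas as pd

days = pd.date_range("2017-01-01", periods=60, freq="D")
rng = np.random.default_rng(0)
sales = 100 + 10 * np.sin(2 * np.pi * np.arange(60) / 7) + rng.normal(0, 2, 60)
series = pd.Series(sales, index=days)              # daily sales with a weekly cycle

window = 7                                         # one weekly cycle
forecast = series.rolling(window).mean().iloc[-1]  # naive forecast for the next day
print("next-day forecast:", round(float(forecast), 1))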

Currently, time-series data can be applied both for internal use to have better planning
capabilities and for customer-facing applications as well. For instance, eCommerce
websites may be interested in tracking time-series data related to Black Friday to better set
discount campaigns and drive more sales. As for the example of customer-facing use,
AltexSoft helped Fareboom.com, airfare provider, build a price-prediction feature that
allows Fareboom customers to choose the best time to purchase their tickets.

Source: Fareboom.com
Security

As we mentioned above, most cyber-security techniques revolve around unsupervised


learning, especially the methods that address anomaly detection, i.e. finding outlying data
items that may pose a threat. However, there are several use cases where mostly
supervised learning is used.

Spam filtering. According to Statista, 56.87 percent of all emails were spam in March
2017. This number actually keeps dropping – in April 2014 the share of spam was 71.1
percent – as increasingly more email services have adopted spam-filtering algorithms
backed by ML models. The abundance of spam examples provides enough both textual
and metadata to sort out this type of correspondence.

Malicious emails and links. Detecting phishing attacks becomes critical for all IT
departments in organizations, considering the recent case of the Petya virus, which was
distributed among corporate infrastructures through email attachments. Currently, there
are many public datasets that provide labeled records of malware or even URLs that can
be used directly to build classifying models to protect your organization.

Fraud detection. As fraudulent actions are very domain-specific, they mostly rely on
private datasets that organizations have. For example, many banks that have fraud cases
in their data use supervised fraud detection techniques to block potentially fraudulent
money transactions accounting for such variables as transaction time, location, money
amounts, etc.

Asset maintenance and IoT

Digitalization goes beyond internal IT infrastructures only. As corporate assets become


smart with the Internet-of-Things surge, various smart sensors can gather and stream
asset data directly to private or public clouds where it can be centralized and further used
for resource management and supply chain optimization.

Logistics. Settling logistics scenarios is a very dynamic task, as managers should account


for delivery time, budget, weather factors, driver’s personal characteristics, and other
changing data. Given that supply chain management is a common problem for most
businesses that have physical assets, the datasets are already there. So, building AI-
backed recommendation systems is another opportunity that can be adopted with relative
ease.

Outage predictions. Another on-surface opportunity is to use the history of machinery


outages to predict failures early. Complex ML algorithms can draw predictions based on
unobvious factors that humans may not detect. This allows for providing timely
maintenance for lower cost. And this approach fits well for the industries where asset
management is highly regulated – like air travel – and assets are usually over-maintained
to comply with security protocols.

Entertainment

Last but not least in the group of supervised machine learning use cases is the
entertainment field, where users are directly interacting with algorithms. These can run the
gamut from face recognition and different visual alterations to turning camera pictures
into artwork-style images.

This path usually belongs to AI startups that plan acquisition and ship software that can be
embedded in other large market products. That’s exactly what happened to MSQRD, a
video filter app, that was acquired by Facebook. MSQRD was developed in three months.

MCQ’s

1. What is classification?
a) when the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) when the output variable is a real value, such as “dollars” or “weight”.
Ans: Solution A

2. What is regression?
a) When the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) When the output variable is a real value, such as “dollars” or “weight”.
Ans: Solution B

3. What is supervised learning?


a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.
Ans: Solution B

4. What is Unsupervised learning?


a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.
Ans: Solution A

5. What is Semi-Supervised learning?


a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.
Ans: Solution D
6. What is Reinforcement learning?
a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.
Ans: Solution C

7. Sentiment Analysis is an example of:


1) Regression
2) Classification
3) Clustering
4) Reinforcement Learning
Options:
A. 1 Only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 4
Ans : Solution D

8. The process of forming general concept definitions from examples of concepts to be


learned.
a) Deduction
b) abduction
c) induction
d) conjunction
Ans : Solution C

9. Computers are best at learning


a) facts.
b) concepts.
c) procedures.
d) principles.
Ans : Solution A

10. Data used to build a data mining model.


a) validation data
b) training data
c) test data
d) hidden data
Ans : Solution B

11. Supervised learning and unsupervised clustering both require at least one
a) hidden attribute.
b) output attribute.
c) input attribute.
d) categorical attribute.
Ans : Solution C
12. Supervised learning differs from unsupervised clustering in that supervised learning
requires
a) at least one input attribute.
b) input attributes to be categorical.
c) at least one output attribute.
d) output attributes to be categorical.
Ans : Solution C

13. A regression model in which more than one independent variable is used to predict the
dependent variable is called
a) a simple linear regression model
b) a multiple regression models
c) an independent model
d) none of the above
Ans : Solution B

14. A term used to describe the case when the independent variables in a multiple
regression model
are correlated is
a) Regression
b) correlation
c) multicollinearity
d) none of the above
Ans : Solution C

15. A multiple regression model has the form: y = 2 + 3×1 + 4×2. As x1 increases by 1 unit
(holding x2 constant), y will
a) increase by 3 units
b) decrease by 3 units
c) increase by 4 units
d) decrease by 4 units
Ans : Solution A

16. A multiple regression model has


a) only one independent variable
b) more than one dependent variable
c) more than one independent variable
d) none of the above
Ans : Solution C

17. A measure of goodness of fit for the estimated regression equation is the
a) multiple coefficient of determination
b) mean square due to error
c) mean square due to regression
d) none of the above
Ans : Solution A

18. The adjusted multiple coefficient of determination accounts for


a) the number of dependent variables in the model
b) the number of independent variables in the model
c) unusually large predictors
d) none of the above
Ans : Solution B

19. The multiple coefficient of determination is computed by


a) dividing SSR by SST
b) dividing SST by SSR
c) dividing SST by SSE
d) none of the above
Ans : Solution A

20. For a multiple regression model, SST = 200 and SSE = 50. The multiple coefficient of
determination is
a) 0.25
b) 4.00
c) 0.75
d) none of the above
Ans : Solution C

21. A nearest neighbor approach is best used


a) with large-sized datasets.
b) when irrelevant attributes have been removed from the data.
c) when a generalized model of the data is desirable.
d) when an explanation of what has been found is of primary importance.
Ans : Solution B

22. Another name for an output attribute.


a) predictive variable
b) independent variable
c) estimated variable
d) dependent variable
Ans : Solution D

23. Classification problems are distinguished from estimation problems in that


a) classification problems require the output attribute to be numeric.
b) classification problems require the output attribute to be categorical.
c) classification problems do not allow an output attribute.
d) classification problems are designed to predict future outcome.
Ans : Solution B

24. Which statement is true about prediction problems?


a) The output attribute must be categorical.
b) The output attribute must be numeric.
c) The resultant model is designed to determine future outcomes.
d) The resultant model is designed to classify current behavior.
Ans : Solution C
25. Which statement about outliers is true?
a) Outliers should be identified and removed from a dataset.
b) Outliers should be part of the training dataset but should not be present in the test
data.
c) Outliers should be part of the test dataset but should not be present in the training
data.
d) The nature of the problem determines how outliers are used.
Ans : Solution D

26. Which statement is true about neural network and linear regression models?
a) Both models require input attributes to be numeric.
b) Both models require numeric attributes to range between 0 and 1.
c) The output of both models is a categorical attribute value.
d) Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.
Ans : Solution A

27. Which of the following is a common use of unsupervised clustering?


a) detect outliers
b) determine a best set of input attributes for supervised learning
c) evaluate the likely performance of a supervised learner model
d) determine if meaningful relationships can be found in a dataset
Ans : Solution A

28. The average positive difference between computed and desired outcome values.
a) root mean squared error
b) mean squared error
c) mean absolute error
d) mean positive error
Ans : Solution D

29. Selecting data so as to assure that each class is properly represented in both the
training and
test set.
a) cross validation
b) stratification
c) verification
d) bootstrapping
Ans : Solution B

30. The standard error is defined as the square root of this computation.
a) The sample variance divided by the total number of sample instances.
b) The population variance divided by the total number of sample instances.
c) The sample variance divided by the sample mean.
d) The population variance divided by the sample mean.
Ans : Solution A

31. Data used to optimize the parameter settings of a supervised learner model.
a) Training
b) Test
c) Verification
d) Validation
Ans : Solution D

Unit-3

Very Short Answers

Q1. What is example of unsupervised learning?

Ans. Some use cases for unsupervised learning — more specifically, clustering — include:
Customer segmentation, or understanding different customer groups around which to build
marketing or other business strategies. Genetics, for example clustering DNA patterns to
analyze evolutionary biology.

Q2. What can unsupervised learning be used for?

Ans. Unsupervised learning is commonly used for finding meaningful patterns and


groupings inherent in data, extracting generative features, and exploratory purposes

Q3. What is unsupervised learning?

Ans. Unsupervised learning, also known as unsupervised machine learning, uses


machine learning algorithms to analyze and cluster unlabeled datasets. These
algorithms discover hidden patterns or data groupings without the need for human
intervention.

Q4. Why Clustering is called unsupervised learning?

Ans. Clustering is an unsupervised machine learning task that automatically divides


the data into clusters, or groups of similar items. It does this without having been told
how the groups should look ahead of time.

Q5. Is K means supervised or unsupervised?

Ans. K-means is a clustering algorithm that tries to partition a set of points into K sets
(clusters) such that the points in each cluster tend to be near each other. It is
unsupervised because the points have no external classification.

Q6. Is Dbscan supervised or unsupervised?

Ans. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a


popular unsupervised learning method utilized in model building and machine learning
algorithms.

Short Answers.

Q1. What is DBSCAN? When to Use it?


Ans. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a
popular unsupervised learning method utilized in model building and machine learning
algorithms. Before we go any further, we need to define what an “unsupervised” learning
method is. Unsupervised learning methods are when there is no clear objective or
outcome we are seeking to find. Instead, we are clustering the data together based on the
similarity of observations. To help clarify, let’s take Netflix as an example. Based on shows you have watched in the past, Netflix will recommend shows for you to watch next. Anyone who has used Netflix has seen such a screen of recommendations.

Because I watched ‘Shameless’, Netflix recommends several other similar shows to watch.
But where is Netflix gathering those recommendations from? Considering it is trying to
predict the future with what show I am going to watch next, Netflix has nothing to base the
predictions or recommendations on (no clear definitive objective). Instead, Netflix looks at
other users who have also watched ‘Shameless’ in the past, and looks at what those users
watched in addition to ‘Shameless’. By doing so, Netflix is clustering its users together
based on similarity of interests. This is exactly how unsupervised learning works. Simply
clustering observations together based on similarity, hoping to make accurate conclusions
based on the clusters.
Back to DBSCAN. DBSCAN is a clustering method that is used in machine learning to
separate clusters of high density from clusters of low density. Given that DBSCAN is
a density based clustering algorithm, it does a great job of seeking areas in the data that
have a high density of observations, versus areas of the data that are not very dense with
observations. DBSCAN can sort data into clusters of varying shapes as well, another strong
advantage. DBSCAN works as such:

 Divides the dataset into n dimensions

 For each point in the dataset, DBSCAN forms an n dimensional shape

around that data point, and then counts how many data points fall

within that shape.

 DBSCAN counts this shape as a cluster. DBSCAN iteratively expands

the cluster, by going through each individual point within the cluster,

and counting the number of other data points nearby.
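Take the following minimal sketch as an example; the make_moons data and the eps / min_samples values are illustrative assumptions and would normally need tuning for real data.

from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Points in sparse regions get the label -1 and are treated as noise/outliers.
print("clusters found:", len(set(labels) - {-1}))
print("noise points:", int((labels == -1).sum()))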

Q2. Why is clustering important in businesses?

Ans. Clusters and Productivity. Being part of a cluster allows companies to operate more
productively in sourcing inputs; accessing information, technology, and needed
institutions; coordinating with related companies; and measuring and motivating
improvement.

Q3. Why is KMeans better than DBSCAN?

Ans. In Data Science and Machine Learning, KMeans and DBScan are two of the most
popular clustering(unsupervised) algorithms. These are both simple in implementation, but
DBScan is a bit more simple. I’ve just used both of them and I honestly found DBScan
more powerful and interesting in both aspects, implementation and performance. 

However, no single algorithm is the best for all purposes. This means there are situations where DBSCAN is very performant, while at other times its performance is very poor. Density clustering (for example DBSCAN) seems to correspond more to human intuitions of clustering than to distance from a central cluster point (for example KMeans).

Density clustering algorithms use the concept of reachability, i.e. how many neighbors a point has within a radius. DBSCAN is also attractive because it doesn’t need the parameter k (the number of clusters we are trying to find), which KMeans needs. When you don’t know the number of clusters hidden in the dataset and there’s no way to visualize it, it’s a good decision to use DBSCAN. DBSCAN produces a varying number of clusters based on the input data.

Here’s a list of advantages of KMeans and DBScan:


 KMeans is much faster than DBScan
 DBScan doesn’t need number of clusters

Here’s a list of disadvantages of KMeans and DBScan:

 K-means needs the number of clusters hidden in the dataset


 DBScan doesn’t work well over clusters with different densities
 DBScan needs a careful selection of its parameters

As a side point, If DBScan fails and you need a clustering algorithm that
automatically detects the number of clusters in your dataset you can try MeanShift
algorithm.
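A hedged side-by-side sketch of the two algorithms on data with non-spherical clusters is given below; all parameter values are illustrative assumptions.

from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# KMeans needs k up front; DBSCAN instead needs eps and min_samples.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# KMeans cuts the two moons with a straight boundary; DBSCAN follows the dense,
# curved shapes and may also mark sparse points as -1 (noise).
print("KMeans labels:", sorted(set(kmeans_labels)))
print("DBSCAN labels:", sorted(set(dbscan_labels)))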

Q4. How clustering can be used in business analytics?

Ans. Cluster analysis has been widely used in many applications such as business
intelligence, image pattern recognition, Web search, biology, and security. In business
intelligence, clustering can be used to organize a large number of customers into groups,
where customers within a group share strongly similar characteristics. This facilitates the
development of business strategies for enhanced customer relationship management.
Moreover, consider a consultant company with a large number of projects. To improve
project management, clustering can be applied to partition projects into categories based
on similarity so that project auditing and diagnosis (to improve project delivery and
outcomes) can be conducted effectively.

Data clustering is under vigorous development. Contributing areas of research include


data mining, statistics, machine learning, spatial database technology, information
retrieval, Web search, biology, marketing, and many other application areas. Owing to the
huge amounts of data collected in databases, cluster analysis has recently become a
highly active topic in data mining research.

Q5. Why clusters is known as unsupervised learning?

Ans. Clustering is known as unsupervised learning because the class label information is


not present. For this reason, clustering is a form of learning by observation, rather than
learning by examples. In data mining, efforts have focused on finding methods for efficient
and effective cluster analysis in large databases. Active themes of research focus on the
scalability of clustering methods, the effectiveness of methods for clustering complex
shapes (e.g., nonconvex) and types of data (e.g., text, graphs, and images), high-
dimensional clustering techniques (e.g., clustering objects with thousands of features), and
methods for clustering mixed numerical and nominal data in large databases.
Long Answers

Q1. Define clustering. What are the different types of clustering explain in detail?

Ans. Clustering is a type of unsupervised learning method of machine learning. In the


unsupervised learning method, the inferences are drawn from the data sets which do not
contain labelled output variable. It is an exploratory data analysis technique that allows us
to analyse the multivariate data sets.

Clustering is a task of dividing the data sets into a certain number of clusters in such a
manner that the data points belonging to a cluster have similar characteristics. Clusters
are nothing but the grouping of data points such that the distance between the data points
within the clusters is minimal.

In other words, the clusters are regions where the density of similar data points is high. It is
generally used for the analysis of the data set, to find insightful data among huge data sets
and draw inferences from it. Generally, the clusters are seen in a spherical shape, but it is
not necessary as the clusters can be of any shape.

Clustering itself can be categorized into two types viz. Hard Clustering and Soft Clustering.
In hard clustering, one data point can belong to one cluster only. But in soft clustering, the
output provided is a probability likelihood of a data point belonging to each of the pre-
defined numbers of clusters.

Density-Based Clustering

In this method, the clusters are created based upon the density of the data points which
are represented in the data space. The regions that become dense due to the huge
number of data points residing in that region are considered as clusters.

The data points in the sparse region (the region where the data points are very less) are
considered as noise or outliers. The clusters created in these methods can be of arbitrary
shape. Following are the examples of Density-based clustering algorithms:

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN groups data points together based on the distance metric and criterion for a
minimum number of data points. It takes two parameters – eps and minimum points. Eps
indicates how close the data points should be to be considered as neighbors. The criterion
for minimum points should be completed to consider that region as a dense region.

OPTICS (Ordering Points to Identify Clustering Structure)

It is similar in process to DBSCAN, but it attends to one of the drawbacks of the former
algorithm i.e. inability to form clusters from data of arbitrary density. It considers two more
parameters which are core distance and reachability distance. Core distance indicates
whether the data point being considered is core or not by setting a minimum value for it.
Reachability distance is the maximum of core distance and the value of distance metric
that is used for calculating the distance between two data points. One thing to consider about reachability distance is that its value is not defined with respect to a data point that is not a core point.

HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)

HDBSCAN is a density-based clustering method that extends the DBSCAN methodology


by converting it to a hierarchical clustering algorithm.

Hierarchical Clustering

Hierarchical Clustering groups (Agglomerative or also called as Bottom-Up Approach) or


divides (Divisive or also called as Top-Down Approach) the clusters based on the distance
metrics. In Agglomerative clustering, each data point acts as a cluster initially, and then it
groups the clusters one by one.

Divisive is the opposite of Agglomerative, it starts off with all the points into one cluster and
divides them to create more clusters. These algorithms create a distance matrix of all the
existing clusters and perform the linkage between the clusters depending on the criteria of
the linkage. The clustering of the data points is represented by using a dendrogram. There
are different types of linkages: –

o    Single Linkage: – In single linkage the distance between the two clusters is the
shortest distance between points in those two clusters.

o   Complete Linkage: – In complete linkage, the distance between the two clusters is the
farthest distance between points in those two clusters.

o   Average Linkage: – In average linkage the distance between the two clusters is the
average distance of every point in the cluster with every point in another cluster.
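A minimal sketch of agglomerative (bottom-up) clustering with the three linkage criteria listed above follows; the blob data and the number of clusters are assumptions made only for this example.

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

for linkage in ("single", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    sizes = [int((labels == k).sum()) for k in range(3)]
    print(linkage, "linkage -> cluster sizes:", sizes)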

Fuzzy Clustering

In fuzzy clustering, the assignment of the data points in any of the clusters is not decisive.
Here, one data point can belong to more than one cluster. It provides the outcome as the
probability of the data point belonging to each of the clusters. One of the algorithms used
in fuzzy clustering is Fuzzy c-means clustering.

This algorithm is similar in process to the K-Means clustering and it differs in the
parameters that are involved in the computation like fuzzifier and membership values.

Partitioning Clustering

This method is one of the most popular choices for analysts to create clusters. In
partitioning clustering, the clusters are partitioned based upon the characteristics of the
data points. We need to specify the number of clusters to be created for this clustering
method. These clustering algorithms follow an iterative process to reassign the data points
between clusters based upon the distance. The algorithms that fall into this category are
as follows: –
o   K-Means Clustering: – K-Means clustering is one of the most widely used algorithms.
It partitions the data points into k clusters based upon the distance metric used for the
clustering. The value of ‘k’ is to be defined by the user. The distance is calculated between
the data points and the centroids of the clusters.

The data point which is closest to the centroid of the cluster gets assigned to that cluster.
After an iteration, it computes the centroids of those clusters again and the process
continues until a pre-defined number of iterations are completed or when the centroids of
the clusters do not change after an iteration.

It is a very computationally expensive algorithm as it computes the distance of every data


point with the centroids of all the clusters at each iteration. This makes it difficult for
implementing the same for huge data sets.

PAM (Partitioning Around Medoids)

 This algorithm is also called as k-medoid algorithm. It is also similar in process to the K-
means clustering algorithm with the difference being in the assignment of the center of the
cluster. In PAM, the medoid of the cluster has to be an input data point while this is not
true for K-means clustering as the average of all the data points in a cluster may not
belong to an input data point.

o   CLARA (Clustering Large Applications): – CLARA is an extension to the PAM


algorithm where the computation time has been reduced to make it perform better for large
data sets. To accomplish this, it selects a certain portion of data arbitrarily among the
whole data set as a representative of the actual data. It applies the PAM algorithm to
multiple samples of the data and chooses the best clusters from a number of iterations.

Grid-Based Clustering

In grid-based clustering, the data set is represented into a grid structure which comprises
of grids (also called cells). The overall approach in the algorithms of this method differs
from the rest of the algorithms.

They are more concerned with the value space surrounding the data points rather than the
data points themselves. One of the greatest advantages of these algorithms is its
reduction in computational complexity. This makes it appropriate for dealing with
humongous data sets.

After partitioning the data sets into cells, it computes the density of the cells which helps in
identifying the clusters. A few algorithms based on grid-based clustering are as follows: –

o   STING (Statistical Information Grid Approach): – In STING, the data set is divided
recursively in a hierarchical manner. Each cell is further sub-divided into a different
number of cells. It captures the statistical measures of the cells which helps in answering
the queries in a small amount of time.

o   WaveCluster: – In this algorithm, the data space is represented in form of wavelets.


The data space composes an n-dimensional signal which helps in identifying the clusters.
The parts of the signal with a lower frequency and high amplitude indicate that the data
points are concentrated. These regions are identified as clusters by the algorithm. 

 CLIQUE (Clustering in Quest): – CLIQUE is a combination of density-based and


grid-based clustering algorithm. It partitions the data space and identifies the sub-
spaces using the Apriori principle. It identifies the clusters by calculating the
densities of the cells.

Q2. How unsupervised machine learning works? When to use unsupervised


machine learning?
Ans. Unsupervised learning works by teaching the model to identify patterns on its
own (hence unsupervised) from unlabeled data. This means that an input is provided,
but not an output.

To understand how this works, consider a simple fruit example. With unsupervised learning, you’ll provide the model with the input dataset (the pictures of the fruits and their characteristics), but you will not provide the output (the names of the fruits).

The model will use a suitable algorithm to train itself to divide the fruits into different
groups according to the most similar features between them. This kind of
unsupervised learning, called clustering, is the most common.

In contrast to supervised learning, unsupervised learning can handle large volumes of


data in real time. And because the model will automatically identify structure in data
(classification), it’s useful in cases where a human would have a hard time finding
trends within the data on their own.

For example, if you were trying to segment potential consumers into groups for
marketing purposes, an unsupervised clustering method would be a great starting
point.

Here are some examples of use cases for unsupervised learning:

 Grouping customers by their purchase behavior


 Finding correlations in customer data (for instance, people who buy a
certain style bag may also be interested in a certain style of shoes)
 Segmenting data by purchase history
 Classifying people based on different interests
 Grouping inventories by manufacturing and sales metrics

Wood explained to us that he once worked for a pharmaceutical company with


manufacturing facilities around the world. The software the company used to record
errors that happened in their facilities did not have a drop-down menu with common
error options to choose from.

Because of this, factory workers documented errors in plain text (either in English or
their local language). The company wished to know the causes of common
manufacturing problems, but without a categorization of the errors it was impossible
to perform statistical analysis on the data.

Wood used an unsupervised learning algorithm to discover commonalities in the


errors. He was able to identify the biggest themes and produce statistics such as pie-chart breakdowns of the common manufacturing problems in the company.

MCQ’s

Unit-3
1. _____ terms are required for building a bayes model.
(A) 1
(B) 2
(C) 3
(D) 4
Answer
Correct option is C
2. Which of the following is the consequence between a node and its predecessors
while creating bayesian network?
(A) Conditionally independent
(B) Functionally dependent
(C) Both Conditionally dependent & Dependent
(D) Dependent
Answer
Correct option is A
3. What is needed to make probabilistic systems feasible in the world?
(A) Feasibility
(B) Reliability
(C) Crucial robustness
(D) None of the above
Answer
Correct option is C
4. Bayes rule can be used for:-
(A) Solving queries
(B) Increasing complexity
(C) Answering probabilistic query
(D) Decreasing complexity
Answer
Correct option is C
5. _____ provides way and means of weighing up the desirability of goals and the
likelihood of achieving them.
(A) Utility theory
(B) Decision theory
(C) Bayesian networks
(D) Probability theory
Answer
Correct option is A

6. Which of the following provided by the Bayesian Network?


(A) Complete description of the problem
(B) Partial description of the domain
(C) Complete description of the domain
(D) All of the above
Answer
Correct option is C
7. Probability provides a way of summarizing the ______ that comes from our
laziness and ignorance.
(A) Belief
(B) Uncertainty
(C) Joint probability distributions
(D) Randomness
Answer
Correct option is B
8. The entries in the full joint probability distribution can be calculated as
(A) Using variables
(B) Both Using variables & information
(C) Using information
(D) All of the above
Answer
Correct option is C
9. Causal chain (for example, smoking causes cancer) gives rise to:-
(A) Conditionally Independence
(B) Conditionally Dependence
(C) Both
(D) None of the above
Answer
Correct option is A
10. The bayesian network can be used to answer any query by using:-
(A) Full distribution
(B) Joint distribution
(C) Partial distribution
(D) All of the above
Answer
Correct option is B
11. Bayesian networks allow compact specification of:-
(A) Joint probability distributions
(B) Belief
(C) Propositional logic statements
(D) All of the above
Answer

Correct option is A
12. The compactness of the bayesian network can be described by
(A) Fully structured
(B) Locally structured
(C) Partially structured
(D) All of the above
Answer
Correct option is B
13. The Expectation Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
(A) True
(B) False
Answer
Correct option is B
14. Which of the following is correct about the Naive Bayes?
(A) Assumes that all the features in a dataset are independent
(B) Assumes that all the features in a dataset are equally important
(C) Both
(D) All of the above
Answer
Correct option is C
15. Which of the following is false regarding EM Algorithm?
(A) The alignment provides an estimate of the base or amino acid composition of each
column in the site
(B) The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the sequences
(C) The row-by-column composition of the site already available is used to estimate
the probability
(D) None of the above
Answer
Correct option is C
16. Naïve Bayes Algorithm is a ________ learning algorithm.
(A) Supervised
(B) Reinforcement
(C) Unsupervised
(D) None of these
Answer
Correct option is A
17. EM algorithm includes two repeated steps, here the step 2 is ______.
(A) The normalization
(B) The maximization step
(C) The minimization step
(D) None of the above
Answer
Correct option is B
18. Examples of Naïve Bayes Algorithm is/are
(A) Spam filtration
(B) Sentimental analysis
(C) Classifying articles
(D) All of the above
Answer
Correct option is D
19. In the intermediate steps of "EM Algorithm", the number of each base in each
column is determined and then converted to fractions.
(A) True
(B) False
Answer
Correct option is A
20. Naïve Bayes algorithm is based on _______ and used for solving classification
problems.
(A) Bayes Theorem
(B) Candidate elimination algorithm
(C) EM algorithm
(D) None of the above
Answer
Correct option is A
21. Types of Naïve Bayes Model:
(A) Gaussian
(B) Multinomial
(C) Bernoulli
(D) All of the above
Answer
Correct option is D
22. Disadvantages of Naïve Bayes Classifier:
(A) Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features.
(B) It performs well in Multi-class predictions as compared to the other Algorithms.
(C) Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
datasets.
(D) It is the most popular choice for text classification problems.
Answer
Correct option is A
23. The benefit of Naïve Bayes:-
(A) Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
datasets.
(B) It is the most popular choice for text classification problems.
(C) It can be used for Binary as well as Multi-class Classifications.
(D) All of the above
Answer
Correct option is D
24. In which of the following types of sampling the information is carried out under
the opinion of an expert?
(A) Convenience sampling
(B) Judgement sampling
(C) Quota sampling
(D) Purposive sampling
Answer
Correct option is B
25. Full form of MDL.
(A) Minimum Description Length
(B) Maximum Description Length
(C) Minimum Domain Length
(D) None of these
Answer
Correct option is A
Unit-4
Very Short Answers

Fill in the blanks:

Q1. One of the major sources of data for many major companies is the device
which all of us have in our hands all the time (Smartphone/ Mobile Phones)
Q2. The world of Artificial Intelligence revolves around (Data)

True/False:

Q3. All the apps collect some kind of data. (True)

Q4. What do you understand by Machine Learning?


Ans. Machine Learning is a subset of Artificial Intelligence which enables machines
to improve at tasks with experience (data). The intention of Machine Learning is to
enable machines to learn by themselves using the provided data and make accurate
Predictions/ Decisions.
OR
Machine learning focuses on the development of computer programs that can access
data and use it to learn for themselves.
OR
Machine learning is a data analytics technique that teaches computers to do what
comes naturally to humans and animals: learn from experience.

Q5. What do you understand by Deep Learning?

Ans Deep Learning is the most advanced form of Artificial Intelligence. In Deep
Learning, the machine is trained with huge amounts of data which helps it in training
itself around the data. Such machines are intelligent enough to develop algorithms for
themselves.
OR
Deep learning is an artificial intelligence (AI) function that imitates the workings of the
human brain in processing data and creating patterns for use in decision making.
OR
Deep learning is a subset of machine learning where artificial neural networks,
algorithms inspired by the human brain, learn from large amounts of data.
Correct option is A
12. The compactness of the bayesian network can be described by
(A) Fully structured
(B) Locally structured
(C) Partially structured
(D) All of the above
Answer
Correct option is B
13. The Expectation Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
(A) True
(B) False
Answer
Correct option is B
14. Which of the following is correct about the Naive Bayes?
(A) Assumes that all the features in a dataset are independent
(B) Assumes that all the features in a dataset are equally important
(C) Both
(D) All of the above
Answer
Correct option is C
15. Which of the following is false regarding EM Algorithm?
(A) The alignment provides an estimate of the base or amino acid composition of each
column in the site
(B) The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the sequences
(C) The row-by-column composition of the site already available is used to estimate
the probability
(D) None of the above
Answer
Correct option is C
16. Naïve Bayes Algorithm is a ________ learning algorithm.
(A) Supervised
(B) Reinforcement
(C) Unsupervised
(D) None of these
Answer
Correct option is

17. EM algorithm includes two repeated steps, here the step 2 is ______.
(A) The normalization
(B) The maximization step
(C) The minimization step
(D) None of the above
Answer
Correct option is C
18. Examples of Naïve Bayes Algorithm is/are
(A) Spam filtration
(B) Sentimental analysis
(C) Classifying articles
(D) All of the above
Answer
Correct option is D
19. In the intermediate steps of "EM Algorithm", the number of each base in each
column is determined and then converted to fractions.
(A) True
(B) False
Answer
Correct option is A
20. Naïve Bayes algorithm is based on _______ and used for solving classification
problems.
(A) Bayes Theorem
(B) Candidate elimination algorithm
(C) EM algorithm
(D) None of the above
Answer
Correct option is A
21. Types of Naïve Bayes Model:
(A) Gaussian
(B) Multinomial
(C) Bernoulli
(D) All of the above
Answer
Correct option is D

Q6. What are the three domains of AI?

Ans

● Data Science/ Big Data


● Computer Vision
● Natural Language Processing (NLP)

Q7. Name any two examples of Data science?

Ans. (Any two out of the following)

Price Comparison Websites/ Website Recommendations/ Fraud and Risk


detection/ Internet search/ Personalized healthcare recommendations /
Optimizing Traffic routes in real-time / image tagging.

Q8. Name any two examples of Computer vision?

Ans. Self-Driving Cars/ Autonomous Vehicles/ Face Lock in Smartphones/
Medical Imaging/ Facial Recognition/ Security Systems/ Waste Management/
Satellite Imaging.

Q9. Name any two examples of Natural Language Processing?

Ans. Email filters/ Smart assistants/ Sentiment Analysis/ Automatic
Summarization/ Search results/ Language translation/ Digital phone calls.

Q10. Name any two examples of Machine Learning?

Ans. Virtual Personal Assistants, Recommendation systems like Netflix, Face Apps,
Online Fraud Detection
Q11. Where do we collect data from?
Ans. Data can be collected from various sources like –
 Surveys
 Sensors
 Observations
 Web scraping (Internet)
 Interviews
 Documents and records.
 Oral histories

Short Answers

Q1. What is the difference between AI, Machine Learning, and Deep Learning?

Ans.  AI (Artificial Intelligence): Artificial Intelligence refers to the study, development,


and application of computer techniques that allow computers to acquire certain skills of
human intelligence.

ML (Machine Learning): Machine learning is a subset of Artificial Intelligence where


people “train” machines to recognize patterns based on data and make their predictions.
ML algorithms are mathematical algorithms that allow machines to learn by imitating the
way we humans learn.  

DL (Deep Learning): Deep Learning is a subset of ML in which the machine is able to


reason and draw its own conclusions, learning by itself. Deep Learning uses algorithms
that mimic human perception inspired by our brain and the connection between neurons.
Most deep learning methods use neural network architecture.

Q2. What is a neural network?

Ans. A neural network is a system of programs and data structures that approximates the
functioning of the human brain. A neural network usually involves a large number of
processors operating in parallel, each having its own small sphere of knowledge and
access to data in its local memory. 

A neural network is initially “trained” or fed with large amounts of data and rules about
relationships (e.g. “a grandparent is older than a person’s father”). A program can then tell
the network how to behave in response to an external stimulus (e.g. input from a computer
user interacting with the network) or it can initiate the activity itself, within limits of their
access to the external world.

Deep learning uses neural networks to learn useful representations of features directly
from data. For example, you can use a pre-trained neural network to identify and remove
artifacts such as noise from images.

 
Q3. What is the idea behind the GANs?

Ans. The Generative Adversarial Network (GAN) is a very popular candidate in the field of
machine learning that has showcased its potential to create realistic-looking images and
videos. GANs consist of two networks (D & G) where –

D =”discriminating” network 

G = “Generative” network.

The goal is to create data: images, for example, that cannot be distinguished from actual
images. Suppose we want to generate realistic images of cats. Network G will
generate candidate images. Network D will classify each image according to whether it is a cat or not.
The cost function of G is constructed in such a way that G tries to “trick” D into always
classifying its output as a cat.
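
As an illustration only (the answer above does not prescribe any framework), here is a
minimal Keras/TensorFlow sketch of this two-network setup; the "real" data is a toy 1-D
distribution standing in for real cat images, and all layer sizes and hyperparameters are
made-up assumptions.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 8

# G: maps random noise to a fake sample (4 features standing in for an image)
generator = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(4),
])

# D: classifies a sample as real (1) or fake (0)
discriminator = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(4,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# combined model used to train G; D's weights are frozen inside it
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def real_batch(n):
    # stand-in for "real" data: samples from a fixed normal distribution
    return np.random.normal(3.0, 0.5, size=(n, 4))

batch = 32
for step in range(1000):
    # 1) train D on half real, half generated samples
    noise = np.random.normal(size=(batch, latent_dim))
    fake = generator.predict(noise, verbose=0)
    x = np.vstack([real_batch(batch), fake])
    y = np.vstack([np.ones((batch, 1)), np.zeros((batch, 1))])
    discriminator.train_on_batch(x, y)

    # 2) train G so that D labels its output as "real" (label 1), i.e. G tries to trick D
    noise = np.random.normal(size=(batch, latent_dim))
    gan.train_on_batch(noise, np.ones((batch, 1)))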

Q4. What is the difference between Stochastic Gradient Descent (SGD) and Batch
Gradient Descent (BGD)?

Ans. Gradient Descent and Stochastic Gradient Descent are optimization algorithms used,
for example in linear regression, to find the set of parameters that minimize a loss function.
Batch Gradient Descent – BGD involves MULTIPLE calculations over the full training set
at each step. It is a slower and expensive process if we have very large training data.
However, this is great for convex or relatively smooth error manifolds. 

Stochastic Gradient Descent: SGD picks up a RANDOM instance of training data at


each step and then computes the gradient. This makes SGD a faster process than BGD.
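
To make the contrast concrete, here is a minimal NumPy sketch (the data, learning rate
and number of epochs below are made up for illustration) of the two update rules for a
small linear regression problem:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

lr, epochs = 0.1, 50

# Batch Gradient Descent: one update per pass, using ALL training examples
w = np.zeros(3)
for _ in range(epochs):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient over the full training set
    w -= lr * grad

# Stochastic Gradient Descent: one update per randomly chosen example
w_sgd = np.zeros(3)
for _ in range(epochs):
    for i in rng.permutation(len(y)):
        xi, yi = X[i], y[i]
        grad = 2 * xi * (xi @ w_sgd - yi)    # gradient from a single example
        w_sgd -= lr * grad

print("BGD estimate:", w)
print("SGD estimate:", w_sgd)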

Q5. What is Natural Language Processing? Give an example of it.

Ans. Natural Language Processing, abbreviated as NLP, is a branch of artificial


intelligence that deals with the interaction between machine/computers and humans
using the natural language. Natural language refers to language that is spoken and
written by people, and natural language processing (NLP) attempts to extract
information from the spoken and written word using algorithms.
OR
Natural Language Processing, or NLP, is the sub-field of AI that is focused on
enabling machines/computers to understand and process human languages. It is
a subfield of Linguistics, Computer Science, Information Engineering, and
Artificial Intelligence concerned with the interactions between computers and
human (natural) languages, in particular how to program computers to process
and analyze large amounts of natural language data.
OR
In NLP, we teach machines how to understand and communicate in
human language. Natural language refers to speech analysis in both audible
speeches, as well as text of a language. NLP systems capture meaning from an
input of words (sentences, paragraphs, pages, etc.)

Q6. What is data mining? Explain with example.

Ans. Data mining is the process of analyzing large data sets and extracting the
useful information from it. Data mining is used by companies to turn raw data into
useful information. It is an interdisciplinary subfield of computer science and
statistics with an overall goal to extract information
OR
Data mining is an automatic or semi-automatic technical process that analyses
large amounts of scattered information to make sense of it and turn it into
knowledge. It looks for anomalies, patterns or correlations among millions of
records to predict results, as indicated by the SAS institute, a world leader in
business analytics.

Example:
Price Comparison websites- They collect data about a product from different sites and
then analyze trends out of it and show up the most appropriate results.

Data mining is also known as Knowledge Discovery in Databases (KDD).

Q7. What do you understand by Data Privacy? Discuss in detail with some
examples.
Ans. Data privacy, sometimes also referred to as information privacy, is an area of
data protection that concerns the proper handling of sensitive data
including, notably, personal data but also other confidential data, such as
certain financial data and intellectual property data, to meet regulatory requirements
as well as protecting the confidentiality and immutability of the data. It focuses on
how to collect, process, share, archive, and delete the data in accordance with the
law.
Privacy, in the broadest sense, is the right of individuals, groups, or
organizations to control who can access, observe, or use something they own, such
as their bodies, property, ideas, data, or information.
Control is established through physical, social, or informational boundaries that help
prevent unwanted access, observation, or use. For example:
 A physical boundary, such as a locked front door, helps prevent others from
entering a building without explicit permission in the form of a key to unlock the
door or a person inside opening the door.
 A social boundary, such as a members-only club, only allows members to access
and use club resources.
 An informational boundary, such as a non-disclosure agreement, restricts what
information can be disclosed to others.
Privacy of information is extremely important in this digital age where everything is
interconnected and can be accessed and used easily. The possibilities of our private
information being extremely vulnerable are very real, which is why we require data
privacy.
Q8. AI and robotics have raised some questions regarding liability. Take for
example the scenario of an ‘autonomous’ or AI-driven robot moving through a
factory. Another robot surprisingly crosses its way and our robot draws aside
to prevent collision. However, by this manoeuvre the robot injures a person.
a) Who can be held liable for damages caused by autonomous systems?
Ans. It is actually very difficult to blame anyone in such a scenario. This is
where AI Ethics comes into the picture. Here, the choices might differ
from person to person, and one must understand that nobody is wrong in this
case. Every person has a different perspective and hence takes
decisions according to their own moral values.
But still, if someone is to be held liable, it should be the programmer who
designed the algorithm of the autonomous vehicle, as he/she should have
considered all the exceptional conditions that could arise.

b) List two AI Ethics.

(Any two out of the following)


AI Bias, AI Access, Data privacy, AI for kids.
Long Answers

Q1. How will you solve the gradient explosion problem?

Ans. There are many ways to solve the gradient explosion problem. Some of the best
experimental methods are – 

Redesign the network model – In deep neural networks, the gradient explosion can be
solved by redesigning the network with fewer layers. Using a smaller batch size is also
good for network training. In recurrent neural networks, updating in fewer previous time
steps during training (truncated backpropagation over time) can alleviate the gradient burst
problem.

Use the ReLU activation function – In deep multilayer perceptron neural networks, gradient
explosion can occur because of activation functions, such as the previously popular Sigmoid
and Tanh functions. Using the ReLU activation function can reduce the gradient explosion.
Adopting ReLU for the hidden layers is one of the most popular practices.

Use long short-term memory (LSTM) networks – In recurrent neural networks, the
gradient explosion may be due to the instability of training a certain network. For
example, backpropagation through time essentially converts the recurrent network into a deep
multilayer perceptron neural network. The use of long short-term memory (LSTM) units
and related gate-like neural structures can reduce the gradient explosion problem. The
use of LSTM units is the current best practice for sequence prediction with recurrent
neural networks.

Use gradient clipping – In very deep multilayer perceptron networks with large batches
and LSTMs with long input sequences, gradient bursts can occur. If the gradient burst still
occurs, you can check and limit the size of the gradient during the training process. This
process is called gradient truncation. There is a simple and effective solution to dealing
with gradient bursts: If the gradients exceed the threshold, cut them off.

Specifically, it checks whether the value of the error gradient exceeds the threshold, and if
it exceeds it, the gradient is truncated and the gradient is set as the threshold. Gradient
truncation can alleviate the gradient burst problem to some extent (gradient truncation, i.e.
the gradient is set as a threshold before the gradient descent step). 

Use weight regularization – If the gradient explosion still exists, you can try another
method, which is to check the size of the network weights and penalize the loss function
that produces a larger weight value. This process is called weight regularization and
generally uses either the L1 penalty (the absolute value of the weight) or the L2 penalty
(the square of the weight). Using L1 or L2 penalty terms for loop weights can help alleviate
gradient bursts. 
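
As a concrete illustration of the last two remedies, here is a minimal sketch assuming
Keras/TensorFlow (the framework, layer sizes, and threshold values are assumptions, not
prescribed by the answer above):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    # L2 penalty on the layer weights (weight regularization)
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1),
])

# clipvalue truncates each gradient element to [-0.5, 0.5];
# clipnorm=1.0 would instead rescale the whole gradient if its norm exceeded 1.0
opt = keras.optimizers.SGD(learning_rate=0.01, clipvalue=0.5)
model.compile(optimizer=opt, loss="mse")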

Q2. What is a neural network? Explain various parts of Neural Network. How do they
learn?

Ans. In the simplest terms, an artificial neural network (ANN) is an example of machine


learning that takes information, and helps the computer generate an output based on their
knowledge and examples. Machines utilize neural networks and algorithms to help them
adapt and learn without having to be reprogrammed. Neural networks are mimics of the
human brain, where each neuron or node is responsible for solving a small part of the
problem. They pass on what they know and have learned to the other neurons in the
network, until the interconnected nodes are able to solve the problem and give an output.
Trial and error are a huge part of neural networks and are key in helping the nodes learn.
Neural networks are different from computational statistical models because they can learn
from new information—computational machine learning is also designed to make accurate
predictions, while statistical models are designed to learn about the relationship between
variables. 
In simple terms, neural networks are fairly easy to understand because they function like
the human brain. There is an information input, the information flows between
interconnected neurons or nodes inside the network through deep hidden layers and uses
algorithms to learn about them, and then the solution is put in an output neuron layer,
giving the final prediction or determination.
Parts of a neural network. 
There are many elements to a neural network that help it work, including;
 Neurons—each neuron or node is a function that takes the output from the layer
ahead of it, and spits out a number between 1 and 0, representing true or false
 The input layer and input neurons
 Hidden layers—these are full of many neurons and a neural network can have
many hidden layers inside
 Output layer—this is where the result comes after the information is segmented
through all the hidden layers
 Synapse—this is the connection between neurons and layers inside a neural
network
These parts work together to create a neural network that can help make predictions and
solve problems. An input is received by input neurons in the input layer, and the
information then goes through the synapse connection to the hidden layers. Each neuron
inside a hidden layer has a connection to another node in another layer. When the neuron
gets information, it sends along some information to the next connected neuron.
Algorithms are key in helping dissect the information. The amount of information, or
weight, it sends is determined by a mathematical activation function, and the result of the
activation function will be a number between 0 and 1. Each layer also has a bias that it
calculates in as part of the activation function. The output of that activation function is the
input for the next hidden layer, until you get to the output layer. The eventual output in the
output layer will be 0 or 1, true or false, to answer the question or make the prediction. 
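
The flow described above can be sketched in a few lines of NumPy (an illustrative
assumption, not part of the original answer): an input layer, one hidden layer, a bias per
layer, and a sigmoid activation squashing each neuron's output to a number between 0 and 1.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2, 0.9])             # input neurons (3 features)

W1 = np.random.randn(4, 3) * 0.1          # synapses: input layer -> hidden layer
b1 = np.zeros(4)                          # bias of the hidden layer
hidden = sigmoid(W1 @ x + b1)             # activations of the 4 hidden neurons

W2 = np.random.randn(1, 4) * 0.1          # synapses: hidden layer -> output layer
b2 = np.zeros(1)
output = sigmoid(W2 @ hidden + b2)        # final prediction between 0 and 1

print("prediction:", output)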
How neural networks learn.

Neural networks have to be “taught” in order to get started functioning and learning on
their own. They then can learn from the outputs they have put out and the information they
get in, but it has to start somewhere. There are a few processes that can be used to help
neural networks get started learning.  
Training. Neural networks that are trained are given random numbers or weights to begin.
They are either supervised or unsupervised for training. Supervised training involves a
mechanism that gives the network a grade or corrections. Unsupervised training makes
the network work to figure out the inputs without outside help. Most neural networks use
supervised training to help it learn more quickly.
Transfer learning. Transfer learning is a technique that involves giving a neural network a
similar problem that can then be reused in full or in part to accelerate the training and
improve the performance on the problem of interest. 
Feature extraction. Feature extraction is taking all of the data to be fed to an input,
removing any redundant data, and bundling it into more manageable segments. This cuts
down on the memory and computation power needed to run a problem through a neural
network, by only giving the network the absolutely necessary information.

Q3. List some advantages and application of neural Networks.


Ans. Advantages of Neural Network

ANN’s outputs aren't limited entirely by inputs and results given to them initially by an
expert system. This ability comes in handy for robotics and pattern recognition systems.

This network has the potential for high fault tolerance and is capable of debugging or
diagnosing a network on its own. ANN can go through thousands of log files from a
company and sort them out. It is presently a tedious task done by administrators.

Nonlinear systems can find shortcuts to computationally expensive solutions. We
see this in banking, where analysts start with an Excel spreadsheet and then build
code around that sheet. Over 20 years a large bank might accumulate a whole repertoire of
such functions, and a neural network can come up with the same answers in days, weeks,
or at most a month.

Applications of Neural Network

With an enormous number of applications implementations every day, now is the most
appropriate time to know about the applications of neural networks, machine learning, and
artificial intelligence. Some of them are discussed below:

Handwriting Recognition

Neural networks are used to convert handwritten characters into digital characters that a
machine can recognize.
Stock-exchange prediction

The stock exchange is difficult to track and difficult to understand. Many factors affect the
stock market. A neural network can examine a lot of factors and predict the prices daily,
which would help stockbrokers. 

Right now, it's still in an initial phase. You should know that there are over three terabytes
of data a day just from the US stock exchange. That's a lot of data to dig through, and you
have to sort it out before you start focusing on even one stock.

Traveling issues of sales professionals

This type refers to finding an optimal path to travel between cities in a particular area.
Neural networks help solve the problem of providing higher revenue at minimal costs.
Logistical considerations are enormous, and here we have to find optimal travel paths for
sales professionals moving from town to town. 

Image compression

The idea behind the data compression neural network is to store, encrypt, and recreate the
actual image again. We can optimize the size of our data using image compression neural
networks. It is the ideal application to save memory and optimize it.

The last part of this answer looks at the future of neural networks.

Future of Neural Networks

With the way AI and machine learning is being adopted by companies today, we could see
more advancements in the applications of neural networks. There will be personalized
choices for users all over the world. All mobile and web applications try to give you an
enhanced customized experience based on your search history.  

Hyper-intelligent virtual assistants will make life easier. If you have ever used Google
assistant, Siri, or any of those assistants, you can comprehend how they're slowly
evolving. They may even predict your email response in the future. 

We can expect a few intriguing discoveries on algorithms to support learning methods. We


are just in the infant stage of applying artificial intelligence and neural networks to the real
world. 

Neural networks will be a lot faster in the future, and neural network tools can get
embedded in every design surface. We already have a little mini neural network that plugs
into an inexpensive processing board, or even into your laptop. Focusing on the hardware,
instead of the software, would make devices even faster.

Neural networks will find their usage in the fields of medicine, agriculture, physics,
discoveries, and everything else you can imagine. Neural networks are also used in
shared data systems.
Q4. How Do Convolutional Layers Work in Deep Learning Neural Networks?

Ans. Convolutional layers are the major building blocks used in convolutional neural
networks.

The convolutional neural network, or CNN for short, is a specialized type of neural network
model designed for working with two-dimensional image data, although they can be used
with one-dimensional and three-dimensional data.

Central to the convolutional neural network is the convolutional layer that gives the
network its name. This layer performs an operation called a “convolution“.
In the context of a convolutional neural network, a convolution is a linear operation that
involves the multiplication of a set of weights with the input, much like a traditional neural
network. Given that the technique was designed for two-dimensional input, the
multiplication is performed between an array of input data and a two-dimensional array of
weights, called a filter or a kernel.

The filter is smaller than the input data and the type of multiplication applied between a
filter-sized patch of the input and the filter is a dot product. A dot product is the element-
wise multiplication between the filter-sized patch of the input and filter, which is then
summed, always resulting in a single value. Because it results in a single value, the
operation is often referred to as the “scalar product“.
Using a filter smaller than the input is intentional as it allows the same filter (set of weights)
to be multiplied by the input array multiple times at different points on the input.
Specifically, the filter is applied systematically to each overlapping part or filter-sized patch
of the input data, left to right, top to bottom.

Worked Example of Convolutional Layers

The Keras deep learning library provides a suite of convolutional layers.

We can better understand the convolution operation by looking at some worked examples
with contrived data and handcrafted filters.

In this section, we’ll look at both a one-dimensional convolutional layer and a two-
dimensional convolutional layer example to both make the convolution operation concrete
and provide a worked example of using the Keras layers.

What is a 1D convolution?
Convolution operates on two signals (in 1D) or two images (in 2D): you can think of one
as the “input” signal (or image), and the other (called the kernel) as a “filter” on the input
image, producing an output image (so convolution takes two images as input and
produces a third as output).

The 2D Convolution Layer


A filter or a kernel in a conv2D layer “slides” over the 2D input data, performing an
elementwise multiplication. ... The kernel will perform the same operation for every location
it slides over, transforming a 2D matrix of features into a different 2D matrix of features.

Example of 1D Convolutional Layer

We can define a one-dimensional input that has eight elements all with the value of 0.0,
with a two element bump in the middle with the values 1.0.

[0, 0, 0, 1, 1, 0, 0, 0]
The input to Keras must be three dimensional for a 1D convolutional layer.

The first dimension refers to each input sample; in this case, we only have one sample.
The second dimension refers to the length of each sample; in this case, the length is eight.
The third dimension refers to the number of channels in each sample; in this case, we only
have a single channel.

Therefore, the shape of the input array will be [1, 8, 1].

# define input data (asarray comes from NumPy)
from numpy import asarray

data = asarray([0, 0, 0, 1, 1, 0, 0, 0])

data = data.reshape(1, 8, 1)
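
The answer stops at defining the input; as an illustrative continuation (a sketch assuming
Keras is installed, with a hand-crafted two-element filter chosen purely for this example),
the convolutional layer itself could be defined and applied like this:

from numpy import asarray
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

# a 1D convolutional layer with a single filter of width two
model = Sequential()
model.add(Conv1D(filters=1, kernel_size=2, input_shape=(8, 1)))

# hand-crafted "bump detector" filter [0, 1] plus a zero bias, set as the layer weights
weights = [asarray([[[0]], [[1]]]), asarray([0.0])]
model.set_weights(weights)

data = asarray([0, 0, 0, 1, 1, 0, 0, 0]).reshape(1, 8, 1)
print(model.predict(data))   # the non-zero outputs mark where the bump was detected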
Example of 2D Convolutional Layer

We can expand the bump detection example in the previous section to a vertical line
detector in a two-dimensional image.

Again, we can constrain the input, in this case to a square 8×8 pixel input image with a
single channel (e.g. grayscale) with a single vertical line in the middle.

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]

[0, 0, 0, 1, 1, 0, 0, 0]
The input to a Conv2D layer must be four-dimensional.
The first dimension defines the samples; in this case, there is only a single sample. The
second dimension defines the number of rows; in this case, eight. The third dimension
defines the number of columns, again eight in this case, and finally the number of
channels, which is one in this case.

Therefore, the input must have the four-dimensional shape [samples, rows, columns,
channels] or [1, 8, 8, 1] in this case.

# define input data (asarray comes from NumPy)
from numpy import asarray

data = [[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0],

[0, 0, 0, 1, 1, 0, 0, 0]]

data = asarray(data)

data = data.reshape(1, 8, 8, 1)
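
As with the 1D case, a possible continuation (a sketch assuming Keras; the 3x3
vertical-line detector filter below is hand-crafted purely for illustration) would be:

from numpy import asarray
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# a 2D convolutional layer with a single 3x3 filter
model = Sequential()
model.add(Conv2D(filters=1, kernel_size=(3, 3), input_shape=(8, 8, 1)))

# vertical-line detector: a column of ones flanked by zeros, plus a zero bias
detector = [[[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]]]
model.set_weights([asarray(detector), asarray([0.0])])

# the same 8x8 single-channel image defined above
data = asarray([[0, 0, 0, 1, 1, 0, 0, 0]] * 8).reshape(1, 8, 8, 1)
print(model.predict(data).reshape(6, 6))   # non-zero columns mark the vertical line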

MCQs

1. Who was the inventor of the first neurocomputer?

A. Dr. John Hecht-Nielsen


B. Dr. Robert Hecht-Nielsen
C. Dr. Alex Hecht-Nielsen
D. Dr. Steve Hecht-Nielsen
Ans : B

Explanation: The inventor of the first neurocomputer was Dr. Robert Hecht-Nielsen.

2. How many types of Artificial Neural Networks?

A. 2
B. 3
C. 4
D. 5
Ans : A

Explanation: There are two Artificial Neural Network topologies : FeedForward and
Feedback.

3. In which ANN, loops are allowed?

A. FeedForward ANN
B. FeedBack ANN
C. Both A and B
D. None of the Above
Ans : B

Explanation: In FeedBack ANN, loops are allowed. They are used in content
addressable memories.

4. What is the full form of BN in Neural Networks?

A. Bayesian Networks
B. Belief Networks
C. Bayes Nets
D. All of the above
Ans : D

Explanation: The full form BN is Bayesian networks and Bayesian networks are also
called Belief Networks or Bayes Nets.

5. What is the name of node which take binary values TRUE (T) and FALSE (F)?

A. Dual Node
B. Binary Node
C. Two-way Node
D. Ordered Node
Ans : B

Explanation: Boolean nodes : They represent propositions, taking binary values


TRUE (T) and FALSE (F).

6. What is an auto-associative network?

A. a neural network that contains no loops


B. a neural network that contains feedback
C. a neural network that has only one loop
D. a single layer feed-forward neural network with pre-processing
Ans : B

Explanation: An auto-associative network is equivalent to a neural network that


contains feedback. The number of feedback paths(loops) does not have to be one.

7. What is Neuro software?

A. A software used to analyze neurons


B. It is powerful and easy neural network
C. Designed to aid experts in real world
D. It is software used by Neurosurgeons
Ans : B

Explanation: Neuro software is powerful and easy neural network.

8. Neural Networks are complex ______ with many parameters.

A. Linear Functions
B. Nonlinear Functions
C. Discrete Functions
D. Exponential Functions
Ans : B

Explanation: Neural networks are complex nonlinear functions with many parameters.

9. Which of the following is not the promise of artificial neural network?

A. It can explain result


B. It can survive the failure of some nodes
C. It has inherent parallelism
D. It can handle noise
Ans : A

Explanation: The artificial Neural Network (ANN) cannot explain result.

10. The output at each node is called ______.

A. node value
B. Weight
C. neurons
D. axons
Ans : A

Explanation: The output at each node is called its activation or node value.

11. What is full form of ANNs?

A. Artificial Neural Node


B. AI Neural Networks
C. Artificial Neural Networks
D. Artificial Neural numbers
Ans : C

Explanation: Artificial Neural Networks is the full form of ANNs.

12. In FeedForward ANN, information flow is ______.

A. unidirectional
B. bidirectional
C. multidirectional
D. All of the above
Ans : A

Explanation: In FeedForward ANN, the information flow is unidirectional.


13. Which of the following is not a Machine Learning strategy in ANNs?

A. Unsupervised Learning
B. Reinforcement Learning
C. Supreme Learning
D. Supervised Learning
Ans : C

Explanation: Supreme Learning is not a Machine Learning strategy in ANNs.

14. Which of the following is an Applications of Neural Networks?

A. Automotive
B. Aerospace
C. Electronics
D. All of the above
Ans : D

Explanation: All of the above are applications of Neural Networks.

15. What is perceptron?

A. a single layer feed-forward neural network with pre-processing


B. an auto-associative neural network
C. a double layer auto-associative neural network
D. a neural network that contains feedback
Ans : A

Explanation: The perceptron is a single layer feed-forward neural network.

16. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with
the constant of proportionality being equal to 2. The inputs are 4, 3, 2 and 1
respectively. What will be the output?

A. 30
B. 40
C. 50
D. 60
Ans : B

Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*3 + 3*2 + 4*1) = 40.
17. What is back propagation?

A. It is another name given to the curvy function in the perceptron


B. It is the transmission of error back through the network to adjust the inputs
C. It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
D. None of the Above
Ans : C

Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.

18. The network that involves backward links from output to the input and hidden
layers is called

A. Self organizing map


B. Perceptrons
C. Recurrent neural network
D. Multi layered perceptron
Ans : C

Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.

19. The first artificial neural network was invented in ______.

A. 1957
B. 1958
C. 1959
D. 1960
Ans : B

Explanation: The first artificial neural network was invented in 1958.

ANN is composed of a large number of highly interconnected processing
elements (neurons) working in unison to solve problems.
a) True
b) False
Ans: True

A neural network model is said to be inspired by the human brain.
The neural network consists of many neurons; each neuron takes an input,
processes it and gives an output.

Which of the following statement(s) correctly represents a real neuron?

A. A neuron has a single input and a single output only

B. A neuron has multiple inputs but a single output only

C. A neuron has a single input but multiple outputs

D. A neuron has multiple inputs and multiple outputs


E. All of the above statements are valid

Solution: (E)

What are the steps for using a gradient descent algorithm?

1. Calculate error between the actual value and the predicted value
2. Reiterate until you find the best weights of network
3. Pass an input through the network and get values from output layer
4. Initialize random weight and bias
5. Go to each neurons which contributes to the error and change its
respective values to reduce the error

A. 1, 2, 3, 4, 5

B. 5, 4, 3, 2, 1

C. 3, 2, 1, 5, 4

D. 4, 3, 1, 5, 2

Solution: (D)

“Convolutional Neural Networks can perform various types of transformation


(rotations or scaling) in an input”. Is the statement correct True or False?

A. True

B. False

Solution: (B)

Which of the following techniques perform similar operations as dropout in a


neural network?

A. Bagging

B. Boosting

C. Stacking

D. None of these
Solution: (A)

In training a neural network, you notice that the loss does not decrease in the
few starting epochs.

The reasons for this could be:

1. The learning rate is low


2. Regularization parameter is high
3. Stuck at local minima

What according to you are the probable reasons?

A. 1 and 2

B. 2 and 3

C. 1 and 3

D. Any of these

Solution: (D)
Unit 5

Very Short Answers

True False Questions.

(a) [1 point] We can get multiple local optimum solutions if we solve a linear
regression problem by minimizing the sum of squared errors using gradient descent.

True False

Solution:

False

(b) [1 point] When a decision tree is grown to full depth, it is more likely to fit
the noise in the data.

True False

Solution:

True

(c) [1 point] When the hypothesis space is richer, over fitting is more likely.

True False

Solution:

True

(d) [1 point] When the feature space is larger, over fitting is more likely.

True False

Solution:

True

(e) [1 point] We can use gradient descent to learn a Gaussian Mixture Model.

True False

Solution:

True
Ques. What is reinforcement?
Ans.
Reinforcement Learning is a part of machine learning. Here, agents are self-trained
on reward and punishment mechanisms. It's about taking the best possible action
or path to gain maximum rewards and minimum punishment through observations in
a specific situation. It acts as a signal for positive and negative behaviours.

Ques. How do you teach deep reinforcement learning?


Ans. Reinforcement Learning Workflow
1. Create the Environment. First you need to define the environment within
which the agent operates, including the interface between agent and
environment. ...
2. Define the Reward. ...
3. Create the Agent. ...
4. Train and Validate the Agent. ...
5. Deploy the Policy.

Ques. What are the main components of reinforcement learning?

Ans. Beyond the agent and the environment, one can identify four main subelements
of a reinforcement learning system: a policy, a reward function, a value function,
and, optionally, a model of the environment. A policy defines the learning agent's
way of behaving at a given time.

Ques. What are the 3 basic elements of reinforcement theory?

Ans. Reinforcement theory has three primary mechanisms behind it: selective


exposure, selective perception, and selective retention.

Ques. What are the types of reinforcement theory?


Ans.
There are four primary approaches to reinforcement theory: positive reinforcement,
negative reinforcement, positive punishment, and negative punishment

Short Answers
Ques. Explain Reinforcement Learning. Explain with the help of an example

Ans. Reinforcement learning addresses the question of how an autonomous


agent that senses and acts in its environment can learn to choose optimal
actions to achieve its goals.
 Consider building a learning robot. The robot, or agent, has a
set of sensors to observe the state of its environment, and a
set of actions it can perform to alter this state.
 Its task is to learn a control strategy, or policy, for choosing actions that
achieve its goals.
 The goals of the agent can be defined by a reward function
that assigns a numerical value to each distinct action the
agent may take from each distinct state.
 This reward function may be built into the robot, or known
only to an external teacher who provides the reward value
for each action performed by the robot.
 The task of the robot is to perform sequences of actions,
observe their consequences, and learn a control policy.
 The control policy is one that, from any initial state, chooses
actions that maximize the reward accumulated over time by
the agent.
Example:
 A mobile robot may have sensors such as a camera and
sonars, and actions such as "move forward" and "turn."
 The robot may have a goal of docking onto its battery charger whenever
its battery level is low.
 The goal of docking to the battery charger can be captured by assigning
a positive reward (e.g., +100) to state-action transitions that immediately result in a
connection to the charger and a reward of zero to every
other state-action transition.

Ques. Explain Reinforcement learning problem characteristics


Ans. Reinforcement learning problem characteristics

1. Delayed reward: The task of the agent is to learn a target


function 𝜋 that maps from the current state s to the optimal
action a = 𝜋 (s). In reinforcement learning, training information is
not available in (s, 𝜋 (s)). Instead, the trainer provides only a
sequence of immediate reward values as the agent executes its
sequence of actions. The agent, therefore, faces the problem of
temporal credit assignment: determining which of the actions
in its sequence are to be credited with producing the eventual
rewards.

2. Exploration: In reinforcement learning, the agent influences the


distribution of training examples by the action sequence it
chooses. This raises the question of which experimentation
strategy produces most effective learning. The learner faces a
trade-off in choosing whether to favor exploration of unknown
states and actions, or exploitation of states and actions that it
has already learned will yield high reward.

3. Partially observable states: Although the agent's sensors can, in principle, perceive


the entire state of the environment at each time step, in many
practical situations sensors provide only partial information. In such
cases, the agent needs to consider its previous observations
together with its current sensor data when choosing actions, and
the best policy may be one that chooses actions specifically to
improve the observability of the environment.

4. Life-long learning: Robot requires to learn several related tasks


within the same environment, using the same sensors. For
example, a mobile robot may need to learn how to dock on its
battery charger, how to navigate through narrow corridors, and
how to pick up output from laser printers. This setting raises the
possibility of using previously obtained experience or knowledge to
reduce sample complexity when learning new tasks.

Ques4. Difference between Reinforcement learning and Supervised learning:

Ans.
Reinforcement learning:

 Reinforcement learning is all about making decisions sequentially. In simple
words, the output depends on the state of the current input, and the next input
depends on the output of the previous input.
 In Reinforcement learning, decisions are dependent, so we give labels to
sequences of dependent decisions.
 Example: Chess game.

Supervised learning:

 In Supervised learning, the decision is made on the initial input or the input
given at the start.
 In Supervised learning, decisions are independent of each other, so labels are
given to each decision.
 Example: Object recognition.
Ques5. List Various Applications of Reinforcement Learning
Ans. 1. Rocket engineering – Explore how reinforcement learning is used in the
field of rocket engine development. You’ll find a lot of valuable information on the use
of machine learning in manufacturing industries. See why reinforcement learning is
favored over other machine learning algorithms when it comes to manufacturing
rocket engines.

2. Traffic Light Control – This site provides multiple research papers and project
examples that highlight the use of core reinforcement learning and deep
reinforcement learning in traffic light control. It has tutorials, datasets, and relevant
example papers that use RL as a backbone so that you can make a new finding of
your own.

3. Marketing and advertising – See how to make an AI system learn from a pre-
existing dataset which may be infeasible or unavailable, and how to make AI learn in
real-time by creating advertising content. This is where they have made use of
reinforcement learning. 

4. Reinforcement Learning in Marketing | by Deepthi A R – This example focuses


on the changing business dynamics to which marketers need to adapt. The AI
equipped with a reinforcement learning scheme can learn from real-time changes
and help devise a proper marketing strategy. This article highlights the changing
business environment as a problem and reinforcement learning as a solution to it.

5. Robotics – This video demonstrates the use of reinforcement learning in robotics.


The aim is to show the implementation of autonomous reinforcement learning agents
for robotics. A prime example of using reinforcement learning in robotics.

6. Recommendation – Recommendation systems are widely used in eCommerce


and business sites for product advertisement. There’s always a recommendation
section displayed in many popular platforms such as YouTube, Google, etc. The
ability of AI to learn from real-time user interactions, and then suggest them content,
would not have been possible without reinforcement learning. This article shows the
use of reinforcement learning algorithms and practical implementations in
recommendation systems.

7. Healthcare – Healthcare is a huge industry with many state-of-the-art


technologies bound to it, where the use of AI is not new. The main question here is
how to optimize AI in healthcare, and make it learn based on real-time experiences.
This is where reinforcement learning comes in. Reinforcement learning has
undeniable value for healthcare, with its ability to regulate ultimate behaviors. With
RL, healthcare systems can provide more detailed and accurate treatment at
reduced costs.

8. NLP – This article shows the use of reinforcement learning in combination with
Natural Language Processing to beat a question and answer adventure game. This
example might be an inspiration for learners engaged in Natural Language
Processing and gaming solutions.

9. Trading – Deep reinforcement learning is a force to reckon with when it comes to


the stock trading market. The example here demonstrates how deep reinforcement
learning techniques can be used to analyze the stock trading market, and provide
proper investment reports. Only an AI equipped with reinforcement learning can
provide accurate stock market reports.

Long Answers

Q1. What is Q function write an algorithm for learning Q?

Ans. Let’s say that a robot has to cross a maze and reach the end point. There
are mines, and the robot can only move one tile at a time. If the robot steps onto a
mine, the robot is dead. The robot has to reach the end point in the shortest time
possible.
The scoring/reward system is as below:

1. The robot loses 1 point at each step. This is done so that the robot takes the
shortest path and reaches the goal as fast as possible.

2. If the robot steps on a mine, the point loss is 100 and the game ends.

3. If the robot gets power ⚡️, it gains 1 point.

4. If the robot reaches the end goal, the robot gets 100 points.
Now, the obvious question is: How do we train a robot to reach the end goal with
the shortest path without stepping on a mine?

So, how do we solve this?

Introducing the Q-Table


Q-Table is just a fancy name for a simple lookup table where we calculate the
maximum expected future rewards for action at each state. Basically, this table will
guide us to the best action at each state.
There will be four numbers of actions at each non-edge tile. When a robot is at a
state it can either move up or down or right or left.

So, let’s model this environment in our Q-Table.

In the Q-Table, the columns are the actions and the rows are the states.

Each Q-table score will be the maximum expected future reward that the robot will
get if it takes that action at that state. This is an iterative process, as we need to
improve the Q-Table at each iteration.

But the questions are:

 How do we calculate the values of the Q-table?


 Are the values available or predefined?
To learn each value of the Q-table, we use the Q-Learning algorithm.
Mathematics: the Q-Learning algorithm
Q-function
The Q-function uses the Bellman equation and takes two inputs: state (s) and
action (a).
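
The function referred to here is not reproduced in the source text; in its standard form,
the Q-learning update based on the Bellman equation is

    New Q(s, a) = Q(s, a) + α [ R(s, a) + γ · max over a' of Q(s', a') - Q(s, a) ]

where α is the learning rate, γ is the discount factor, R(s, a) is the immediate reward,
and s' is the state reached after taking action a in state s.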

Using the above function, we get the values of Q for the cells in the table.
When we start, all the values in the Q-table are zeros.

There is an iterative process of updating the values. As we start to explore the


environment, the Q-function gives us better and better approximations by
continuously updating the Q-values in the table.
Now, let’s understand how the updating takes place.

Introducing the Q-learning algorithm process


Each of the colored boxes is one step. Let’s understand each of these steps in detail.

Step 1: initialize the Q-Table


We will first build a Q-table. There are n columns, where n= number of actions.
There are m rows, where m= number of states. We will initialise the values at 0.

In our robot example, we have four actions (a=4) and five states (s=5). So we will
build a table with four columns and five rows.

Steps 2 and 3: choose and perform an action


This combination of steps is done for an undefined amount of time. This means that
this step runs until the time we stop the training, or the training loop stops as defined
in the code.
We will choose an action (a) in the state (s) based on the Q-Table. But, as
mentioned earlier, when the episode initially starts, every Q-value is 0.

So now the concept of exploration and exploitation trade-off comes into play. This
article has more details.
We’ll use something called the epsilon greedy strategy.
In the beginning, the epsilon rates will be higher. The robot will explore the
environment and randomly choose actions. The logic behind this is that the robot
does not know anything about the environment.

As the robot explores the environment, the epsilon rate decreases and the robot
starts to exploit the environment.

During the process of exploration, the robot progressively becomes more confident
in estimating the Q-values.

For the robot example, there are four actions to choose from: up, down, left, and
right. We are starting the training now — our robot knows nothing about the
environment. So the robot chooses a random action, say right.

We can now update the Q-values for being at the start and moving right using the
Bellman equation.

Steps 4 and 5: evaluate


Now we have taken an action and observed an outcome and reward.We need to
update the function Q(s,a).
In the case of the robot game, to reiterate the scoring/reward structure is:

 power = +1
 mine = -100
 end = +100

We will repeat this again and again until the learning is stopped. In this way the Q-
Table will be updated.
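
To tie the steps together, here is a minimal NumPy sketch of the loop described above
(purely illustrative assumptions: a toy one-row "maze" environment, made-up rewards and
hyperparameters), using the standard Q-learning update with an epsilon-greedy choice of
actions:

import numpy as np

n_states, n_actions = 5, 4             # 5 states (tiles), 4 actions (up/down/left/right)
Q = np.zeros((n_states, n_actions))    # Step 1: initialize the Q-Table with zeros

alpha, gamma, epsilon = 0.1, 0.9, 0.3  # learning rate, discount factor, exploration rate

def step(state, action):
    # toy environment: action 3 moves right, anything else moves left;
    # reaching the last tile ends the episode with a +100 reward, every step costs 1 point
    next_state = min(state + 1, n_states - 1) if action == 3 else max(state - 1, 0)
    reward = 100 if next_state == n_states - 1 else -1
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Steps 2 and 3: choose an action (epsilon-greedy) and perform it
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)   # explore
        else:
            action = int(np.argmax(Q[state]))       # exploit
        next_state, reward, done = step(state, action)

        # Steps 4 and 5: evaluate the outcome and update Q(s, a) with the Bellman rule
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state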
Ques2. What is Deep Reinforcement Learning and Autoencoder Architecture.
Explain Face recognition Application.

Ans. Deep reinforcement learning combines artificial neural networks with a


reinforcement learning architecture that enables software-defined agents to learn the
best actions possible in a virtual environment in order to attain their goals. Deep
reinforcement learning is the combination of reinforcement learning (RL) and deep
learning. This field of research has been able to solve a wide range of complex
decision making tasks that were previously out of reach for a machine. Thus, deep
RL opens up many new applications in domains such as healthcare, robotics, smart
grids, finance, and many more. This manuscript provides an introduction to deep
reinforcement learning models, algorithms and techniques. Particular focus is on the
aspects related to generalization and how deep RL can be used for practical
applications. We assume the reader is familiar with basic machine learning concepts.

Autoencoder Architecture

An autoencoder is a neural network architecture capable of discovering structure


within data in order to develop a compressed representation of the input. ... Because
autoencoders learn how to compress the data based on attributes. Autoencoders
are an unsupervised learning technique in which we leverage neural networks for the
task of representation learning. Specifically, we'll design a neural network
architecture such that we impose a bottleneck in the network which forces a
compressed knowledge representation of the original input. If the input features were
each independent of one another, this compression and subsequent reconstruction
would be a very difficult task. However, if some sort of structure exists in the data (ie.
correlations between input features), this structure can be learned and consequently
leveraged when forcing the input through the network's bottleneck.
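
A minimal Keras sketch of this bottleneck idea (the framework and the sizes, 784 input
features compressed into a 32-number code, are assumptions for illustration only):

from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim = 784, 32

inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(code_dim, activation="relu")(inputs)        # the bottleneck
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)   # the reconstruction

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, ...)  # trained to reproduce its own input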

- Facial recognition is being used in many businesses

You’re used to unlocking your door with a key, but maybe not with your face. As
strange as it sounds, our physical appearances can now verify payments, grant
access and improve existing security systems. Protecting physical and digital
possessions is a universal concern which benefits everyone, unless you’re a
cybercriminal or a kleptomaniac of course. Facial biometrics are gradually being
applied to more industries, disrupting design, manufacturing, construction, law
enforcement and healthcare. How is facial recognition software affecting these
different sectors, and who are the companies and organisations behind its
development?

1. Payments

It doesn’t take a genius to work out why businesses want payments to be easy.
Online shopping and contactless cards are just two examples that demonstrate the
seamlessness of postmodern purchases. With FaceTech, however, customers
wouldn’t even need their cards. In 2016, MasterCard launched a new selfie pay app
called MasterCard Identity Check. Customers open the app to confirm a payment
using their camera, and that’s that. Facial recognition is already used in store and at
ATMs, but the next step is to do the same for online payments. Chinese ecommerce
firm Alibaba and affiliate payment software Alipay are planning to apply the software
to purchases made over the Internet.

2. Access and security

As well as verifying a payment, facial biometrics can be integrated with physical


devices and objects. Instead of using passcodes, mobile phones and other
consumer electronics will be accessed via owners’ facial features. Apple, Samsung
and Xiaomi Corp. have all installed FaceTech in their phones. This is only a small-
scale example, though. In future, it looks like consumers will be able to get into their
cars, houses, and other secure physical locations simply by looking at them. Jaguar
is already working on walking gait ID – a potential parallel to facial recognition
technology. Other corporations are likely to take advantage of this, too. Innovative
facial security could be especially useful for a company or organisation that handles
sensitive data and needs to keep tight controls on who enters their facilities.

Ques3. What are reinforcement learning models?

Ans. Machines learn differently than people. For instance, you probably didn’t learn
the difference between a positive and a negative movie review by analyzing tens of
thousands of labeled examples of each. There is, however, a specific subfield of
machine learning that bears a striking resemblance to aspects of how we learn.

Reinforcement learning (RL) is a field that’s been around for a few decades. Lately,
it’s been picking up steam thanks to its integration of deep neural networks (deep
reinforcement learning) and the newsworthy successes it’s accumulated as a result.
At its core though, RL is concerned with how to go about making decisions and taking
sequential actions in a specific environment to maximize a reward. Or, to put a more
personal spin on it, what steps should you take to get promoted at your job, or to
improve your fitness, or to save money to buy a house? We tend to figure out an
optimal approach to accomplish goals like these through some degree of trial and
error, evolving our strategies based on feedback from our environment.

At a basic level, RL works in much the same way. Of course, backed by computing
power, it can explore different strategies (or “policies” in the RL literature) much faster
than we can, often with pretty impressive results (especially for simple environments).
On the other hand, lacking the prior knowledge that humans bring to new situations
and environments, RL approaches also tend to need to explore many more policies
than a human would before finding an optimal one.

As reinforcement learning is a broad field, let’s focus on one specific aspect: model-
based reinforcement learning. As we’ll see, model-based RL attempts to overcome
the issue of a lack of prior knowledge by enabling the agent — whether this agent
happens to be a robot in the real world, an avatar in a virtual one, or just a piece of
software that takes actions — to construct a functional representation of its
environment.
While model-based reinforcement learning may not have clear commercial
applications at this stage, its potential impact is enormous. After all, as AI becomes
more complex and adaptive — extending beyond a focus on classification and
representation toward more human-centered capabilities — model-based RL will
almost certainly play an essential role in shaping these frontiers.

“The next big step forward in AI will be systems that actually understand their worlds.
The world is only accessed through the lens of experience, so to understand the
world means to be able to predict and control your experience, your sense data, with
some accuracy and flexibility. In other words, understanding means forming a
predictive model of the world and using it to get what you want. This is model-based
reinforcement learning.”

To Model or Not to Model

“Model” is one of those terms that gets thrown around a lot in machine learning (and
in scientific disciplines more generally), often with a relatively vague explanation of
what we mean. Fortunately, in reinforcement learning, a model has a very specific
meaning: it refers to the different dynamic states of an environment and how these
states lead to a reward.

Model-based RL entails constructing such a model. Model-free RL, conversely,


forgoes this environmental information and only concerns itself with determining what
action to take given a specific state. As a result, model-based RL tends to emphasize
planning, whereas model-free RL tends to emphasize learning (that said, a lot of
learning also goes on in model-based RL). The distinction between these two
approaches can seem a bit abstract, so let’s consider a real-world analogy.

Imagine you’re visiting a city that you’ve never been to before and for whatever
reason you don’t have access to a map. You know the general direction from your
hotel to the area where most of the sights of interest are, but there are quite a number
of different possible routes, some of which lead you through a slightly dangerous
neighborhood.

One navigational option is to keep track of all the routes you’ve taken (and the
different streets and landmarks that make up these routes) to begin to create a map
of the area. This map would be incomplete (it would only rely on where you’d already
walked), but would at least allow you to plan a course ahead of time to avoid that
neighborhood while still optimizing for the most direct route. You could even spend
time back in your hotel room drawing out the different possible itineraries on a sheet
of paper and trying to gauge which one seems like the best overall option. You can
think of this as a model-based approach.

Another option — especially if you’re the type of person who’s not big on planning —
would simply be to keep track of the different locations you’d visited (intersections,
parks, and squares for instance) and the actions you took (which way you turned), but
ignore the details of the routes themselves. In this case, whenever you found yourself
in a location you’d already visited, you could favor the directional choice that led to a
good outcome (avoiding the dangerous neighborhood and arriving at your destination
more efficiently) over the directions that led to a negative outcome. You wouldn’t
specifically know the next location you’d arrive at with each decision, but you would at
least have learned a simple procedure for what action to take given a specific
location. This is essentially the approach that model-free RL takes.

As it relates to specific RL terms and concepts, we can say that you, the urban
navigator, are the agent; that the different locations at which you need to make a
directional decision are the states; and that the direction you choose to take from
these states are the actions. The rewards (the feedback based on the agent’s
actions) would most likely be positive anytime an action both got you closer to your
destination and avoided the dangerous neighborhood, zero if you avoided the
neighborhood but failed to get closer to your destination, and negative anytime you
failed to avoid the neighborhood. The policy is whatever strategy you use to
determine what action/direction to take based on your current state/location. Finally,
the value is the expected long-term return (the sum of all your current and future
rewards) based on your current state and policy.

In general, the core function of RL algorithms is to determine a policy that maximizes
this long-term return, though there are a variety of different methods and
algorithms to accomplish this. And again, the major difference between model-based
and model-free RL is simply that the former incorporates a model of the agent’s
environment, specifically one that influences how the agent’s overall policy is
determined.
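
To make the terminology concrete, here is a minimal model-free sketch in Python. It is not
taken from any system described in this unit: the 3x3 grid of intersections, the "dangerous"
intersection, and the reward values are illustrative assumptions chosen to mirror the
navigation analogy. The agent learns a tabular Q-value for each state/action pair without
ever building a map (model) of the city.

import random
from collections import defaultdict

# Model-free (tabular Q-learning) sketch of the city-navigation analogy.
# The grid layout, the dangerous intersection, and the rewards are
# illustrative assumptions, not data from this unit.
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
START, GOAL, DANGER = (0, 0), (2, 2), (1, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

random.seed(0)
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def step(state, action):
    """Move one block; reward +1 at the goal, -1 in the bad neighborhood, 0 otherwise."""
    dx, dy = ACTIONS[action]
    nx = min(max(state[0] + dx, 0), 2)
    ny = min(max(state[1] + dy, 0), 2)
    next_state = (nx, ny)
    if next_state == GOAL:
        return next_state, 1.0, True
    if next_state == DANGER:
        return next_state, -1.0, False
    return next_state, 0.0, False

def choose_action(state):
    """Epsilon-greedy policy: usually exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(q_table[state], key=q_table[state].get)

for episode in range(500):  # learning, not planning: no map of the city is ever built
    state, done = START, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        best_next = max(q_table[next_state].values())
        q_table[state][action] += ALPHA * (reward + GAMMA * best_next
                                           - q_table[state][action])
        state = next_state

print({a: round(v, 2) for a, v in q_table[START].items()})

A model-based approach, by contrast, would also learn (or be given) the transition function
that step() implements here, and would use it to plan routes offline, just as the map-drawing
traveler in the analogy does.
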
MCQ

1. _______ is an area of machine learning which is about taking suitable actions to
maximize reward in a particular situation.

A) Supervised learning
B) unsupervised learning
C) Reinforcement learning
D) None of these

ANSWER= C) Reinforcement learning

2. _______ is all about making decisions sequentially.

A) Supervised learning
B) unsupervised learning
C) Reinforcement learning
D) None of these

ANSWER= C) Reinforcement learning

3. In _________, the output depends on the state of the current input, and the next
input depends on the output of the previous input.

A) Supervised learning
B) unsupervised learning
C) Reinforcement learning
D) None of these

ANSWER= C) Reinforcement learning

4. _________ reinforcement is defined as when an event occurs due to a particular
behavior.

A) negative
B) positive
C) neutral
D) None of these

ANSWER= B) positive

5. There are _______ types of reinforcement.

A) 3
B) 2
C) 4
D) None of these

ANSWER= B) 2

6. Which of the following is not an advantage of reinforcement learning?

A) Maximizes Performance
B) Sustain Change for a long period of time
C) Too much Reinforcement can lead to overload of states which can
diminish the results
D) None of these

ANSWER= C) Too much Reinforcement can lead to overload of states which can
diminish the results

7. Reinforcement learning is one of the ______ basic machine learning paradigms.

A) 5
B) 4
C) 2
D) 3

ANSWER= D) 3

8. ________ is a type of machine learning paradigm in which a learning algorithm is
trained not on preset data but rather through a feedback system.

A) Supervised learning
B) Unsupervised learning
C) Reinforcement Learning
D) None of the above

ANSWER= C) Reinforcement Learning

9. In _________, the model keeps on improving its performance using reward feedback
to learn the behavior or pattern.

A) clustering
B) reinforcement learning
C) semi supervised
D) reinforcement

ANSWER= B) reinforcement learning

10. A ________ problem is when the output variable is a real value.

A) regression
B) reinforcement learning
C) semi supervised
D) classification

ANSWER= A) regression

11. ______ is computationally complex.

A) Unsupervised learning
B) reinforcement learning
C) semi supervised
D) classification

ANSWER= A) Unsupervised learning

12. _____ processes all the training examples for each iteration of gradient descent.

A) Stochastic Gradient Descent
B) Batch Gradient Descent
C) Mini Batch Gradient Descent
D) None of the above

ANSWER= B) Batch Gradient Descent

13. Gradient descent is an optimization algorithm used for

A) making certain changes in the algorithm
B) minimizing the cost function in various machine learning algorithms
C) maximizing the cost function in various machine learning algorithms
D) keeping the cost function the same in various machine learning algorithms

ANSWER= B) minimizing the cost function in various machine learning algorithms

14. How many types of gradient descent are there?

A) 4
B) 3
C) 2
D) 1

ANSWER= B) 3

15. _____ is a type of gradient descent which processes one training example per iteration.

A) Batch Gradient Descent
B) Stochastic Gradient Descent
C) Mini Batch Gradient Descent
D) None of these

ANSWER= B) Stochastic Gradient Descent

16. Which is the fastest type of gradient descent?

A) Batch Gradient Descent
B) Stochastic Gradient Descent
C) Mini Batch Gradient Descent
D) None of these

ANSWER= C) Mini Batch gradient descent

17. Which is considerably faster than batch gradient descent?

A) Batch Gradient Descent
B) Stochastic Gradient Descent
C) Mini Batch Gradient Descent
D) None of the above

ANSWER= B) Stochastic Gradient Descent

18. Which gradient descent variant works for larger training sets, and with a smaller
number of iterations?

A) Batch Gradient Descent
B) Stochastic Gradient Descent
C) Mini Batch Gradient Descent
D) None of the above
ANSWER= C) Mini Batch gradient descent

19. In ______, the algorithm follows a straight path towards the minimum.

A) Batch Gradient Descent
B) Stochastic Gradient Descent
C) Mini Batch Gradient Descent
D) None of the above

ANSWER= A) Batch Gradient Descent

20. If the cost function is convex, then gradient descent converges to a _____.

A) global maximum
B) global minimum
C) local minimum
D) local maximum

ANSWER= B) global minimum
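
The gradient descent variants quizzed above differ only in how many training examples each
parameter update sees. As a rough, hedged illustration (the toy dataset, learning rate, and
epoch count below are arbitrary assumptions, not taken from this unit), the following Python
sketch fits a simple linear regression three ways: with the full batch, one example at a time
(stochastic), and small mini-batches.

import numpy as np

# Illustrative comparison of batch, stochastic, and mini-batch gradient
# descent on a toy linear regression with an MSE cost function.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=200)
Xb = np.c_[np.ones(len(X)), X]  # add a bias column

def gradient(theta, Xb, y):
    """Gradient of the MSE cost over whichever examples are passed in."""
    return 2.0 / len(y) * Xb.T @ (Xb @ theta - y)

def run(batch_size, lr=0.05, epochs=50):
    theta = np.zeros(2)
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            theta -= lr * gradient(theta, Xb[batch], y[batch])
    return theta

print("batch      :", run(batch_size=len(y)))  # all examples per update
print("stochastic :", run(batch_size=1))       # one example per update
print("mini-batch :", run(batch_size=32))      # a small batch per update

Batch gradient descent takes one smooth step per epoch, stochastic gradient descent takes many
noisy steps, and mini-batch gradient descent sits in between, which is why it is usually the
practical default for large training sets.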

21. Bootstrap Aggregation is commonly known as _______.

A) Information Gain
B) bagging
C) Entropy
D) none of these

ANSWER=B) bagging

22. Random Forest uses _________ as its base learning models.

A) multiple decision trees
B) bagging
C) Entropy
D) None of these

ANSWER=A) multiple decision trees

23. ________ helps improve machine learning results by combining several models.

A) Machine Learning
B) bagging
C) Entropy
D) Ensemble learning

ANSWER=D) Ensemble learning

24. In a voting classifier, which of the following does not exist?

A) hard voting
B) soft voting
C) both A and B
D) None of these

ANSWER=D) None of these

25. In ________, the predicted output class is the class with the highest majority of votes.

A) hard voting
B) soft voting
C) both A and B
D) None of these

ANSWER= A) hard voting

26. In ________, the output class is the prediction based on the average of the probability
given to that class.

A) hard voting
B) soft voting
C) both A and B
D) None of these

ANSWER= B) soft voting
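
To ground the bagging and voting questions above, here is a small, hedged scikit-learn sketch.
The synthetic dataset and the particular base estimators are illustrative choices, not
prescribed by this unit. It trains a random forest (which uses multiple decision trees as its
base learners via bagging) alongside two other models, and combines all three with both hard
voting (majority of class votes) and soft voting (average of predicted probabilities).

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, illustrative data: 500 samples, 10 features, 2 classes.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

hard = VotingClassifier(estimators, voting="hard")  # majority of class votes
soft = VotingClassifier(estimators, voting="soft")  # average of class probabilities

for name, clf in [("hard voting", hard), ("soft voting", soft)]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))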


Case Study

Online Education Company Improves Customer Support with Autosuggestion of Macros

Problem

Magoosh’s support staff comprises two teams of 50 agents: a community support
team for handling account inquiries, and remote tutors who handle in-depth questions
for specific tests. Magoosh uses Zendesk to handle its customer support requests.
It has over 900 macros on Zendesk, which are pre-written, standard responses to
common questions asked by the company’s customers.

The support staff found it difficult to search or discover these macros for offering
timely customer help, which they believed to be negatively affecting their customer
satisfaction scores on responses to questions about standardized tests. Part of this
searchability problem was the enormous number of macros which took a lot of time
to search through and manage.

Actions Taken

The DigitalGenius AI Platform was integrated with Magoosh’s Zendesk console.
DigitalGenius trained a deep neural network to analyze incoming customer inquiries
based on historical customer logs – learning how Magoosh’s support staff replied to
various incoming inquiries.

DigitalGenius then automatically suggested the most relevant macros for new
customer inquiries so the support team member does not spend time searching for
macros or manually composing new responses to common customer inquiries.
DigitalGenius claims that its AI Platform achieves this automatic macro suggestion
by using deep learning models to extract the meaning and context of incoming
inquiries and predicting the expected response. In addition, the platform has a
historical response search feature, which the support staff can access.

When asked about this historical response search feature, … of DigitalGenius told
us:

“The historical responses feature looked for historical tickets in which customers
asked similar questions to the one agents were working on. We built the search
ourselves, using our own search algorithms. And beyond that, we have a different
UI to the Zendesk search, including the ability for historical response searches to
take place in the app sidebar, so agents don’t have to navigate away from the
page.
The coolest feature is that we prioritized historical tickets that had the highest
CSAT. When agents searched from historical responses, we displayed to them
whether that ticket got a high or low CSAT rating… so we think our feature
promoted the best answers.”
The platform also reportedly predicts the relevant metadata about the case, such as
tags, inquiry type, priority and other case details. With this information, it is able to
analyze and route cases to the appropriate team. For example, if the incoming query
is an account inquiry, the platform routes the request to the community support team,
and for in-depth educational queries, it routes the requests to remote tutors –
eliminating the need for a “human filter” to handle all tickets.
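
The classification-and-routing behavior described above can be illustrated with a deliberately
simplified sketch. To be clear, this is not DigitalGenius’s implementation (they report using
deep learning models trained on Magoosh’s historical logs): the tiny hand-written tickets, the
labels, and the TF-IDF plus logistic regression pipeline below are illustrative assumptions
that only show the general idea of predicting a team (or macro) from the text of an inquiry.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented example tickets and routing labels, for illustration only.
tickets = [
    "I was charged twice for my subscription",
    "How do I reset my password?",
    "Can you explain this GRE quant question?",
    "What does this reading passage mean?",
]
labels = ["community_support", "community_support", "remote_tutor", "remote_tutor"]

# Vectorize the text and fit a simple classifier over the routing labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(tickets, labels)

# Predicted routing for new, unseen inquiries (illustrative output).
print(model.predict(["Please refund my last payment",
                     "Help with this geometry problem"]))

In production, the same idea is extended to hundreds of macros and case-metadata fields, and a
deep model replaces the simple classifier, but the input-text-to-predicted-label structure is
the same.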

A screenshot from Zendesk’s “apps support” page for DigitalGenius. The full set of
screenshots and integration details can be found at:
https://www.zendesk.com/apps/support/digitalgenius/
Results

According to DigitalGenius, about 83% of all customer tickets are supported by the
Digital Genius platform integrated with Magoosh’s Zendesk. The company also
claims a 92% accuracy in case tag predictions (tags are used within Zendesk for
case categorization – for example, “refunds” might be a tag for that particular kind of
customer issue). This improvement happened over an initial 6-month period with
Magoosh, which Juan describes as a “learning segment” – stating that new and
updated projects are underway with Magoosh now.
Asked for clarification on what it means to have 83% of messages “supported” by
DigitalGenius, Juan replied:

“Supported in this context means that DigitalGenius AI has assisted Magoosh with
83% of their tickets – whether this is classifying them, suggesting the right macro
or automating a response. The remaining 17% escaped our current AI capabilities,
and had to be dealt with manually in order to provide the best possible answer and
avoid a potential wrong answer to a customer.”

Case Study

Face Recognition Time Attendance Management System


Client

The client is a US health center. Since the COVID-19 outbreak, the center has been
bursting past capacity. To combat the virus, they decided to go for biometric face
recognition time attendance software.

Challenge: prevent virus spread through touched surfaces

The COVID-19 pandemic has amplified deficiencies of the client’s health center. To
combat the virus, they have enhanced precautions and provided the front-line care
team with personal protective equipment (PPE). Cleaning and disinfecting touched
surfaces lowered the chance of the virus spreading but didn’t solve the problem.

In this case, biometric touchless authentication could become a solution. Changing
the way healthcare workers clock in and out could decrease the spread of the virus.
They needed a solution that would allow masked face recognition.

To get a consultation on facial recognition time clock software, they contacted InData
Labs, an AI and facial recognition service provider.

Solution: face recognition time attendance management system against COVID-19

The client emphasized that they needed a real-time smart attendance system using
face recognition techniques. The key focus should be on masked face recognition
because the healthcare team at the center is required to wear masks.

To start, we researched masked face datasets and 80+ open-source solutions
related to face mask detection.
Our team of engineers defined the workflow of the future face recognition time
attendance software.

Then, we collected 800+ health center employees’ images, sorted and labelled them
with names. The camera at the center’s entrance captured face data and sent it to
the server for image processing – detection, encoding, and recognition.
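
The detection, encoding, and recognition steps mentioned above follow a common pattern. The
sketch below is a hedged illustration using the open-source face_recognition Python library
(assuming it is installed); it is not InData Labs’ custom system, which was trained
specifically for masked faces. The file names and the matching tolerance are made-up
placeholders.

import face_recognition

# 1. Build the gallery: one encoding per labelled employee photo.
known_image = face_recognition.load_image_file("employee_jane.jpg")  # hypothetical file
known_encoding = face_recognition.face_encodings(known_image)[0]

# 2. Process a frame captured by the entrance camera.
frame = face_recognition.load_image_file("entrance_frame.jpg")       # hypothetical file
for encoding in face_recognition.face_encodings(frame):
    # Compare the detected face against the gallery; tolerance is an assumption.
    match = face_recognition.compare_faces([known_encoding], encoding,
                                           tolerance=0.5)[0]
    if match:
        print("Match: add a clock-in entry for Jane")
    else:
        print("No match: notify the security team")

# Note: off-the-shelf encoders like this one generally struggle with masked
# faces; the case study's custom model was trained specifically for that setting.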

Masked Face Detection

Our team researched the latest studies and decided to use AI and ML for face mask
detection and recognition.

We trained our solution to recognize face attributes like:

• Face (shape: round, squared, or oval)
• Forehead (low, average, or high)
• Eyes (setting, eye centers, interpupillary distance)
• Unique facial features (wrinkles, facial hair, etc.)

By detecting and recognizing these facial features, the system can verify the
employee’s identity with a mask on.

How the system works:

• An employee tries to enter the facility using their face as the credential
• The face gets detected and recognized
• If it’s a match, the employee gets in; if it’s not, the security team gets notified
about the unwanted person on the premises
• A time clock entry is added into the system

During the beta-testing period, we enhanced face recognition accuracy by
experimenting with different types of masks, incorrect mask-wearing, bad lighting, and
multiple people in the frame.

As a result, we’ve achieved a 91% accuracy in masked face detection.

Result: a decline in coronavirus cases among employees


Our solution is a custom face recognition attendance system with a mask detection
feature. It allows for touchless authentication and serves the health center amid the
coronavirus crisis. Thanks to that, employees wearing masks can easily travel in and
out of facilities. The solution also provides clock in/out time entry.

Benefits of a Real-Time Face Recognition Time Attendance System:
