A.I Tools Assignment-5
221910401022 | Haneeth

Q1.) Define Machine Learning and Deep Learning.


Ans: Machine Learning:
Machine Learning, a subset of Artificial Intelligence, is the study of
computer algorithms that improve automatically through
experience. These algorithms build a model based on sample data,
known as 'training data', to make predictions or decisions without
being explicitly programmed.
Deep Learning:
Deep learning is a subset of Machine Learning that uses networks
capable of learning on their own from unstructured data.

Q2.) List the types of learning methods and explain each.


Ans: There are three types of learning methods in machine
learning:
Supervised Learning:
Supervised learning is where you have input variables (X) and an
output variable (y), and you use an algorithm to learn the mapping
function from the input to the output. The goal is to approximate
the mapping function so well that, when you have new input data,
you can predict the output variables for that data.
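
As a toy illustration, here is a minimal Python sketch of supervised
learning (assuming scikit-learn is installed; the data values are made
up): the algorithm fits the mapping from labelled inputs to outputs
and then predicts the output for new input data.

# Supervised learning: learn the mapping X -> y from labelled examples.
from sklearn.linear_model import LinearRegression

X_train = [[1], [2], [3], [4]]   # input variables (X)
y_train = [2, 4, 6, 8]           # known output variable (y)

model = LinearRegression()
model.fit(X_train, y_train)      # approximate the mapping function

print(model.predict([[5]]))      # predict output for unseen input -> ~[10.]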

Unsupervised Learning:
Unsupervised learning is where you only have input
data (X) and no corresponding output variables.
The goal of unsupervised learning is to model the underlying
structure or distribution in the data in order to learn more about it.
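
For contrast, a minimal unsupervised sketch (again assuming
scikit-learn; the points are made up): only inputs X are given, and the
algorithm discovers the grouping structure on its own.

# Unsupervised learning: no labels, only input data X.
from sklearn.cluster import KMeans

X = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]  # input data only

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # discovered cluster assignments
print(labels)                    # e.g. [0 0 1 1] (ids found by the model)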

Reinforcement Learning:
Reinforcement Learning is one of the main categories of machine
learning, in which an agent learns through trial and error.
The goal is to find the best sequence of actions that will generate
the optimal outcome. It works by allowing the software agent to
explore, interact with, and learn from the environment.
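
As a toy sketch of trial-and-error learning (the two-action environment
and its reward probabilities below are hypothetical), the agent
improves its estimates of each action's reward purely from interaction:

# Trial-and-error learning on a hypothetical two-action environment.
import random

reward_prob = {"left": 0.2, "right": 0.8}   # hidden from the agent
value = {"left": 0.0, "right": 0.0}         # agent's running estimates
counts = {"left": 0, "right": 0}

for step in range(1000):
    # explore randomly 10% of the time, otherwise exploit the best estimate
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(value, key=value.get)
    reward = 1.0 if random.random() < reward_prob[action] else 0.0
    counts[action] += 1
    # incremental average of the observed rewards
    value[action] += (reward - value[action]) / counts[action]

print(value)   # estimates approach the true probabilities 0.2 / 0.8
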
Q3.) Define Reinforcement learning, policy, value and action value.
Ans: Reinforcement Learning is a type of machine learning where an
agent performs actions, observes the results, and learns from the
environment. It has two main components: the environment and the
agent.

A policy (π) is a probability distribution over actions given states,
that is, the likelihood of every action when the agent is in a
particular state. It defines the learning agent's way of behaving at a
given time: the policy dictates the actions the agent takes as a
function of its state and the environment.
The value function determines the value of being in a state: the
expected future reward the agent can receive from that state.
The action-value function gives the value of taking a given action in
a given state, i.e., the expected future reward of that state-action
pair. The actions themselves come from a discrete or continuous
action space.
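
For illustration, a small Python sketch of a policy as a probability
distribution over actions given states (the states, actions, and
probabilities here are hypothetical):

# A policy π(a | s): for each state, a distribution over actions.
import random

policy = {
    "A": {"go_to_B": 0.7, "go_to_C": 0.3},
    "B": {"go_to_A": 0.5, "go_to_C": 0.5},
}

def sample_action(state):
    actions = list(policy[state])
    weights = list(policy[state].values())
    return random.choices(actions, weights=weights, k=1)[0]

print(sample_action("A"))   # "go_to_B" about 70% of the time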

Q12.) What do you mean by the action-value function? Explain with an
example.
Ans: Nearly all reinforcement learning algorithms are focused on
estimating value functions: functions of states (or of state-action
pairs) that estimate how good it is for the agent to be in a given
state (or how good it is to perform a given action in a given state).
The action-value function for a policy π is denoted q_π.
It tells us how good it is for the agent, when following policy π, to
take any given action from a given state; in other terms, it gives the
expected return of that state-action pair.
Example:
In the figure:
1. There are four states.
2. There are 2 possible actions in each state
(e.g., in state A: 1) go to B or 2) go to C).
3. P(s' | s, a) = (0.9, 0.1), i.e., with 10% probability the agent
moves in the wrong direction.
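
A worked one-step computation of an action value matching this setup
(the reward, discount factor, and state values below are assumed for
illustration, since the figure's numbers are not given here):

# q(A, "go to B"): with prob 0.9 we reach B, with prob 0.1 we reach C.
gamma = 0.9                                    # assumed discount factor
V = {"B": 5.0, "C": 2.0}                       # assumed state values
R = 1.0                                        # assumed step reward

q_A_to_B = 0.9 * (R + gamma * V["B"]) + 0.1 * (R + gamma * V["C"])
print(round(q_A_to_B, 2))                      # 0.9*5.5 + 0.1*2.8 = 5.23
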
Q13.) How does the Q-learning algorithm help in RL? Explain with an
example.
Ans: The 'Q' in Q-learning stands for quality. Quality here represents
how useful a given action is in gaining some future reward. Say an
agent has to move from a starting point to an ending point along a
path that has obstacles. The agent needs to reach the target by the
shortest path possible without hitting the obstacles, following the
boundary formed by the obstacles. For convenience, this can be set
up in a customized grid environment. A process is initialized and
continues until it reaches the end or exceeds the training time, after
which the expected output is checked. This process continues until
we get the output.
Example:
1. Place an agent in any one of the rooms (0, 1, 2, 3, 4); the goal
is to reach outside the building (room 5).
● There are 5 rooms in a building connected by doors.
● Each room is numbered 0 through 4.
● The outside of the building can be thought of as one big
room (5).
● Doors 1 and 4 lead into the building from room 5 (outside).
2. Represent the rooms on a graph: each room as a node and
each door as a link.
3. Associate a reward value with each door.
● Doors that lead directly to the goal have a reward of 100.
● Doors not directly connected to the target room have zero
reward.
● Because doors are two-way, two arrows are assigned
between each pair of connected rooms.
● Each arrow carries an instant reward value.
4. The terminology in Q-learning includes the terms state and
action:
● Each room (including room 5) represents a state.
● The agent's movement from one room to another
represents an action.
For example, the agent traverses from room 2 to room 5:
Initial state = state 2
State 2 -> state 3
State 3 -> possible states (1, 2, 4); choose state 4
State 4 -> state 5
5. Put the state diagram and the instant reward values into a
reward table, matrix R.
6. Add another matrix Q, representing the memory of what the
agent has learned through experience.
● The rows of matrix Q represent the current state of the
agent.
● The columns represent the possible actions leading to the
next state.
● Formula to calculate the Q matrix:
Q(state, action) = R(state, action) + Gamma * Max[Q(next
state, all actions)]
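
Below is a Python sketch of this setup (assuming NumPy). The reward
matrix follows the door connections described above, with 100 marking
doors into the goal room 5 and -1 marking pairs of rooms with no door;
treat the exact layout as an illustration.

# Reward matrix R: rows = current state, columns = "move to room" action.
import numpy as np

R = np.array([
    [-1, -1, -1, -1,  0,  -1],   # room 0: door to 4
    [-1, -1, -1,  0, -1, 100],   # room 1: doors to 3 and 5 (goal)
    [-1, -1, -1,  0, -1,  -1],   # room 2: door to 3
    [-1,  0,  0, -1,  0,  -1],   # room 3: doors to 1, 2, 4
    [ 0, -1, -1,  0, -1, 100],   # room 4: doors to 0, 3 and 5 (goal)
    [-1,  0, -1, -1,  0, 100],   # room 5: doors to 1, 4 and itself
])
Q = np.zeros((6, 6))             # agent's memory, initially empty
gamma = 0.8                      # assumed discount value

# one application of the update formula for (state 1, action: go to 5):
state, action = 1, 5
Q[state, action] = R[state, action] + gamma * Q[action].max()
print(Q[state, action])          # 100.0 on the first visit (Q[5] still zero)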

Q14.) List the steps involved in Q learning.


Ans:
The steps involved in Q-learning are:
1. Set the gamma parameter and the environment rewards in matrix R.
2. Initialize matrix Q to zero.
3. Select a random initial state.
4. Set initial state = current state.
5. Select one among all possible actions for the current state.
6. Using this possible action, consider going to the next state.
7. Get the maximum Q value for this next state based on all
possible actions.
8. Compute Q(state, action) = R(state, action) + gamma * Max[Q(next
state, all actions)].
9. Repeat the above steps until current state = goal state.
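
A compact Python sketch of these steps on the rooms example above
(assuming NumPy; gamma = 0.8 is an assumed value):

# Q-learning loop implementing steps 1-9 on the 6-state rooms example.
import numpy as np

R = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
gamma = 0.8                                   # step 1
Q = np.zeros((6, 6))                          # step 2
goal = 5

for episode in range(1000):
    state = np.random.randint(6)              # steps 3-4
    while state != goal:                      # step 9
        actions = np.where(R[state] >= 0)[0]  # step 5: valid actions
        action = np.random.choice(actions)    # step 6: move to next state
        # steps 7-8: max Q of next state, then the update formula
        Q[state, action] = R[state, action] + gamma * Q[action].max()
        state = action

print((Q / Q.max() * 100).round())            # normalised learned Q matrix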

Q17.) How does deep Q-learning work in RL?


Ans: A Deep Q-Network takes a stack of frames as input.
The frames are pre-processed to reduce their complexity, which
helps reduce the computation time needed for training.
Pre-processing includes cropping, converting the image to
grayscale, and reducing the size of the frame. The four sub-frames
are then stacked together.
The stacked frames are passed through the network, which
generates a vector of Q-values, one for each possible action in the
given state.
The highest Q-value in the vector is selected to find the best
action.
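
A minimal PyTorch sketch of such a network (the 84x84 frame size
and the layer sizes follow a common DQN setup and are assumptions
here, not details given in the source):

# DQN: a stack of 4 grayscale frames in, one Q-value per action out.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),          # one Q-value per action
        )

    def forward(self, frames):
        return self.net(frames)

model = DQN(n_actions=4)
stacked = torch.rand(1, 4, 84, 84)      # batch of one 4-frame stack
q_values = model(stacked)               # vector of Q-values
best_action = q_values.argmax(dim=1)    # highest Q-value -> best action
print(q_values.shape, best_action)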

Q19.) How is RL applied in the following applications: Game playing,
Diagnostic systems, Virtual Assistant? Explain for any two
applications.
Ans:
Reinforcement Learning in Games:
In most situations, gaming apps and reinforcement learning go
hand in hand. Gaming apps are a demanding area that lets you
evaluate different reinforcement learning algorithms. Several
gaming apps make use of reinforcement learning, one of which is
the game Othello. In it, reinforcement learning, along with a
variation of the function-approximation methodology, helps one
avoid simulation problems and dimensionality issues. The
reinforcement learning agents in Othello learn to play the game
without information or strategies provided by humans. By seeking
out all possible routes and, in the end, choosing the one with the
fewest obstacles, they learn to play the game through repeated
trial and error. This way, the agents find the most effective way to
win. This kind of learning is motivated by the way children learn.

Digital Personal Assistants:
A Virtual Assistant or Digital Assistant is application software that
understands voice commands in natural language and completes
tasks for the user. Virtual assistants use Natural Language
Processing (NLP) to match user text or voice input to executable
commands.
