Input: The input is an initial state from which the model starts.
Output: There are many possible outputs, as there are a variety of solutions to a
particular problem.
Training: Training is based on the input; the model returns a state, and the user
decides whether to reward or punish the model based on its output.
The model thus continues to learn.
The best solution is decided based on the maximum reward.
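To make this trial-and-reward loop concrete, here is a minimal sketch in Python. The Environment class and its reset/step methods are hypothetical stand-ins for whatever problem the model is trained on, and the reward rule is invented for the example.

import random

class Environment:
    # Hypothetical environment: provides a start state and scores actions.
    def reset(self):
        return 0  # the initial state from which the model starts

    def step(self, state, action):
        # A positive reward acts as a "reward", a negative one as a "punishment".
        next_state = state + action
        reward = 1 if next_state % 2 == 0 else -1
        return next_state, reward

env = Environment()
state = env.reset()
total_reward = 0
for _ in range(10):
    action = random.choice([1, 2])            # the model picks an action
    state, reward = env.step(state, action)   # the environment responds
    total_reward += reward                    # learning aims to maximize this total
print("total reward:", total_reward)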
Difference between Reinforcement learning and Supervised learning:
Reinforcement learning: It is all about making decisions sequentially. In simple
words, the output depends on the state of the current input, and the next input
depends on the output of the previous input.
Supervised learning: The decision is made on the initial input, or the input given
at the start.
Types of Reinforcement:
There are two types of Reinforcement:
1. Positive –
Positive reinforcement occurs when an event, produced by a particular behaviour,
increases the strength and frequency of that behaviour. In other words, it has a
positive effect on behaviour.
2. Negative –
Negative reinforcement is defined as the strengthening of a behaviour because a
negative condition is stopped or avoided.
Advantages of reinforcement learning:
It increases behaviour.
It provides defiance to a minimum standard of performance.
However, it only provides enough to meet the minimum behaviour.
1. Policy (π):
a. It defines the behaviour of the agent: which action to take in a given state
to maximize the received reward in the long term.
b. It corresponds to stimulus-response rules or associations.
c. It could be a simple lookup table or function, or it may need more extensive
computation (for example, search).
d. It can be probabilistic.
4. State (s): State refers to the current situation returned by the environment.
5. Policy (π): It is the strategy applied by the agent to decide the next action
based on the current state.
6. Value (V): It is the expected long-term return with discounting, as opposed to
the short-term reward.
7. Value Function: It specifies the value of a state, that is, the total amount of
reward an agent should expect to accumulate beginning from that state.
8. Model of the environment: This mimics the behaviour of the environment. It helps
you to make inferences about how the environment will behave.
10. Q value or action value (Q): The Q value is quite similar to the value. The only
difference between the two is that it takes the current action as an additional
parameter.
1. Value-Based:
a. In a value-based reinforcement learning method, we try to maximize a value
function V(s). In this method, the agent expects a long-term return of the current
states under policy π.
2. Policy-based:
a. In a policy-based RL method, we try to come up with such a policy that the action
performed in every state helps you to gain maximum reward in the future.
b. Two types of policy-based methods are:
i. Deterministic: For any state, the same action is produced by the policy π.
ii. Stochastic: Every action has a certain probability, determined by the stochastic
policy π(a | s) = P[A_t = a | S_t = s] (see the sketch after this list).
3. Model-Based :
a. In this Reinforcement Learning method, we need to create a virtual model for each
environment.
b. The agent learns to perform in that specific environment.
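As a small illustration of the deterministic/stochastic distinction above, here is a sketch in Python; the states, actions, and probabilities are invented for the example.

import numpy as np

actions = ["left", "right", "up", "down"]

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"start": "right", "idle": "up"}

# Stochastic policy: each state maps to a probability distribution over
# actions, i.e. pi(a | s) = P[A_t = a | S_t = s].
stochastic_policy = {
    "start": [0.1, 0.7, 0.1, 0.1],
    "idle": [0.25, 0.25, 0.25, 0.25],
}

def act(state):
    # Sample an action according to pi(. | state).
    return np.random.choice(actions, p=stochastic_policy[state])

print(deterministic_policy["start"])  # always "right"
print(act("start"))                   # "right" with probability 0.7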
Q-learning.
3. Whereas the other type, policy-based, estimates the value function with a greedy
policy obtained from the last policy improvement.
4. Q-learning is an off-policy learner, i.e., it learns the value of the optimal
policy independently of the agent's actions.
5. On the other hand, an on-policy learner learns the value of the policy being
carried out by the agent, including the exploration steps, and it will find a policy
that is optimal taking into account the exploration inherent in the policy.
Step 1: Initialize the Q-table: First, the Q-table has to be built. There are n
columns, where n = number of actions, and m rows, where m = number of states. In our
example, n = {Go left, Go right, Go up, Go down} and m = {Start, Idle, Correct path,
Wrong path, End}. First, let's initialize every value at 0.
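A minimal sketch of this initialization in Python, using NumPy and the states and actions named above:

import numpy as np

states = ["Start", "Idle", "Correct path", "Wrong path", "End"]
actions = ["Go left", "Go right", "Go up", "Go down"]

# m x n table of zeros: one row per state, one column per action.
q_table = np.zeros((len(states), len(actions)))
print(q_table.shape)  # (5, 4)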
Step 4: Measure reward: Now we have taken an action and observed an outcome and a
reward.
Step 5: Evaluate: We need to update the function Q(s, a). This process is repeated
again and again until the learning is stopped. In this way the Q-table is updated and
the value function Q is maximized. Here Q returns the expected future reward of the
action taken at that state.
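The update in Step 5 is typically the standard Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)], where α is the learning rate and γ the discount factor. Here is a sketch of one such update on the table from Step 1; the α and γ values and the example transition are assumptions for illustration.

import numpy as np

q_table = np.zeros((5, 4))   # the Q-table from Step 1
alpha, gamma = 0.1, 0.9      # assumed learning rate and discount factor

def update_q(q_table, s, a, reward, s_next):
    # TD target: immediate reward plus discounted best estimated future value.
    target = reward + gamma * q_table[s_next].max()
    q_table[s, a] += alpha * (target - q_table[s, a])

# e.g. from state 0 (Start), action 1 (Go right) led to state 2 (Correct path).
update_q(q_table, s=0, a=1, reward=1, s_next=2)
print(q_table[0, 1])  # 0.1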
deep Q-learning.
1. The process of Q-Learning creates an exact matrix for the working agent which it
can “refer to” to maximize its reward in the long run. Although this approach is
not wrong in itself, this is only practical for very small environments and quickly
loses its feasibility when the number of states and actions in the environment
increases.
2. The solution to the above problem comes from the realization that the values in
the matrix only have relative importance, i.e., the values only matter with respect
to the other values. This thinking leads us to Deep Q-Learning, which uses a deep
neural network to approximate the values. This approximation does not hurt as long as
the relative importance is preserved.
3. The basic working step of Deep Q-Learning is that the initial state is fed into
the neural network, which returns the Q-value of every possible action as its output.
4. The difference between Q-Learning and Deep Q-Learning can be illustrated as
follows:
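The tabular Q-Learning above keeps an explicit matrix of Q-values, while Deep Q-Learning replaces that matrix with a network. As a minimal sketch of the forward pass described in point 3 (state in, one Q-value per action out), here is an illustration in Python using PyTorch; the layer sizes and the 5-feature state / 4-action setup are assumptions for the example, not a prescribed architecture.

import torch
import torch.nn as nn

n_state_features, n_actions = 5, 4  # assumed sizes for illustration

# A small feed-forward network approximating Q(s, a) for all actions at once.
q_net = nn.Sequential(
    nn.Linear(n_state_features, 32),
    nn.ReLU(),
    nn.Linear(32, n_actions),
)

state = torch.rand(1, n_state_features)  # the current state as a feature vector
q_values = q_net(state)                  # one Q-value per possible action
best_action = q_values.argmax(dim=1)     # greedy action choice
print(q_values, best_action)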
Steps involved in reinforcement learning using deep Q-learning networks:
Genetic algorithm.
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the
larger class of evolutionary algorithms. Genetic algorithms are based on the ideas of
natural selection and genetics. They are an intelligent exploitation of random
search, provided with historical data to direct the search into the region of better
performance in the solution space. They are commonly used to generate high-quality
solutions for optimization and search problems.
Genetic algorithms simulate the process of natural selection, in which the species
that can adapt to changes in their environment survive, reproduce, and go on to the
next generation. In simple words, they simulate "survival of the fittest" among
individuals of consecutive generations to solve a problem. Each generation consists
of a population of individuals, and each individual represents a point in the search
space and a possible solution. Each individual is represented as a string of
characters/integers/floats/bits; this string is analogous to a chromosome. The set of
parameters that forms a solution is the chromosome, so the population is a collection
of chromosomes.
Chromosomes are often depicted in binary as 0’s and 1’s, but other encodings are also
possible.
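A small sketch of this representation in Python (the chromosome length and population size are arbitrary choices for the example):

import random

CHROMOSOME_LENGTH = 8
POPULATION_SIZE = 6

def random_chromosome():
    # One candidate solution, encoded as a binary string (a list of bits).
    return [random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]

# The population is a collection of chromosomes.
population = [random_chromosome() for _ in range(POPULATION_SIZE)]
print(population[0])  # e.g. [1, 0, 1, 1, 0, 0, 1, 0]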
2. Fitness Function
Out of the available recombination solutions, we have to select the best ones to
reproduce offspring. The fitness score helps to select the individuals who will be
used for reproduction.
3. Selection
This phase's main goal is to focus the search on the region where the chance of
finding the best solution is higher. For this, fitness proportionate selection, also
known as roulette wheel selection, is commonly used.
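A minimal sketch of roulette wheel selection in Python, continuing the population above; the fitness function (here just the count of 1-bits) is a stand-in for a real objective:

import random

def fitness(chromosome):
    # Toy fitness: the number of 1-bits in the chromosome.
    return sum(chromosome)

def roulette_select(population):
    # Each individual is picked with probability proportional to its
    # share of the total fitness, like a spin of a roulette wheel.
    weights = [fitness(c) for c in population]
    return random.choices(population, weights=weights, k=1)[0]

parent1 = roulette_select(population)
parent2 = roulette_select(population)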
4. Reproduction
Generation of offspring happens in two ways:
Crossover
Mutation
a) Crossover
Crossover is the most vital stage in the genetic algorithm. During crossover, a
random point is selected and designated the 'crossover point'; bits to the right of
that point are exchanged between the two parent chromosomes.
Two-Point Crossover: Two crossover points are picked randomly from the parent
chromosomes. The bits in between the two points are swapped between the parent
organisms.
Uniform Crossover: In a uniform crossover, each bit is typically chosen from either
parent with equal probability.
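A sketch of one-point and two-point crossover on binary chromosomes (a minimal illustration with invented parents):

import random

def one_point_crossover(p1, p2):
    # Swap the tails of the two parents at a random crossover point.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2):
    # Swap the segment between two random crossover points.
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

child1, child2 = one_point_crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0])
print(child1, child2)  # e.g. [1, 1, 0, 0, 0, 0] [0, 0, 1, 1, 1, 1]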
b) Mutation
In a few of the new offspring formed, some of their genes can be subjected to
mutation with a low random probability. This means that some of the bits in the bit
string can be flipped. Mutation maintains diversity within the population and
prevents premature convergence.
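A sketch of such a low-probability bit flip (the mutation rate is an assumed value):

import random

MUTATION_RATE = 0.05  # assumed low per-bit probability

def mutate(chromosome):
    # Flip each bit independently with a small probability.
    return [1 - bit if random.random() < MUTATION_RATE else bit
            for bit in chromosome]

print(mutate([1, 0, 1, 1, 0, 0, 1, 0]))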
For example, when a trip planner is asked to plan a trip, it can use a genetic
algorithm that reduces both the trip's overall cost and its duration. GAs are also
used for planning the delivery of products from place to place in the most efficient
way.
2. Robotics
The genetic algorithm is widely used in the field of robotics. Robots differ from one
another by the purpose they are built for: for example, some are built for cooking
tasks, while others are built for entirely different jobs.
In the traditional method, the important features in the dataset are selected as
follows: you look at the feature importances reported by a model, set a threshold
value, and consider a feature only if its importance exceeds that threshold.
With a genetic algorithm, we again start with a chromosome population, where each
chromosome is a binary string: 1 denotes the "inclusion" of a feature in the model,
and 0 denotes its "exclusion". The more accurate a chromosome's feature set is at
predicting values, the fitter it is.
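A minimal sketch of such a fitness function in Python, using scikit-learn; the model choice and the cross-validation setup are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def feature_fitness(chromosome, X, y):
    # Keep only the feature columns whose chromosome bit is 1,
    # then score a model trained on that feature subset.
    mask = np.array(chromosome, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature set cannot predict anything
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

The fitter chromosomes (those with higher cross-validated accuracy) are then selected and recombined as in the earlier steps.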
There are many other applications of genetic algorithms, such as DNA analysis.
Genetic representations:
1. Encoding:
i) Encoding is a process of representing individual genes.
ii) The process can be performed using bits, numbers, trees, arrays, lists or any other
objects.
iii) The encoding depends mainly on the problem being solved.
2. Binary encoding:
i. Binary encoding is the most commonly used method of genetic representation,
mainly because the earliest GAs used this type of encoding.
ii. In binary encoding, every chromosome is a string of bits, 0 or 1.
iii. Binary encoding gives many possible chromosomes.
Chromosome A 101100101100101011100101
Chromosome B 111111100000110000011111
Introduction to Mutation
In simple terms, mutation may be defined as a small random tweak in the chromosome, to get
a new solution. It is used to maintain and introduce diversity in the genetic population and is
usually applied with a low probability – pm. If the probability is very high, the GA gets reduced
to a random search.
Mutation is the part of the GA which is related to the “exploration” of the search space. It has
been observed that mutation is essential to the convergence of the GA while crossover is not.
1. Mutation Types
In this section, we describe some of the most commonly used mutation operators. Like the
crossover operators, this is not an exhaustive list and the GA designer might find a
combination of these approaches or a problem-specific mutation operator more useful.
2. Bit Flip Mutation
In bit flip mutation, we select one or more random bits and flip them. This is used
for binary encoded GAs.
3. Random Resetting
Random Resetting is an extension of the bit flip for the integer representation. In this, a
random value from the set of permissible values is assigned to a randomly chosen gene.
4. Swap Mutation
In swap mutation, we select two positions on the chromosome at random, and interchange the
values. This is common in permutation based encodings.
5. Scramble Mutation
Scramble mutation is also popular with permutation representations. In this, from the entire
chromosome, a subset of genes is chosen and their values are scrambled or shuffled randomly.
6. Inversion Mutation
In inversion mutation, we select a subset of genes like in scramble mutation, but instead of
shuffling the subset, we merely invert the entire string in the subset.
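A sketch of the swap, scramble, and inversion operators on a permutation-encoded chromosome (a minimal illustration):

import random

def swap_mutation(perm):
    # Interchange the values at two random positions.
    i, j = random.sample(range(len(perm)), 2)
    perm = perm[:]
    perm[i], perm[j] = perm[j], perm[i]
    return perm

def scramble_mutation(perm):
    # Shuffle the values inside a randomly chosen subset of positions.
    i, j = sorted(random.sample(range(len(perm)), 2))
    middle = perm[i:j]
    random.shuffle(middle)
    return perm[:i] + middle + perm[j:]

def inversion_mutation(perm):
    # Reverse the order of the values inside a randomly chosen subset.
    i, j = sorted(random.sample(range(len(perm)), 2))
    return perm[:i] + perm[i:j][::-1] + perm[j:]

print(swap_mutation([0, 1, 2, 3, 4]))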
Crossover
The crossover operator is analogous to reproduction and biological crossover. In
this, more than one parent is selected and one or more off-springs are produced using
the genetic material of the parents. Crossover is usually applied in a GA with a high
probability – pc.
Crossover Operators
In this section we will discuss some of the most popularly used crossover operators. It is to be
noted that these crossover operators are very generic and the GA Designer might choose to
implement a problem-specific crossover operator as well.
1. One Point Crossover
In one-point crossover, a random crossover point is selected and the tails of the two
parents are swapped to get new off-springs.
3. Uniform Crossover
In a uniform crossover, we don't divide the chromosome into segments; rather, we
treat each gene separately. In this, we essentially flip a coin for each gene to
decide whether or not it'll be included in the off-spring. We can also bias the coin
towards one parent, to have more genetic material in the child from that parent.
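A sketch of uniform crossover, including the optional biased coin (the bias of 0.7 below is an arbitrary example value):

import random

def uniform_crossover(p1, p2, bias=0.5):
    # For each gene, flip a (possibly biased) coin to decide
    # which parent it is inherited from.
    return [g1 if random.random() < bias else g2
            for g1, g2 in zip(p1, p2)]

print(uniform_crossover([1, 1, 1, 1], [0, 0, 0, 0], bias=0.7))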
Convergence of genetic algorithm