PRAGATI ENGINEERING COLLEGE: SURAMPALEM (AUTONOMOUS)
IV B.Tech I Semester Regular Examinations, September - 2023
REINFORCEMENT LEARNING
CSE (AI&ML)
Time: 3 hours    Max. Marks: 70 M
Answer ONE Question from each Unit
All Questions Carry Equal Marks
Q. No. | Questions | BTL | CO | Marks

UNIT – I
1. a) Define Reinforcement Learning. Explain it with various examples. | K1 | CO1 | 7M
   b) Explain the k-armed Bandit Problem with an example. | K1 | CO1 | 7M
OR
2. a) Explain Optimistic Initial Values. Explain the Gradient Bandit Algorithm with an example. | K4 | CO1 | 7M
   b) Explain Incremental Implementation. Explain Tracking a Non-stationary Problem. | K2 | CO1 | 7M

UNIT – II
3. a) Discuss the Agent-Environment Interface with examples. | K3 | CO2 | 7M
   b) Discuss various Goals and Rewards with examples. | K2 | CO2 | 7M
OR
4. a) Define Dynamic Programming. Explain Policy Evaluation. | K2 | CO2 | 7M
   b) Explain Value Iteration and Asynchronous Dynamic Programming. | K2 | CO2 | 7M

UNIT – III
5. a) Define Monte Carlo Prediction. Explain Monte Carlo Estimation of Action Values with examples. | K1 | CO3 | 7M
   b) Explain Monte Carlo Control and Monte Carlo Control without Exploring Starts with examples. | K2 | CO3 | 7M
OR
6. a) Explain A Unifying Algorithm: n-step, with an example. | K2 | CO3 | 7M
   b) Explain Discounting-aware Importance Sampling with examples. | K2 | CO3 | 7M

UNIT – IV
7. a) Explain Off-policy Divergence with examples. | K2 | CO4 | 7M
   b) Define Semi-gradient Methods and the Deadly Triad with examples. | K2 | CO4 | 7M
OR
8. a) Explain why the Bellman Error is not learnable. | K4 | CO4 | 7M
   b) Explain Dutch Traces in (i) Monte Carlo Learning, (ii) Variables with examples. | K2 | CO4 | 7M

UNIT – V
9. a) Explain Policy Approximation and its advantages. | K2 | CO5 | 7M
   b) Explain the Policy Gradient Theorem. | K2 | CO5 | 7M
OR
10. a) Explain REINFORCE: Monte Carlo Policy Gradient with an example. | K2 | CO5 | 7M
    b) Discuss Watson's Daily-Double Wagering and Optimizing Memory Control with examples. | K2 | CO5 | 7M