John Doe
Proposal
A large body of work applies Reinforcement Learning (RL) to image-based environments. A typical
example is learning the best policies for Atari games. Most of the time, the policy search takes
the whole frame as its input. However, if the model can be made to focus on the relevant parts of
an image, policy search can converge much faster. A recent work addresses this by segmenting
objects from the video frames of an Atari game (Goel, Weng and Poupart). In short, it:
• Uses unsupervised video object segmentation to segment the moving objects in the frames.
This is helpful because, in most cases, moving objects are among the most important
aspects of the environment.
• Combines the moving object segmentation map with the input image features, which are
then taken as input to predict the policy and state values (refer to the figure below).
[Figure: A. Moving objects; B. Static objects]
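To illustrate how a segmentation map can be combined with the image input, the sketch below stacks a frame and a moving-object mask as two input channels and feeds them to a policy head and a value head. This is a minimal sketch under our own assumptions, not the architecture of the cited work: the name `policy_and_value`, the linear heads, the 8x8 frame size, and the random weights are hypothetical stand-ins for the convolutional networks a real agent would use.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def policy_and_value(frame, motion_mask, w_policy, w_value):
    """Hypothetical helper: frame and motion_mask are (H, W) arrays.

    Returns (action_probs, state_value).
    """
    # Stack the raw frame and the moving-object mask as two input channels.
    x = np.stack([frame, motion_mask], axis=0)    # shape (2, H, W)
    feats = x.reshape(-1)                         # flattened joint features
    action_probs = softmax(w_policy @ feats)      # policy head (assumed linear)
    state_value = float(w_value @ feats)          # value head (assumed linear)
    return action_probs, state_value

# Toy inputs; sizes and weights are arbitrary assumptions for illustration.
rng = np.random.default_rng(0)
H, W, n_actions = 8, 8, 4
frame = rng.random((H, W))
motion_mask = (rng.random((H, W)) > 0.8).astype(np.float64)
w_policy = rng.normal(scale=0.1, size=(n_actions, 2 * H * W))
w_value = rng.normal(scale=0.1, size=2 * H * W)

probs, value = policy_and_value(frame, motion_mask, w_policy, w_value)
```

Here `probs` is a probability distribution over the assumed four actions and `value` is a scalar state-value estimate. Both heads read the same joint features.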
Our proposal is to improve the static object detection network by introducing an attention model.
Attention models (K. Xu et al.; T. Xu et al.) have gained popularity in object detection because
they let the network focus on a specific part of the image.
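As an illustration of how attention can select part of an image, the sketch below computes soft spatial attention over a feature map: each location gets a score, the scores are normalized with a softmax, and the features are averaged with those weights. This is a minimal sketch under stated assumptions, not the model we propose in full: the name `soft_attention`, the linear scoring vector `w_score`, and the feature map size are hypothetical.

```python
import numpy as np

def soft_attention(features, w_score):
    """Hypothetical helper: features is (C, H, W), w_score is (C,).

    Returns (attention_map, context): an (H, W) map of weights that sum
    to 1, and the attention-weighted feature vector of shape (C,).
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)      # one column per spatial location
    scores = w_score @ flat                # one score per location (assumed linear)
    scores = scores - scores.max()         # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()      # softmax over all locations
    context = flat @ weights               # weighted sum of location features
    return weights.reshape(H, W), context

# Toy feature map; in practice this would come from a convolutional encoder.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 7, 7))
w_score = rng.normal(size=16)
attn_map, context = soft_attention(features, w_score)
```

Locations with higher scores contribute more to `context`, so later layers see mostly the attended region. In a trained model, `w_score` would be learned together with the rest of the network.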
Here, we will apply attention to video frames to obtain better and faster object detection, and we
hope that learning will be much faster as a result. Our proposed model will look like
We will mostly use Actor-Critic based policy learning and value optimization.
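For reference, the two losses of a basic advantage actor-critic update can be sketched as follows. This is a minimal sketch assuming one short rollout and n-step discounted returns. The function names, the discount `gamma`, and the example numbers are our own illustrative choices, and a real implementation would compute gradients with an autodiff library.

```python
import math

def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    # Work backwards from the value estimate of the state after the rollout.
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

def actor_critic_losses(log_probs, values, rewards, bootstrap_value=0.0, gamma=0.99):
    """Hypothetical helper returning (policy_loss, value_loss) for one rollout.

    log_probs: log pi(a_t | s_t) of the actions taken
    values:    critic estimates V(s_t)
    """
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    # Advantage: how much better the return was than the critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Actor: raise the log-probability of actions with positive advantage
    # (advantages are treated as constants in the policy gradient).
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: mean squared error between returns and value estimates.
    value_loss = sum(a * a for a in advantages) / n
    return policy_loss, value_loss

# Toy rollout with two steps; all numbers are arbitrary.
policy_loss, value_loss = actor_critic_losses(
    log_probs=[math.log(0.5), math.log(0.5)],
    values=[1.0, 1.0],
    rewards=[1.0, 1.0],
    gamma=0.5,
)
```

In this toy rollout the returns are 1.5 and 1.0, so only the first step has a nonzero advantage (0.5) and contributes to both losses.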
Evaluation
Finally, we will compare the results of the baselines and previous works with those of this model.
References
Goel, Vikash, Jameson Weng, and Pascal Poupart. "Unsupervised video object segmentation for deep
reinforcement learning." Advances in Neural Information Processing Systems. 2018.
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention."
International Conference on Machine Learning. 2015.
Xu, Tao, et al. "AttnGAN: Fine-grained text to image generation with attentional generative adversarial
networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.