Some of the review papers provide overviews of state-of-the-art GANs, while others focus on GANs for a specific domain (e.g., image generation). Reviews of GANs for general visual image datasets outnumber other specialized GAN reviews, including those in cybersecurity, anomaly detection, and medical imaging (see below). As shown in Figure 1, the red diamonds, which represent general reviews of GANs, dominate the other categories.

Video-to-Video Translation with Global Temporal Consistency [5] by Wei et al. further extends optical-flow frame-warping networks. The authors present a mechanism that targets video-level consistency: a residual-error-based two-channel discriminator minimizes the total mean absolute (L1) distance between the optical-flow maps of consecutive frames (a rough sketch of this flow-matching idea is given below). Ultimately, this approach fails on longer videos and on videos with fast motion.
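To make the flow-matching idea concrete, the following minimal PyTorch sketch accumulates the L1 distance between the flow maps of a source clip and its translated counterpart. This is not the authors' implementation: the `flow_net` estimator, the clip shapes, and the real-versus-generated pairing are all assumptions, since [5] does not specify these components here.

```python
import torch
import torch.nn.functional as F

def total_flow_l1(real: torch.Tensor, fake: torch.Tensor, flow_net) -> torch.Tensor:
    """Sum of L1 distances between optical-flow maps of consecutive frames.

    real, fake: (B, T, C, H, W) source clip and its translated counterpart.
    flow_net:   assumed pretrained flow estimator mapping a frame pair
                (B, C, H, W) x (B, C, H, W) to a (B, 2, H, W) flow map.
    """
    loss = real.new_zeros(())
    for t in range(real.size(1) - 1):
        flow_real = flow_net(real[:, t], real[:, t + 1])  # motion in the source clip
        flow_fake = flow_net(fake[:, t], fake[:, t + 1])  # motion in the generated clip
        loss = loss + F.l1_loss(flow_fake, flow_real)     # match generated motion to source motion
    return loss
```

Matching the two flow sequences pushes the translated video to reproduce the frame-to-frame motion of the input video, which is the video-level consistency the two-channel discriminator in [5] is meant to enforce.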
| Related papers | Dataset | Architecture | Temporal information modeling | Temporal constraint applied on | Evaluation metrics used | Limitation |
|---|---|---|---|---|---|---|
| Unsupervised Video-to-Video Translation [8] | Volumetric MNIST, GTA segmentation-to-video, and MRI-to-CT | 3D Cycle-GAN (3D-Conv-Net) | The network implicitly learns temporal structure from the input video | - | Human evaluation, pixel accuracy, and L2 error between the original and the retranslated image | The 3D tensor fails to learn temporal consistency between frames; fixed-length videos only |
| Video-to-Video Translation with Global Temporal Consistency [5] | DAVIS 2017 | RNN-based Cycle-GAN, with an RNN-based discriminator for global temporal consistency | Optical flow, temporal residual-error minimizer | Generator and discriminator networks | Peak signal-to-noise ratio, region similarity, and contour accuracy | Complex architecture that is hard to train; inappropriate for videos containing fast object motion; does not work for long videos |
| MoCycle-GAN: Unpaired Video-to-Video Translation [4] | Flower video and Viper datasets | Cycle-GAN with a motion translator network (MoCycle-GAN) | Optical flow with motion-translator-based motion cycle consistency | Generator network | Human evaluation, IoU, pixel accuracy, average class accuracy | Requires an explicit motion translator; no content translation |
| Recycle-GAN: Unsupervised Video Retargeting [6] | Viper, face, and flower datasets (more than 10,000 images) | Cycle-GAN with a recurrent temporal predictor (Pix2Pix) | Recurrent temporal predictor | Generator network | Human evaluation, IoU, pixel accuracy, average class accuracy, IS | The temporal predictor can fail to predict correctly; no content translation |
| Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation [7] | Viper dataset | Cycle-GAN with a flow estimator network and a spatial warping network | Optical-flow-based temporal fusion [43] to further reduce the temporal warping error | Generator network; consistency is used to improve the occlusion problem | mIoU, fwIoU, and pixel accuracy | Input-domain videos must have very similar content |
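Several of the methods summarized above ([4], [7]) impose their temporal constraint on the generator through optical-flow warping: the previous generated frame, warped by the motion of the source video, should agree with the current generated frame. The sketch below is a generic PyTorch rendering of that warping loss, not the implementation of any of the papers; the flow is assumed to come from an external estimator, and the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a (B, C, H, W) frame by a (B, 2, H, W) optical-flow field."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    # Displace the base pixel grid by the flow, then normalize to [-1, 1]
    # as required by grid_sample.
    grid_x = 2.0 * (xs.unsqueeze(0) + flow[:, 0]) / (w - 1) - 1.0
    grid_y = 2.0 * (ys.unsqueeze(0) + flow[:, 1]) / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

def warping_temporal_loss(prev_fake, cur_fake, flow_real):
    """L1 warping error: the previous generated frame, moved by the source
    video's flow, should coincide with the current generated frame."""
    return F.l1_loss(cur_fake, warp(prev_fake, flow_real))
```

In practice an occlusion mask is usually multiplied into this term, since backward warping is undefined in occluded regions; this is the problem that the spatial warping network of [7] is aimed at.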
IV. Conclusion
This paper aims to provide a comparative analysis across the video-translation domain, to discuss how to improve the stability of GAN networks, and to discuss how to improve network cost functions so as to enhance the generated video. Generative models such as GANs have produced promising results in multiple domains, including images, videos, audio, and text; video synthesis, however, is still in its early stages compared to other domains such as images.

References

[4] Y. Chen, Y. Pan, T. Yao, X. Tian, and T. Mei, "Mocycle-GAN: Unpaired video-to-video translation," MM 2019 - Proc. 27th ACM Int. Conf. Multimed., pp. 647-655, Aug. 2019, doi: 10.1145/3343031.3350937.

[5] X. Wei, S. Feng, J. Zhu, and H. Su, "Video-to-video translation with global temporal consistency," MM 2018 - Proc. 2018 ACM Multimed. Conf., pp. 18-25, 2018, doi: 10.1145/3240508.3240708.

[6] A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh, "Recycle-GAN: Unsupervised Video Retargeting," CoRR, vol. abs/1808.05174, 2018. [Online]. Available: http://arxiv.org/abs/1808.05174.

[10] Y. Hong, U. Hwang, J. Yoo, and S. Yoon, "How Generative Adversarial Networks and Their Variants Work: An Overview," ACM Computing Surveys (CSUR), vol. 52, no. 1, p. 10, 2019.