SE-DAE: Style-Enhanced Denoising Auto-Encoder for Unsupervised Text Style Transfer

arXiv:2104.12977v1 [cs.CL] 27 Apr 2021

Abstract—Text style transfer aims to change the style of sentences while preserving their semantic meaning. Due to the lack of parallel data, the Denoising Auto-Encoder (DAE) is widely used in this task to model the distributions of different sentence styles. However, because of the conflict between the target of the conventional denoising procedure and the target of the style transfer task, the vanilla DAE cannot produce satisfying results. To improve the transferability of the model, most existing works combine the DAE with various complicated unsupervised networks, which makes the whole system over-complex. In this work, we design a novel DAE model named Style-Enhanced DAE (SE-DAE), which is specifically designed for the text style transfer task. Compared with previous complicated style-transfer models, our model does not contain any complicated unsupervised networks; instead, it relies only on high-quality pseudo-parallel data generated by a novel data-refinement mechanism. Moreover, to alleviate the conflict between the targets of the conventional denoising procedure and the style transfer task, we propose a novel style-denoising mechanism that is more compatible with the target of the style transfer task. We validate the effectiveness of our model on two style benchmark datasets. Both automatic and human evaluation show that our proposed model is highly competitive with previous strong state-of-the-art (SOTA) approaches and greatly outperforms the vanilla DAE.

*The corresponding author is Yang Feng.

I. INTRODUCTION

Text style transfer is an intractable challenge that focuses on changing the style attribute of sentences while preserving their semantics. Due to the lack of paired sentences, the Denoising Auto-Encoder (DAE) has been widely used in existing style-transfer models to model the distributions of different styles in an unsupervised manner. However, the target of the conventional denoising process, which requires models to recover all the information in a sentence, conflicts with the target of the style transfer task, which requires models to preserve only the semantic information while transferring the style information. As a consequence, the vanilla DAE treats rare style words as content words and preserves them in the final transferred results. Thus, the vanilla DAE cannot produce satisfying results in this task.

To enhance the transferability of the DAE, most of the proposed transfer models combine the DAE with different unsupervised neural networks, especially Reinforcement Learning (RL) [1]–[5] and Generative Adversarial Networks (GANs) [6]–[12]. However, since these style-transfer models (e.g., RL-based and GAN-based models) are composed of complicated neural networks, they are difficult to train to convergence [13], [14] and require complex and laborious training procedures. Furthermore, the complex training procedures also make it intricate to adapt these models to new style datasets. To simplify the training procedure, Lample et al. [13] propose training DAE-based models on pseudo-parallel data rather than combining the DAE with any unsupervised neural network. Nevertheless, the pseudo-parallel data generated by back-translation is plagued by a severe data-quality problem [3], which adversely affects the performance of the model.

In general, conventional DAE-based models suffer from two conflicts: (1) the conflict between the goal of the conventional denoising process and the goal of the style transfer task; and (2) the conflict between the simplicity and the performance of the model. To resolve these conflicts, we propose a novel DAE model with two task-oriented mechanisms, namely the Style-Enhanced Denoising Auto-Encoder (SE-DAE). We address the first conflict with a novel style-denoising mechanism: we randomly "contaminate" the input sentences with style words during the corruption process, and require the model to remove the contaminants and recover the original sentence during the reconstruction process. In this way, the goal of the style-denoising mechanism is compatible with the goal of the text style transfer task. Next, we propose a novel data-refinement mechanism that selects the best inference to construct a high-quality pseudo-parallel dataset, which solves the data-quality problem. By training the DAE on this high-quality pseudo-parallel data, we enhance the transferability of the model without attaching any complex unsupervised networks. Finally, we evaluate the performance of our model on two benchmark datasets: YELP [15] for the sentiment transfer task and GYAFC [16] for the formality transfer task. Both automatic and human evaluation show that our model is highly competitive with previous strong SOTA systems and greatly outperforms the most commonly used vanilla DAE.

II. BACKGROUND

A. Denoising Auto-Encoder (DAE)

DAE [17] is an unsupervised neural network that aims to learn robust representations by corrupting input patterns and reconstructing them.
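The corruption step at the heart of the style-denoising mechanism can be sketched in a few lines. This is a minimal illustration under our own assumptions (the function name, the probability `p`, and the word-list interface are not from the paper's implementation); the two modes correspond to the replacement and insertion strategies compared later in the analysis:

```python
import random

def style_corrupt(tokens, style_words, p=0.15, mode="replace", seed=None):
    """'Contaminate' a sentence with style words before reconstruction.

    mode="replace": overwrite a token with a random style word.
    mode="insert" : insert a random style word before a token.
    The probability p and both mode names are illustrative assumptions.
    """
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < p:
            if mode == "replace":
                out.append(rng.choice(style_words))  # token replaced by a style word
                continue
            out.append(rng.choice(style_words))      # style word inserted before token
        out.append(tok)                              # original token kept
    return out

clean = "the food is so good".split()
noisy = style_corrupt(clean, ["bad", "irritating"], p=0.4, mode="replace", seed=0)
# training pair for the DAE: input = noisy, target = clean
```

The DAE is then trained on pairs (noisy, clean), so removing injected style words becomes part of its denoising objective rather than conflicting with it.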
B. Iterative Back-Translation

The style modeling process can only help the DAE to model

Fig. 2. The training process of our model. M^t_{X⇔Y} represents the style-transfer model at epoch t, which is utilized to generate the pseudo-parallel data for training M^{t+1}_{X⇔Y}.
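The iterative loop in Fig. 2 alternates between generating pseudo-parallel pairs with the current model M^t_{X⇔Y} and training M^{t+1}_{X⇔Y} on the refined pairs. A minimal sketch of the refinement step follows; the scorer interfaces (`style_clf`, `content_sim`) and the thresholds are illustrative assumptions, not the paper's actual criteria:

```python
def refine_pairs(pairs, style_clf, content_sim, min_style=0.9, min_content=0.5):
    """Data-refinement step: keep only pseudo-parallel pairs whose transferred
    side clearly carries the target style AND still overlaps the source content.

    pairs       : list of (source, transferred) sentence pairs
    style_clf   : sentence -> P(target style)           (assumed interface)
    content_sim : (source, hyp) -> similarity in [0, 1] (assumed interface)
    """
    return [(src, hyp) for src, hyp in pairs
            if style_clf(hyp) >= min_style and content_sim(src, hyp) >= min_content]

# Toy scorers, only for demonstration
style_clf = lambda s: 1.0 if "good" in s else 0.0
content_sim = lambda a, b: len(set(a.split()) & set(b.split())) / len(set(a.split()))

pairs = [
    ("the food is bad", "the food is good"),  # kept: styled and content preserved
    ("the food is bad", "good good good"),    # dropped: content not preserved
]
kept = refine_pairs(pairs, style_clf, content_sim)
```

Only the surviving pairs are used to train the next-epoch model, so low-quality back-translations never reach the training set.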
                               YELP                          GYAFC
                     Accuracy  BLEU-r   G2    H2    Accuracy  BLEU-r   G2    H2
Existing Unsupervised Systems
CrossAlign [23]        73.8     16.8   35.2  27.4     69.8      3.6   15.8   6.8
StyleEmbedding [7]      8.7     38.5   18.3  14.2     22.2      7.9   13.2  11.7
DelRetri [15]          89.2     28.7   50.6  43.5     54.6     21.2   34.0  30.5
UnsupervisedMT [24]    96.3     41.8   63.4  58.3     68.9     33.4   48.0  45.0
BackTranslation [25]   81.4     37.2   55.0  51.1     26.3     44.5   34.2  33.0
MultiAttribute [13]    81.4     39.6   56.8  53.3     26.3     44.5   34.2  33.0
TransformerGAN [11]    85.4     51.4   66.2  64.1     66.4     44.6   54.4  53.4
DualRL [3]             88.6     52.0   67.9  65.5     62.6     41.9   51.2  50.2
DataRefinement [20]    87.7     19.8   41.7  32.4     64.1     50.7   57.0  56.6
PoiGen [2]             86.8     54.4   68.7  66.9     11.7     44.9   22.9  18.6
WordRelev [26]         84.3     57.0   69.3  68.0     81.8     46.2   61.4  59.0
StyleIns [27]          91.1     45.5   64.4  60.7     69.6     47.5   57.5  56.5
Unsupervised Baseline Systems
Input Copy              2.5     58.6   12.1   4.8     13.4     48.2   25.4  21.0
Vanilla DAE            73.3     54.3   63.1  62.4     29.2     47.6   37.3  36.2
Our Unsupervised DAE Systems
SE-DAE                 85.7     53.9   68.0  66.2     85.5     48.4   64.3  61.8
TABLE III
HUMAN EVALUATION RESULTS OF DIFFERENT MODELS ON TWO BENCHMARK DATASETS.

                            YELP                                 GYAFC
                     Style      Content    Fluency     Style      Content    Fluency
TransformerGAN [11]  3.65±1.05  4.33±0.70  4.06±0.83   2.75±0.85  3.85±0.89  3.67±0.97
DualRL [3]           3.90±0.93  4.30±0.88  4.21±0.83   2.57±0.77  3.61±1.23  3.73±0.95
PoiGen [2]           3.83±1.13  4.46±0.74  4.37±0.72   1.56±0.74  2.77±0.52  2.88±0.71
WordRelev [26]       4.03±0.95  4.63±0.55  4.50±0.56   3.47±0.86  4.35±0.72  4.37±0.91
StyleIns [27]        3.82±1.02  4.12±0.93  4.11±0.87   3.27±1.03  4.33±0.66  4.32±0.77
SE-DAE               3.88±0.98  4.48±0.63  4.38±0.69   3.55±1.01  4.38±0.68  4.36±0.75
TABLE IV
ABLATION STUDY OF DIFFERENT COMPONENTS USED IN SE-DAE.

                              YELP                                GYAFC
               ACC   BLEU-r   G2     ∆     H2     ∆     ACC   BLEU-r   G2     ∆     H2     ∆
Vanilla DAE    73.3   51.9   63.1   0.0   62.4   0.0   29.2   47.6   37.3   0.0   36.2   0.0
+ Classifier   82.3   48.7   63.3  +0.2   61.2  -1.2   81.9   47.4   62.3  +25.0  60.0  +23.8
+ Refinement   82.1   53.6   66.4  +3.3   64.9  +2.5   81.7   48.8   63.2  +25.9  61.1  +24.9
+ Style Noise  85.7   53.9   68.0  +4.9   66.2  +3.8   85.5   48.4   64.3  +27.0  61.8  +25.6
sentence is generated by, the five human judges are asked to rate the output sentences of the different style-transfer models on a Likert scale of 1 to 5 to evaluate the target style accuracy, content preservation, and sentence fluency (Fluency).

VI. EVALUATION RESULTS

Table II presents the results of the automatic evaluation of the different models (G2 and H2 denote the geometric mean and harmonic mean of accuracy and BLEU-r, respectively). On YELP, our model outperforms the other style-transfer models except for WordRelev and PoiGen. The reason is that both WordRelev and PoiGen exploit word-level style information through a multi-stage training process, whereas our model only uses the sentence-level style information of the pseudo-parallel data for training. In contrast, our model achieves the best performance on GYAFC. A possible explanation is that sentences in GYAFC contain more abbreviations and spelling errors than those in YELP, so models are required to be more robust to noise when transferring the style of GYAFC sentences. Thus, compared with the other models, our model, which is based on the denoising process, is more robust to this noise and can better transfer the sentences of GYAFC. The results of the human evaluation are shown in Table III. Similar to the automatic evaluation, our model surpasses most previous models except for WordRelev [26] on YELP. On GYAFC, our model achieves performance similar to WordRelev, with higher style accuracy and content scores. Overall, the evaluation results show the effectiveness of our model and its competitive advantage over those powerful but complex models.

VII. PERFORMANCE ANALYSIS

A. Ablation Study

To better understand the roles played by the different components of our model, we conduct an ablation study for each component on both YELP and GYAFC, and present the results in Table IV. The results show that the refinement mechanism improves BLEU scores by more than 5 points on YELP and by more than 1 point on GYAFC. Meanwhile, the style denoising mechanism improves
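Since G2 and H2 are simply the geometric and harmonic means of style accuracy and BLEU-r, the table entries can be recomputed directly from the two raw metrics. A small sketch (the helper names are our own):

```python
import math

def g2(acc, bleu):
    """Geometric mean of style accuracy and BLEU-r."""
    return math.sqrt(acc * bleu)

def h2(acc, bleu):
    """Harmonic mean of style accuracy and BLEU-r."""
    return 2 * acc * bleu / (acc + bleu)

# SE-DAE on YELP (Table II): accuracy 85.7, BLEU-r 53.9
print(round(g2(85.7, 53.9), 1))  # 68.0
print(round(h2(85.7, 53.9), 1))  # 66.2
```

Because the harmonic mean is dominated by the smaller of the two values, H2 punishes models that trade content preservation for style accuracy (or vice versa) more severely than G2 does.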
Fig. 3. (a): Evolution of BLEU score and style accuracy of the pseudo-parallel data with or without refinement on YELP. (b): Evolution of BLEU score and style accuracy of the pseudo-parallel data with or without refinement on GYAFC. (c): Evolution of BLEU score and style accuracy on YELP when using different ways to add style words. (d): Evolution of BLEU score and style accuracy on GYAFC when using different ways to add style words.
[Fig. 4(a) example: without style-noise the model outputs "the food is so good that irritating us", while with style-noise it outputs "the food is so good that fun us".]
Fig. 4. (a): An example on YELP where style-noise influences the transfer of negative style words. (b): Appearance frequency of different negative words on YELP. (c): Top-5 output words of the model trained without style-noise when transferring the rare negative word "irritating". (d): Top-5 output words of the model trained with style-noise when transferring the rare negative word "irritating".
the style accuracy on both datasets by over 3 percentage points. In addition, the results of the ablation study also show that the style classifier plays a crucial role in balancing style intensity and content preservation. Without the style classifier, the vanilla DAE model generates sentences that successfully preserve content information but fail to transfer styles.

B. Refining Pseudo-parallel Data

In this section, we investigate the influence of the data refinement mechanism by visualizing the evolution curves of the BLEU score and style accuracy of the pseudo-parallel data at each epoch, as shown in Figure 3 (a) and (b). The result indicates the effectiveness of the data refinement mechanism in finding sentences that both preserve the semantics and transfer the style. Moreover, the result also shows that without the data refinement mechanism, the pseudo-parallel data suffers from a more severe data-quality problem on GYAFC than on YELP. A possible explanation is that the style of sentiment is more distinguishable than the style of formality, so transferring sentiment is much easier than transferring formality.

C. Which is Better: Replace or Insert?

In this work, we explore two ways to add style words in the style denoising mechanism: (1) randomly replacing input words with style words, and (2) randomly inserting style words into input sentences. The result (see Figure 3 (c) and (d)) indicates that it is much better to add style words through replacement than through insertion in this task. One possible explanation is that insertion does not corrupt the original information of the sentence, and the model trained with insertion tends to filter words (i.e., it focuses on filtering out style words). On the contrary, replacement breaks both the content and the style information of the input sentence. As a result, the model trained with replacement needs to recover the content and style information given only partial information as input (i.e., it focuses on reconstructing both the semantics and the style).

D. Transferring Style Words

To further investigate the influence of the style denoising mechanism on transferring rare style words, we sample a sentence that contains the rare negative word "irritating". As shown in Figure 4 (b), "irritating" rarely appears in YELP, with a low frequency of 0.002, while "bad" appears most frequently, with a probability of 0.29. When transferring "irritating", the model without the style denoising mechanism tends to treat this style word as a content word and gives it the highest output probability (Figure 4 (c)). In contrast, since the style denoising mechanism increases the appearance frequency of rare style words, the model with the style denoising mechanism assigns higher output probability to words with the opposite style and outputs the positive word "fun" (Figure 4 (d)).

E. Case Study

To scrutinize the performance of the different transfer models, we sample several sentences with multiple style words or complex syntactic structures, and then compare the transferred sentences of our model with those of other state-of-the-art representative systems. From the results shown in Table V, we observe that: 1) our model can precisely convert each style word of the source sentences; 2) in contrast to the incoherent
TABLE V
EXAMPLE OUTPUTS OF DIFFERENT TRANSFER MODELS ON SENTIMENT AND FORMALITY TRANSFER TASKS.
sentences of other models, the sentences generated by our model are more understandable.

VIII. RELATED WORK

Most existing models achieve text style transfer from one of two perspectives: model-oriented or data-oriented.

Model-oriented methods handle the text style transfer problem by designing complicated neural networks, which may incorporate multiple steps in the training procedure. Shen et al. [23] design a powerful model that aligns two different style domains by adversarial learning. Fu et al. [7] apply an adversarial loss on the encoder to separate style and content information, while Dai et al. [11] use a cycle-GAN to achieve style transfer without disentangling style and content. Meanwhile, Xu et al. [1] propose a transfer model based on cycle reinforcement learning, and Luo et al. [3], [4] solve the text style transfer problem by dual reinforcement learning. Both Xu et al. [1] and Luo et al. [3], [4] involve a two-step training procedure. Gong et al. [5] design another RL-based model, which considers the style, semantic, and fluency rewards of the transferred sentence simultaneously. Wu et al. [2] propose a multi-step model based on hierarchical reinforcement learning. Liu et al. [29] propose a VAE model that transfers the style of a sentence by modifying its latent representation in the continuous space.

Data-oriented methods prefer to retrieve more word-level or sentence-level information from the training corpus for the transfer model. Li et al. [15] propose a retrieval-based transfer model that retrieves similar sentences from the target-style corpus to help the transfer of source-style sentences. Sudhakar et al. [30] improve the retrieval process of Li et al. [15] with a more complicated scoring function and propose using a style embedding as the style input when there are no similar sentences in the target-style corpus. Lample et al. [13] generate pseudo-parallel data on the fly and adopt the unsupervised machine translation model [18] as their transfer model. Jin et al. [20] iteratively back-translate and match unpaired corpora to obtain high-quality pseudo-parallel data for training their transfer model, but they only consider content similarity when pairing two sentences. Wu et al. [31] propose a two-step model that first masks style words with a pre-trained identifier, and then trains the model to infill the masked words with words of the target style. Zhou et al. [26] propose a two-stage model that calculates the word-level style relevance in the first stage and then utilizes the learned style relevance in the second stage.

IX. CONCLUSION

In this work, we propose the Style-Enhanced Denoising Auto-Encoder (SE-DAE) with a novel style-denoising mechanism, which is more compatible with style transfer tasks. To preserve the simplicity of the original vanilla DAE, we train SE-DAE on high-quality pseudo-parallel data generated by another novel data-refinement mechanism. Compared with previously proposed transfer systems that combine the vanilla DAE with complicated unsupervised models, our model achieves competitive performance while preserving the simplicity of the DAE in both network structure and training procedure. The evaluation results on two benchmark style datasets indicate the effectiveness of SE-DAE in multiple text style transfer tasks.

REFERENCES

[1] J. Xu, X. Sun, Q. Zeng, X. Ren, X. Zhang, H. Wang, and W. Li, "Unpaired sentiment-to-sentiment translation: A cycled reinforcement learning approach," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 979–988.
[2] C. Wu, X. Ren, F. Luo, and X. Sun, "A hierarchical reinforced sequence operation method for unsupervised text style transfer," in Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL, 2019, pp. 4873–4883.
[3] F. Luo, P. Li, J. Zhou, P. Yang, B. Chang, Z. Sui, and X. Sun, "A dual reinforcement learning framework for unsupervised text style transfer," in Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019, pp. 5116–5122.
[4] F. Luo, P. Li, P. Yang, J. Zhou, Y. Tan, B. Chang, Z. Sui, and X. Sun, "Towards fine-grained text sentiment transfer," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019, 2019, pp. 2013–2022.
[5] H. Gong, S. Bhat, L. Wu, J. Xiong, and W.-m. Hwu, "Reinforcement learning based text style transfer without parallel training corpus," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 3168–3180.
[6] Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. P. Xing, "Toward controlled generation of text," in Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 1587–1596.
[7] Z. Fu, X. Tan, N. Peng, D. Zhao, and R. Yan, "Style transfer in text: Exploration and evaluation," Association for the Advancement of Artificial Intelligence, 2018.
[8] L. Logeswaran, H. Lee, and S. Bengio, "Content preserving text generation with attribute controls," in 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018, pp. 5103–5113.
[9] Z. Yang, Z. Hu, C. Dyer, E. P. Xing, and T. Berg-Kirkpatrick, "Unsupervised text style transfer using language models as discriminators," in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 7287–7298.
[10] C.-T. Lai, Y.-T. Hong, H.-Y. Chen, C.-J. Lu, and S.-D. Lin, "Multiple text style transfer by using word-level conditional generative adversarial network with two-phase training," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3579–3584.
[11] N. Dai, J. Liang, X. Qiu, and X. Huang, "Style transformer: Unpaired text style transfer without disentangled latent representation," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 5997–6007.
[25] S. Prabhumoye, Y. Tsvetkov, R. Salakhutdinov, and A. W. Black, "Style transfer through back-translation," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 866–876.
[26] C. Zhou, L. Chen, J. Liu, X. Xiao, J. Su, S. Guo, and H. Wu, "Exploring contextual word-level style relevance for unsupervised style transfer," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 7135–7144.
[27] X. Yi, Z. Liu, W. Li, and M. Sun, "Text style transfer via learning style instance supported latent space," in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2020, pp. 3801–3807.
[28] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proceedings of ICLR, 2015.
[29] D. Liu, J. Fu, Y. Zhang, C. Pal, and J. Lv, "Revision in continuous space: Unsupervised text style transfer without adversarial learning," in AAAI, 2020.
[30] A. Sudhakar, B. Upadhyay, and A. Maheswaran, "Transforming delete, retrieve, generate approach for controlled text style transfer," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3269–3279.
[31] X. Wu, T. Zhang, L. Zang, J. Han, and S. Hu, "Mask and infill: Applying masked language model for sentiment transfer," in Proceedings of the
[12] A. Tikhonov, V. Shibaev, A. Nagaev, A. Nugmanova, and I. P. Twenty-Eighth International Joint Conference on Artificial Intelligence,
Yamshchikov, “Style transfer for texts: Retrain, report errors, compare IJCAI-19, 2019, pp. 5271–5277.
with rewrites,” in Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing (EMNLP-IJCNLP), 2019,
pp. 3936–3945.
[13] G. Lample, S. Subramanian, E. Smith, L. Denoyer, M. Ranzato, and
Y.-L. Boureau, “Multiple-attribute text rewriting,” in International Con-
ference on Learning Representations, 2019.
[14] H. Ham, T. J. Jun, and D. Kim, “Unbalanced gans: Pre-training the gen-
erator of generative adversarial network using variational autoencoder,”
in ArXiv, 2020.
[15] J. Li, R. Jia, H. He, and P. Liang, “Delete, retrieve, generate: a simple
approach to sentiment and style transfer,” in Proceedings of the 2018
Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1
(Long Papers), 2018, pp. 1865–1874.
[16] S. Rao and J. Tetreault, “Dear sir or madam, may I introduce the GYAFC
dataset: Corpus, benchmarks and metrics for formality style transfer,”
in Proceedings of the 2018 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language
Technologies, Volume 1 (Long Papers), 2018, pp. 129–140.
[17] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting
and composing robust features with denoising autoencoders,” pp. 1096–
1103, 2008.
[18] G. Lample, M. Ott, A. Conneau, L. Denoyer, and M. Ranzato, “Phrase-
based & neural unsupervised machine translation,” in Proceedings
of the 2018 Conference on Empirical Methods in Natural Language
Processing, 2018, pp. 5039–5049.
[19] M. J. Kusner, Y. Sun, N. I. Kolkin, and K. Q. Weinberger, “From word
embeddings to document distances,” in Proceedings of the 32nd Inter-
national Conference on International Conference on Machine Learning,
2015, p. 957–966.
[20] Z. Jin, D. Jin, J. Mueller, N. Matthews, and E. Santus, “IMaT: Unsu-
pervised text attribute transfer via iterative matching and translation,” in
Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on
Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3097–3109.
[21] Y. Kim, “Convolutional neural networks for sentence classification,” in
Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP), 2014, pp. 1746–1751.
[22] R. Mir, B. Felbo, N. Obradovich, and I. Rahwan, “Evaluating style
transfer for text,” in Proceedings of the 2019 Conference of the North
American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short Papers),
2019, pp. 495–504.
[23] T. Shen, T. Lei, R. Barzilay, and T. S. Jaakkola, “Style transfer from
non-parallel text by cross-alignment,” pp. 6830–6841, 2017.
[24] Z. Zhang, S. Ren, S. Liu, J. Wang, P. Chen, M. Li, M. Zhou, and E. Chen,
“Style transfer as unsupervised machine translation,” CoRR, 2018.