
SE-DAE: Style-Enhanced Denoising Auto-Encoder

for Unsupervised Text Style Transfer


Jicheng Li, Yang Feng*, Jiao Ou
Key Laboratory of Intelligent Information Processing,
Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS), Beijing, China
University of Chinese Academy of Sciences, Beijing, China
lijicheng@ict.ac.cn, fengyang@ict.ac.cn, oujiao17b@ict.ac.cn

arXiv:2104.12977v1 [cs.CL] 27 Apr 2021

*The corresponding author is Yang Feng.

Abstract—Text style transfer aims to change the style of sentences while preserving their semantic meaning. Due to the lack of parallel data, the Denoising Auto-Encoder (DAE) is widely used in this task to model the distributions of different sentence styles. However, because of the conflict between the target of the conventional denoising procedure and the target of the style transfer task, the vanilla DAE cannot produce satisfactory results. To improve the transferability of the model, most existing works combine the DAE with various complicated unsupervised networks, which makes the whole system over-complex. In this work, we design a novel DAE model named Style-Enhanced DAE (SE-DAE), which is specifically designed for the text style transfer task. Compared with previous complicated style-transfer models, our model does not contain any complicated unsupervised network, but relies only on high-quality pseudo-parallel data generated by a novel data refinement mechanism. Moreover, to alleviate the conflict between the targets of the conventional denoising procedure and the style transfer task, we propose a novel style denoising mechanism that is more compatible with the target of the style transfer task. We validate the effectiveness of our model on two style benchmark datasets. Both automatic and human evaluation show that our proposed model is highly competitive with previous strong state-of-the-art (SOTA) approaches and greatly outperforms the vanilla DAE.

I. INTRODUCTION

Text style transfer is an intractable challenge that focuses on how to change the style attribute of sentences while preserving their semantics. Due to the lack of paired sentences, the Denoising Auto-Encoder (DAE) has been widely used in existing style-transfer models to model the distributions of different styles in an unsupervised manner. However, the target of the conventional denoising process, which requires the model to recover all the information in a sentence, conflicts with the target of the style transfer task, which only requires the model to preserve the semantic information while transferring the style information. As a consequence, the vanilla DAE regards rare style words as content words and preserves them in the final transferred results. Thus, the vanilla DAE cannot produce satisfactory results on this task.

To enhance the transferability of the DAE, most of the proposed transfer models combine the DAE with different unsupervised neural networks, especially Reinforcement Learning (RL) [1]-[5] and Generative Adversarial Networks (GANs) [6]-[12]. However, since these style-transfer models (e.g., RL-based and GAN-based models) are composed of complicated neural networks, they are difficult to converge [13], [14] and require over-complex, laborious training procedures. The complex training procedures also make it intricate to adapt these models to new style datasets. To simplify the training procedure, Lample et al. [13] propose training DAE-based models on pseudo-parallel data rather than combining the DAE with any unsupervised neural network. Nevertheless, the pseudo-parallel data generated by back-translation is plagued by a severe data-quality problem [3], which adversely affects the performance of the model.

In general, conventional DAE-based models face two conflicts: (1) the conflict between the goal of the conventional denoising process and the goal of the style transfer task; and (2) the conflict between the simplicity and the performance of the model. To resolve these conflicts, we propose a novel DAE model with two task-oriented mechanisms, namely the Style-Enhanced Denoising Auto-Encoder (SE-DAE). We deal with the first conflict through a novel style denoising mechanism: we randomly "contaminate" the input sentences with style words during the corruption process and require the model to remove the contaminants and recover the original sentence during the reconstruction process. In this way, the goal of the style denoising mechanism is compatible with the goal of the text style transfer task. Next, we propose a novel data refinement mechanism that selects the best inference to construct a high-quality pseudo-parallel dataset, which addresses the data-quality problem. By training the DAE on this high-quality pseudo-parallel data, we enhance the transferability of the model without combining it with any complex unsupervised network. Finally, we evaluate the performance of our model on two benchmark datasets: YELP [15] for the sentiment transfer task and GYAFC [16] for the formality transfer task. Both automatic and human evaluation show that our model is highly competitive with previous strong SOTAs and greatly outperforms the most commonly used vanilla DAE.

II. BACKGROUND

A. Denoising Auto-Encoder (DAE)

DAE [17] is an unsupervised neural network that aims to learn robust representations by corrupting the input pattern
partially, which can be utilized to initialize deep neural architectures. In general, the training process of the DAE is composed of corruption and reconstruction.

For an input s in dataset D, the DAE first corrupts it into a noised version c(s), and then the encoder E(c(s), θE) maps c(s) into a latent representation z:

z = E(c(s), θE)   (1)

Then, the decoder D(z, θD) reconstructs the original input s conditioned on z:

ŝ = D(z, θD)   (2)

Finally, the objective of the DAE is to minimize the following loss:

L_DAE = E_{s∼D, ŝ∼D(E(c(s)))} [∆(s, ŝ)]   (3)

where ∆(s, ŝ) is a measure of the discrepancy between s and ŝ.

B. Text Style Transfer Task

Given a set of corpora with different styles {Di}, i = 1, ..., K, where each corpus only contains sentences of the same style, DX = {xi}, i = 1, ..., n, is the corpus of source style X and DY = {yi}, i = 1, ..., m, is the corpus of target style Y (Y ≠ X). For each sentence x in DX, the text style transfer task aims to rewrite x into xY, which carries the attributes of the desired style Y and the semantic meaning of x.

III. BASIC STRUCTURE

A. Style Modeling

Like Lample et al. [18], who model different languages via denoising auto-encoding, the DAE can also be used to model the distributions of different styles, as shown in Figure 1. Likewise, we corrupt the input sentences and train the DAE to reconstruct the original sentences from the corrupted ones. More specifically, when corrupting sentences we consider two types of noise: dropping and shuffling input tokens. Besides, to unify the latent spaces of different styles, we share the encoder and the decoder between styles and utilize style embeddings (for instance, SX and SY for styles X and Y, respectively) to indicate the different styles during training and inference. At last, the token-level cross-entropy is used to measure the discrepancy between the original sentences and the reconstructed sentences. Overall, the loss of style modeling can be expressed as:

L_SM = log P(x | c(x), SX) + log P(y | c(y), SY)   (4)

Fig. 1. Style modeling process of styles X and Y. "DAE Noise" represents dropping and shuffling words. SX and SY represent the embeddings of styles X and Y, respectively.
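As a concrete illustration, the drop-and-shuffle corruption ("DAE Noise" in Fig. 1) can be sketched as below. The drop probability and the shuffle window are hyperparameters that the paper does not specify, so the values here are only placeholders.

```python
import random

def dae_noise(tokens, p_drop=0.1, shuffle_window=3, rng=random):
    """Corrupt a token sequence into c(x) by (1) dropping tokens and
    (2) locally shuffling the survivors (a sketch; p_drop and
    shuffle_window are assumed values, not taken from the paper)."""
    # 1. Randomly drop tokens (keep at least the original if all are dropped).
    kept = [t for t in tokens if rng.random() > p_drop] or list(tokens)
    # 2. Locally shuffle word order: sort positions perturbed by a small random offset.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

# Example: dae_noise("the food is so good".split())
```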
B. Iterative Back-Translation

The style modeling process can only help the DAE to model the distribution of each style, but not the way of transferring sentences from one style to another. Most previous works combine the DAE with complicated unsupervised models like RL and GANs to learn how to transfer the styles of sentences. Although this combination can create a powerful style-transfer model, it makes the training procedure of the entire model over-complex. Instead of seeking help from complicated unsupervised models, we train the DAE to transfer between styles by iterative back-translation [13], keeping the simplicity of the DAE to the greatest extent possible.

In every epoch, we use the DAE from the last epoch to back-translate the unpaired data into pseudo-parallel data and continue training on the recently generated pseudo-parallel data. Specifically, we utilize the DAE model at epoch t, named M^t_{X⇔Y}, to back-translate x and y into ŷ of style Y and x̂ of style X, respectively, and pair the corresponding sentences to form the pseudo-parallel data (ŷ, x) and (x̂, y). Then we train M^t_{X⇔Y} on the recently generated pseudo-parallel data and obtain the DAE model at epoch t+1, named M^{t+1}_{X⇔Y}. In general, the loss of iterative back-translation in each epoch can be represented as follows:

L_BT = log P(y | x̂) + log P(x | ŷ)   (5)

IV. OUR METHOD

In this section, we introduce our style-enhanced DAE model with its two task-oriented modules, as shown in Figure 2.

Fig. 2. The training process of our model. M^t_{X⇔Y} represents the style-transfer model at epoch t, which is utilized to generate pseudo-parallel data to train M^{t+1}_{X⇔Y}.

A. Pseudo-parallel Data Refinement

Unfortunately, although the pseudo-parallel data can help the DAE learn how to transfer the styles of sentences, it suffers from the data-quality problem [3], which limits the performance of the DAE on this task. In this work, we propose a new mechanism to refine the pseudo-parallel data by finding the best inference in the history of all training epochs.

In epoch t, for each sentence x of style X, we append the back-translated sentence ŷt of style Y to the candidate set of x, Cx = {ŷ0, ŷ1, ..., ŷt}. We prefer to choose the ŷ that preserves the most semantics of x and has the highest style intensity towards Y. Thus, we design a scoring function CS, which estimates the quality of each ŷ in Cx from two aspects: the semantic similarity between x and ŷ, and the style intensity towards Y. We adopt the same operations for the sentences x̂ in the candidate set Cy.

1) Semantic Similarity: In this work, we adopt the Word Mover's Distance (WMD) [19] to estimate the semantic similarity between ŷ and x:

Con(ŷ, x) = −WMD(ŷ, x)   (6)

Compared with other similarity measures such as cosine similarity, WMD focuses more on the cumulative cost of transforming sentence ŷ into x. Besides, WMD considers the entire semantics of the sentence and can effectively handle synonymic similarity at the word level [20]. The WMD is calculated as follows:

WMD(ŷ, x) = min_{T≥0} Σ_{i,j=1}^{v} T_{ij} · c(i, j)   (7)

where v is the vocabulary size, T_{ij} is the amount of word i in ŷ that travels to word j in x, and c(i, j) is the corresponding transport cost.
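In practice, the content score of Eq. (6) can be obtained from an off-the-shelf WMD implementation; below is a minimal sketch using gensim's wmdistance. The pretrained word-vector file and the whitespace tokenization are assumptions, not details given in the paper.

```python
from gensim.models import KeyedVectors

# Hypothetical pretrained word embeddings; the paper does not state which vectors it uses.
# (gensim's WMD also needs its optional optimal-transport dependency installed.)
word_vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

def content_score(y_hat, x):
    """Con(y_hat, x) = -WMD(y_hat, x), Eq. (6); inputs are whitespace-tokenized sentences."""
    return -word_vectors.wmdistance(y_hat.split(), x.split())
```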
2) Style Intensity: In addition to the semantic similarity, we also expect ŷ to have a strong style intensity towards Y. Thus, we pre-train a Text-CNN-based style classifier [21] on the training corpus and use this classifier to evaluate the style intensity of ŷ, which is estimated by the probability that ŷ belongs to style Y:

Sty(ŷ, Y) = P_cls(Y | ŷ, θ_cls)   (8)

Therefore, the final quality score of ŷ with regard to sentence x and style Y is calculated as follows:

CS(ŷ, x, Y) = Con(ŷ, x) + Sty(ŷ, Y)   (9)

Based on the score given by CS, the refinement mechanism chooses the sentence ŷ with the maximal score in Cx as the best sentence y∗ and pairs it with x as the pseudo-parallel data for the next epoch of training. Overall, the refinement mechanism for y∗ can be formulated as follows:

y∗ = argmax_{ŷ∈Cx} CS(ŷ, x, Y)   (10)

Similarly, x∗ can be formulated as follows:

x∗ = argmax_{x̂∈Cy} CS(x̂, y, X)   (11)

At last, the pseudo-parallel data after refinement becomes (x∗, y) and (y∗, x).
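A minimal sketch of the refinement step of Eqs. (9)-(10) is given below. Here content_score is the WMD-based score sketched earlier, and style_prob stands for the pre-trained Text-CNN classifier; its interface is an assumption for illustration.

```python
def candidate_score(y_hat, x, style_prob):
    """CS(y_hat, x, Y) = Con(y_hat, x) + Sty(y_hat, Y), Eq. (9).
    style_prob(sentence) -> probability that `sentence` carries the target style."""
    return content_score(y_hat, x) + style_prob(y_hat)

def refine(candidates, x, style_prob):
    """Pick the best back-translation y* out of Cx = {y_hat_0, ..., y_hat_t}, Eq. (10)."""
    return max(candidates, key=lambda y_hat: candidate_score(y_hat, x, style_prob))
```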
B. Style Denoising

Although the refinement mechanism in our model can greatly improve the quality of the pseudo-parallel data, the conflict between the goal of the conventional denoising process and the goal of the text style transfer task makes it difficult for the DAE to translate rare style words accurately. In the conventional denoising process, all the information of the original sentences is required to be recovered. However, in the text style transfer task, only the semantic information needs to be preserved, while the style information needs to be converted. As a consequence, the DAE trained with the conventional denoising process tends to treat style words that rarely appear in the training set as content words and mistakenly retains these style words in the model output.

In this work, we handle the above conflict through a specifically designed style denoising mechanism. First, we use a pre-trained logistic regression classifier [22] to identify a set of style words for each style. Then, in the style transfer training procedure, we randomly replace input tokens with the identified style words as a type of noise and enforce the DAE model to eliminate such noise in the output sentences.

Specifically, we "pollute" x∗ (or y∗) by randomly substituting input words in x∗ (or y∗) with style words of X (or Y) with a probability of p_sn and obtain the corrupted sentences x∗_sn (or y∗_sn):

x∗_sn = Pollute(x∗, X, p_sn)   (12)

y∗_sn = Pollute(y∗, Y, p_sn)   (13)

where X and Y are the sets of style words for styles X and Y, respectively, and the function Pollute(s, M, p) represents the process of randomly substituting words in s with words in M with probability p.
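The pollution step of Eqs. (12)-(13) amounts to random word substitution; a minimal sketch is shown below. Here style_words would be the word set produced by the logistic-regression identifier, whose construction is not shown.

```python
import random

def pollute(tokens, style_words, p_sn=0.2, rng=random):
    """Pollute(s, M, p): randomly replace each token of s with a word drawn
    from the style-word set M with probability p (Eqs. (12)-(13)).
    p_sn = 0.2 follows the value reported in the training details."""
    style_words = list(style_words)
    return [rng.choice(style_words) if rng.random() < p_sn else t for t in tokens]

# Example: pollute("the food is great".split(), {"bad", "awful", "irritating"})
```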
Then, in the style transfer process, we expect the model to convert the added style words and transform the corrupted sentence x∗_sn (or y∗_sn) into the ground-truth sentence y (or x) by minimizing the following MLE loss:

L_MLE = log P(x | y∗_sn) + log P(y | x∗_sn)   (14)

Finally, to verify that the added "style noise" has been removed and that the output sentence has the desired style, we utilize the pre-trained style classifier (the same one used in Eq. (8)) to estimate the style preference of the output sentences and expect our model to maximize the probability that the output sentences carry the target-style attributes:

L_style = log P_cls(Y | xY) + log P_cls(X | yX)   (15)

where xY and yX are the transferred outputs of the model, with x∗_sn and y∗_sn as input, respectively.

There are several advantages to inserting style noise into the inputs. Firstly, the added style words enlarge the style gap between the input (x∗ and y∗) and the ground truth (y and x), and help the model better capture the differences between the styles. Secondly, rare style words appear more frequently in the input sentences, which helps the model better handle the transfer of these rare style words. Moreover, the goal of the style denoising process is consistent with the purpose of the style transfer task, which enforces the model to learn to remove the source style words in the original sentences. In general, the style denoising mechanism makes the model more sensitive to style words and enhances its ability to convert the style words of input sentences.

1) Final Training Loss: Overall, the final training loss used for the complete style transfer process is:

L = λ · L_MLE + (1 − λ) · L_style   (16)

where λ is a tunable parameter that balances content preservation and style accuracy. It is worth mentioning that we do not back-propagate the gradient of L into the back-translation part (M^t_{X⇔Y} in Figure 2).
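Putting the pieces together, one SE-DAE training epoch (Fig. 2) can be sketched schematically as follows. The helper names (model.transfer, model.train_step, pollute_fn, the score callables) refer to the sketches above or are illustrative placeholders; this is a schematic reading of the procedure, not the authors' released code.

```python
def train_epoch(model, corpus_x, corpus_y, cand_x, cand_y,
                score_xy, score_yx, pollute_fn):
    """One epoch of SE-DAE training (schematic sketch).
    model.transfer(sent, to_style): back-translation with the current model M^t (placeholder API);
    cand_x[i] / cand_y[i]: candidate sets Cx / Cy accumulated over all epochs;
    score_xy(y_hat, x) = CS(y_hat, x, Y); score_yx(x_hat, y) = CS(x_hat, y, X);
    pollute_fn(sent, style): style noise of Eqs. (12)-(13)."""
    pairs = []
    for i, x in enumerate(corpus_x):
        cand_x[i].append(model.transfer(x, to_style="Y"))        # back-translation
        y_star = max(cand_x[i], key=lambda c: score_xy(c, x))    # refinement, Eq. (10)
        pairs.append((pollute_fn(y_star, "Y"), x))               # train to recover x from y*_sn
    for i, y in enumerate(corpus_y):
        cand_y[i].append(model.transfer(y, to_style="X"))
        x_star = max(cand_y[i], key=lambda c: score_yx(c, y))    # Eq. (11)
        pairs.append((pollute_fn(x_star, "X"), y))
    # One supervised pass on the refined, polluted pseudo-parallel data,
    # minimizing L = lambda * L_MLE + (1 - lambda) * L_style, Eq. (16).
    model.train_step(pairs)
    return model
```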
V. EXPERIMENTS

A. Datasets

We verify the effectiveness of our model on two style transfer tasks: sentiment transfer and formality transfer. Table I shows the statistics of the datasets used in the two tasks.

TABLE I
STATISTICS OF DATASETS (YELP AND GYAFC).

              YELP                    GYAFC
        Positive   Negative     Formal    Informal
Train   266,041    177,218      51,967    51,967
Dev     2,000      2,000        2,247     2,788
Test    500        500          500       500

Sentiment transfer We test the sentiment transfer capability of the model on the YELP review dataset, which includes positive and negative reviews about restaurants and businesses. In line with prior works, we use the pre-processed dataset from Luo et al. [3], which has the same train/dev/test split as Li et al. [15]. The five human references of the test set proposed in Jin et al. [20] are used to evaluate the performance of content preservation.

Formality transfer The representative GYAFC dataset (Grammarly's Yahoo Answers Formality Corpus) is used to estimate the formality transfer capacity of the model [16]. Following Luo et al. [3], we choose the FAMILY AND RELATIONSHIPS domain and treat it as a non-parallel dataset.

B. Baselines

We mainly compare our model with the latest SOTA style-transfer models that have open-sourced their code: TransfomerGAN [11], DualRL [3], PoiGen [2], StyleIns [27] and WordRelev [26]. Besides, we also include earlier transfer systems for a comprehensive comparison, such as UnsuperMT [24] and BackTranslation [25].

C. Training Details

In this work, we implement both the encoder and the decoder of our model as two-layer bidirectional LSTMs with an attention mechanism [28]. We set the dimension of all embeddings to 512 and train them from scratch. The Adam optimizer is used with a learning rate of 10^-4 during the whole training process. The batch size is set to 32 and p_sn is set to 0.2. Besides, following Mir et al. [22], we obtain the style words of different styles via the logistic regression model with a high threshold of 0.9 to reduce noise. We use the same pre-trained style classifier in Eqs. (9) and (15), which is implemented on Text-CNN [21] and achieves accuracies of 97% and 90% on YELP and GYAFC, respectively. We keep λ = 0.3 during the whole training process.

D. Metrics

1) Automatic Metrics: Style Accuracy To evaluate the style accuracy of the transferred sentences, we train two TextCNN-based style classifiers (different from the classifiers used in the training process), which achieve 97% and 89% accuracy on YELP and GYAFC, respectively. To distinguish them from the style classifier used in the training process, these two classifiers are implemented in TensorFlow.
Content Preservation In this work, the BLEU score between the inference of the style-transfer model and four human references is used to evaluate the content preservation performance of the model.
Overall Performance We calculate the geometric mean (G2) and the harmonic mean (H2) of the style accuracy and the BLEU score to evaluate the overall performance of each model, as also used in Luo et al. [3].
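For clarity, the two overall scores are computed from style accuracy (ACC) and BLEU as follows (a small sketch; both inputs are percentages as reported in Table II):

```python
import math

def overall_scores(acc, bleu):
    """G2 = geometric mean, H2 = harmonic mean of style accuracy and BLEU."""
    g2 = math.sqrt(acc * bleu)
    h2 = 2 * acc * bleu / (acc + bleu)
    return g2, h2

# Example (SE-DAE on YELP, Table II): overall_scores(85.7, 53.9) -> (~68.0, ~66.2)
```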
2) Human Metrics: We randomly select 400 sentences of different styles (i.e., 100 positive and 100 negative; 100 informal and 100 formal) and hire five human judges with strong linguistic backgrounds (having learned English for more than fifteen years) to evaluate the quality of the transferred sentences.
TABLE II
AUTOMATIC EVALUATION RESULTS OF DIFFERENT MODELS ON TWO BENCHMARK DATASETS.

YELP GYAFC
Accuracy BLEU-r G2 H2 Accuracy BLEU-r G2 H2
Existing Unsupervised Systems
CrossAlign [23] 73.8 16.8 35.2 27.4 69.8 3.6 15.8 6.8
StyleEmbedding [7] 8.7 38.5 18.3 14.2 22.2 7.9 13.2 11.7
DelRetri [15] 89.2 28.7 50.6 43.5 54.6 21.2 34.0 30.5
UnsupervisedMT [24] 96.3 41.8 63.4 58.3 68.9 33.4 48.0 45.0
BackTranslation [25] 81.4 37.2 55.0 51.1 26.3 44.5 34.2 33.0
MultiAttribute [13] 81.4 39.6 56.8 53.3 26.3 44.5 34.2 33.0
TransfomerGAN [11] 85.4 51.4 66.2 64.1 66.4 44.6 54.4 53.4
DualRL [3] 88.6 52.0 67.9 65.5 62.6 41.9 51.2 50.2
DataRefinement [20] 87.7 19.8 41.7 32.4 64.1 50.7 57.0 56.6
PoiGen [2] 86.8 54.4 68.7 66.9 11.7 44.9 22.9 18.6
WordRelev [26] 84.3 57.0 69.3 68.0 81.8 46.2 61.4 59.0
StyleIns [27] 91.1 45.5 64.4 60.7 69.6 47.5 57.5 56.5
Unsupervised Baseline Systems
Input Copy 2.5 58.6 12.1 4.8 13.4 48.2 25.4 21.0
Vanilla DAE 73.3 54.3 63.1 62.4 29.2 47.6 37.3 36.2
Our Unsupervised DAE Systems
SE-DAE 85.7 53.9 68.0 66.2 85.5 48.4 64.3 61.8

TABLE III
HUMAN EVALUATION RESULTS OF DIFFERENT MODELS ON TWO BENCHMARK DATASETS.

YELP GYAFC
Style Content Fluency Style Content Fluency
TransfomerGAN [11] 3.65 ±1.05 4.33 ±0.70 4.06 ±0.83 2.75 ±0.85 3.85 ±0.89 3.67 ±0.97
DualRL [3] 3.90 ±0.93 4.30 ±0.88 4.21 ±0.83 2.57 ±0.77 3.61 ±1.23 3.73 ±0.95
PoiGen [2] 3.83 ±1.13 4.46 ±0.74 4.37 ±0.72 1.56 ±0.74 2.77 ±0.52 2.88 ±0.71
WordRelev [26] 4.03 ±0.95 4.63 ±0.55 4.50 ±0.56 3.47 ±0.86 4.35 ±0.72 4.37 ±0.91
StyleIns [27] 3.82 ±1.02 4.12 ±0.93 4.11 ±0.87 3.27 ±1.03 4.33 ±0.66 4.32 ±0.77
SE-DAE 3.88 ±0.98 4.48 ±0.63 4.38 ±0.69 3.55 ±1.01 4.38 ±0.68 4.36 ±0.75

TABLE IV
ABLATION STUDY OF DIFFERENT COMPONENTS USED IN SE-DAE.

YELP GYAFC
ACC BLEU-r G2 ∆ H2 ∆ ACC BLEU-r G2 ∆ H2 ∆
Vanilla DAE 73.3 51.9 63.1 0.0 62.4 0.0 29.2 47.6 37.3 0.0 36.2 0.0
+ Classifier 82.3 48.7 63.3 +0.2 61.2 -1.2 81.9 47.4 62.3 +25.0 60.0 +23.8
+ Refinement 82.1 53.6 66.4 +3.3 64.9 +2.5 81.7 48.8 63.2 +25.9 61.1 +24.9
+ Style Noise 85.7 53.9 68.0 +4.9 66.2 +3.8 85.5 48.4 64.3 +27.0 61.8 +25.6

Without any information about which model each sentence was generated by, the five judges are asked to rate the output sentences of the different style-transfer models on a Likert scale of 1 to 5 for target style accuracy, content preservation and sentence fluency (Fluency).

VI. EVALUATION RESULTS

Table II presents the automatic evaluation results of the different models (G2 and H2 represent the geometric mean and the harmonic mean of accuracy and BLEU-r, respectively). On YELP, our model outperforms the other style-transfer models except for WordRelev and PoiGen. The reason is that both WordRelev and PoiGen utilize word-level style information through a multi-stage training process, whereas our model only uses the sentence-level style information of the pseudo-parallel data for training. In contrast, our model achieves the best performance on GYAFC. A possible explanation is that there are more abbreviations and spelling errors in the sentences of GYAFC than in those of YELP, so models are required to be more robust to noise when transferring the style of GYAFC sentences. Thus, compared with the other models, our model, which is based on the denoising process, is more robust to the noise and can better transfer the sentences of GYAFC. The results of the human evaluation are shown in Table III. Similar to the automatic evaluation, our model surpasses most previous models except for WordRelev [26] on YELP. On GYAFC, our model achieves performance similar to WordRelev and has higher style accuracy and content scores. Overall, the evaluation results show the effectiveness of our model and a competitive advantage over those powerful but complex models.

VII. PERFORMANCE ANALYSIS

A. Ablation Study

To get a better understanding of the roles played by the different components of our model, we conduct an ablation study for each component on both YELP and GYAFC and present the results in Table IV. The results show that the refinement mechanism improves the BLEU score by more than 5 points on YELP and by more than 1 point on GYAFC.
Fig. 3. (a): Evolution of BLEU score and style accuracy of the pseudo-parallel data with or without refinement on YELP. (b): Evolution of BLEU score and style accuracy of the pseudo-parallel data with or without refinement on GYAFC. (c): Evolution of BLEU score and style accuracy on YELP when using different ways to add style words. (d): Evolution of BLEU score and style accuracy on GYAFC when using different ways to add style words.

[Figure 4 (a), recovered example: input "the food is so bad that ... irritating us"; output w/o style-noise "the food is so good that ... irritating us"; output w/ style-noise "the food is so good that ... fun us". Panels (b)-(d) show word frequencies and top-5 output probabilities.]
Fig. 4. (a): An example on YELP where style-noise influences the transfer of negative style words. (b): Appearance frequency of different negative words on YELP. (c): Top-5 output words of the model trained without style-noise when transferring the rare negative word "irritating". (d): Top-5 output words of the model trained with style-noise when transferring the rare negative word "irritating".

Meanwhile, the style denoising mechanism improves the style accuracy on both datasets by over 3 percentage points. In addition, the results of the ablation study also show that the style classifier plays a crucial role in balancing style intensity and content preservation. Without the style classifier, the vanilla DAE model generates sentences that successfully preserve the content information but fail to transfer the style.

B. Refining Pseudo-parallel Data

In this section, we investigate the influence of the data refinement mechanism by visualizing the evolution curves of the BLEU score and the style accuracy of the pseudo-parallel data in each epoch, as shown in Figure 3 (a) and (b). The result indicates the effectiveness of the data refinement mechanism in finding sentences that both preserve the semantics and transfer the style. Moreover, the result also shows that, without the data refinement mechanism, the pseudo-parallel data suffers from a more severe data-quality problem on GYAFC than on YELP. A possible explanation is that the style of sentiment is more distinguishable than the style of formality, so transferring sentiment is much easier than transferring formality.

C. Which is Better: Replace or Insert?

In this work, we explore two ways to add style words in the style denoising mechanism: (1) randomly replacing input words with style words and (2) randomly inserting style words into input sentences. The result (see Figure 3 (c) and (d)) indicates that it is much better to add style words through replacement than through insertion in this task. One possible explanation is that insertion does not corrupt the original information of the sentences, so the model trained with insertion tends to filter words (i.e., it focuses on filtering out style words). On the contrary, replacement breaks both the content and the style information of the input sentences. As a result, the model trained with replacement needs to recover the content and style information from partial information (i.e., it focuses on the reconstruction of both the semantics and the style).

D. Transferring Style Words

To further investigate the influence of the style denoising mechanism on transferring rare style words, we sample a sentence that contains the rare negative word "irritating". As shown in Figure 4 (b), "irritating" rarely appears in YELP, with a low frequency of 0.002, while "bad" appears most frequently, with a probability of 0.29. When transferring "irritating", the model without the style denoising mechanism tends to treat this style word as a content word and gives it the highest output probability (Figure 4 (c)). In contrast, as the style denoising mechanism increases the appearance frequency of rare style words, the model with the style denoising mechanism gives higher output probability to words with the opposite style and outputs the positive word "fun" (Figure 4 (d)).

E. Case Study

To scrutinize the performance of the different transfer models, we sample several sentences with multiple style words or complex syntactic structures and then compare the corresponding transferred sentences of our model with those of other state-of-the-art representative systems. From the results shown in Table V, we observe the following:
TABLE V
EXAMPLE OUTPUTS OF DIFFERENT TRANSFER MODELS ON SENTIMENT AND FORMALITY TRANSFER TASKS.

From Negative to Positive From Positive to Negative


Source if i could give less stars , i would . thank you ladies for being awesome !
TransfomerGAN if i can give fun place , i eat . not you ladies for being awesome !
DualRL if i could give it a shot . why are the ladies for being awesome ?
PoiGen great if i could give less stars , i would . no thank you for being awesome !
StyleIns thank you for and outstanding service , and service . are i asked for being disgusting .
WordRelev if i could give extra stars , i would . thank you ladies for being horrible !
SE-DAE if i can give more stars , i recommend ! shame on ladies for being nasty !

From Formal to Informal From Informal to Formal


Source you should play sports with your male friends . r u talking about ur avatar ?
TransfomerGAN and just play sports with your 9 : for just ! exist you talking about !
DualRL you should play sports with your male friends it is talking about avatar avatar ?
PoiGen you play with your male friends . u talking about avatar ?
StyleIns and if play sports with your male friends . r u talking about ur avatar ?
WordRelev you should play sports with your male friends bout this . it is talking about your avatar ?
SE-DAE u just play sports with ur male friends are you talking about your avatar ?

1) Our model precisely converts each style word of the source sentences. 2) In contrast to the incoherent sentences of the other models, the sentences generated by our model are more understandable.

VIII. RELATED WORK

Most existing models approach the style transfer of texts from two perspectives: model-oriented and data-oriented.

Model-oriented methods handle the text style transfer problem by designing complicated neural networks, which may incorporate multiple steps in the training procedure. Shen et al. [23] design a powerful model that aligns two different style domains by adversarial learning. Fu et al. [7] apply an adversarial loss on the encoder to separate style and content information, while Dai et al. [11] use a cycle-GAN to achieve style transfer without disentangling style and content. Meanwhile, Xu et al. [1] propose a transfer model based on cycled reinforcement learning, and Luo et al. [3], [4] solve the text style transfer problem by dual reinforcement learning. Both Xu et al. [1] and Luo et al. [3], [4] involve a two-step training procedure. Gong et al. [5] design another RL-based model, which considers the style, semantic and fluency rewards of the transferred sentence simultaneously. Wu et al. [2] propose a multi-step model trained by hierarchical reinforcement learning. Liu et al. [29] propose a VAE model that transfers the style of a sentence by modifying its latent representation in the continuous space.

Data-oriented methods prefer to retrieve more word-level or sentence-level information from the training corpus for the transfer model. Li et al. [15] propose a retrieval-based transfer model that retrieves similar sentences from the target-style corpus to help the transfer of source-style sentences. Sudhakar et al. [30] improve the retrieval process of Li et al. [15] with a more complicated scoring function and propose using a style embedding as the style input when there are no similar sentences in the target-style corpus. Lample et al. [13] generate pseudo-parallel data on the fly and adopt the unsupervised machine translation model of [18] as their transfer model. Jin et al. [20] iteratively back-translate and match the unpaired corpora to obtain high-quality pseudo-parallel data to train their transfer model, but they only consider content similarity when pairing two sentences. Wu et al. [31] propose a two-step model that first masks style words with a pre-trained identifier and then trains the model to infill the masked words with words of the target style. Zhou et al. [26] propose a two-stage model that calculates the word-level style relevance in the first stage and then utilizes the learned style relevance in the second stage.

IX. CONCLUSION

In this work, we propose the Style-Enhanced Denoising Auto-Encoder (SE-DAE) with a novel style-denoising mechanism that is more compatible with style transfer tasks. To preserve the simplicity of the original vanilla DAE, we train SE-DAE on the high-quality pseudo-parallel data generated by another novel data-refinement mechanism. Compared with previously proposed transfer systems that combine the vanilla DAE with complicated unsupervised models, our model achieves competitive performance while preserving the simplicity of the DAE in both the network structure and the training procedure. The evaluation results on two benchmark style datasets indicate the effectiveness of SE-DAE on multiple text style transfer tasks.

REFERENCES

[1] J. Xu, X. Sun, Q. Zeng, X. Ren, X. Zhang, H. Wang, and W. Li, “Unpaired sentiment-to-sentiment translation: A cycled reinforcement learning approach,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 979–988.
[2] C. Wu, X. Ren, F. Luo, and X. Sun, “A hierarchical reinforced sequence operation method for unsupervised text style transfer,” in Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL, 2019, pp. 4873–4883.
[3] F. Luo, P. Li, J. Zhou, P. Yang, B. Chang, Z. Sui, and X. Sun, “A dual reinforcement learning framework for unsupervised text style transfer,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019, 2019, pp. 5116–5122.
[4] F. Luo, P. Li, P. Yang, J. Zhou, Y. Tan, B. Chang, Z. Sui, and X. Sun, “Towards fine-grained text sentiment transfer,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019, 2019, pp. 2013–2022.
[5] H. Gong, S. Bhat, L. Wu, J. Xiong, and W.-m. Hwu, “Reinforcement learning based text style transfer without parallel training corpus,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 3168–3180.
[6] Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. P. Xing, “Toward controlled generation of text,” in Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 1587–1596.
[7] Z. Fu, X. Tan, N. Peng, D. Zhao, and R. Yan, “Style transfer in text: Exploration and evaluation,” Association for the Advancement of Artificial Intelligence, 2018.
[8] L. Logeswaran, H. Lee, and S. Bengio, “Content preserving text generation with attribute controls,” in 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018, pp. 5103–5113.
[9] Z. Yang, Z. Hu, C. Dyer, E. P. Xing, and T. Berg-Kirkpatrick, “Unsupervised text style transfer using language models as discriminators,” in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 7287–7298.
[10] C.-T. Lai, Y.-T. Hong, H.-Y. Chen, C.-J. Lu, and S.-D. Lin, “Multiple text style transfer by using word-level conditional generative adversarial network with two-phase training,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3579–3584.
[11] N. Dai, J. Liang, X. Qiu, and X. Huang, “Style transformer: Unpaired text style transfer without disentangled latent representation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 5997–6007.
[12] A. Tikhonov, V. Shibaev, A. Nagaev, A. Nugmanova, and I. P. Yamshchikov, “Style transfer for texts: Retrain, report errors, compare with rewrites,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3936–3945.
[13] G. Lample, S. Subramanian, E. Smith, L. Denoyer, M. Ranzato, and Y.-L. Boureau, “Multiple-attribute text rewriting,” in International Conference on Learning Representations, 2019.
[14] H. Ham, T. J. Jun, and D. Kim, “Unbalanced GANs: Pre-training the generator of generative adversarial network using variational autoencoder,” arXiv preprint, 2020.
[15] J. Li, R. Jia, H. He, and P. Liang, “Delete, retrieve, generate: a simple approach to sentiment and style transfer,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 1865–1874.
[16] S. Rao and J. Tetreault, “Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 129–140.
[17] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” 2008, pp. 1096–1103.
[18] G. Lample, M. Ott, A. Conneau, L. Denoyer, and M. Ranzato, “Phrase-based & neural unsupervised machine translation,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 5039–5049.
[19] M. J. Kusner, Y. Sun, N. I. Kolkin, and K. Q. Weinberger, “From word embeddings to document distances,” in Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 957–966.
[20] Z. Jin, D. Jin, J. Mueller, N. Matthews, and E. Santus, “IMaT: Unsupervised text attribute transfer via iterative matching and translation,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3097–3109.
[21] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746–1751.
[22] R. Mir, B. Felbo, N. Obradovich, and I. Rahwan, “Evaluating style transfer for text,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 495–504.
[23] T. Shen, T. Lei, R. Barzilay, and T. S. Jaakkola, “Style transfer from non-parallel text by cross-alignment,” 2017, pp. 6830–6841.
[24] Z. Zhang, S. Ren, S. Liu, J. Wang, P. Chen, M. Li, M. Zhou, and E. Chen, “Style transfer as unsupervised machine translation,” CoRR, 2018.
[25] S. Prabhumoye, Y. Tsvetkov, R. Salakhutdinov, and A. W. Black, “Style transfer through back-translation,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 866–876.
[26] C. Zhou, L. Chen, J. Liu, X. Xiao, J. Su, S. Guo, and H. Wu, “Exploring contextual word-level style relevance for unsupervised style transfer,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 7135–7144.
[27] X. Yi, Z. Liu, W. Li, and M. Sun, “Text style transfer via learning style instance supported latent space,” in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2020, pp. 3801–3807.
[28] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proceedings of ICLR, 2015.
[29] D. Liu, J. Fu, Y. Zhang, C. Pal, and J. Lv, “Revision in continuous space: Unsupervised text style transfer without adversarial learning,” in AAAI, 2020.
[30] A. Sudhakar, B. Upadhyay, and A. Maheswaran, “Transforming delete, retrieve, generate approach for controlled text style transfer,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3269–3279.
[31] X. Wu, T. Zhang, L. Zang, J. Han, and S. Hu, “Mask and infill: Applying masked language model for sentiment transfer,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, 2019, pp. 5271–5277.
