Georg Kirchner
Dell Technologies
42 South Street
Hopkinton, MA, 01748, U.S.A.
Georg.Kirchner@dell.com
As we analyzed the accumulated data for this paper, we found that post-edits made in a single CAT tool session at the translation step correctly capture the full extent of post-edits, even if the linguist revisits a segment multiple times in the same session. If the linguist edited the same segment in separate CAT tool sessions, only the edits made in the last session are captured. Because of this, we are underreporting the post-edit distance for an unknown number of segments.

Another caveat is that we decided not to track productivity for human translation (HT). We originally expected that, for a given quality tier, we would either MTPE or HT all jobs, and that comparing productivity between MTPE and HT jobs across quality tiers would therefore not be a fair comparison. Later on, we found that our PMs did apply HT workflows selectively within MTPE quality tiers. Had we adjusted our setup, we would now have data to benchmark MTPE against HT productivity.

4 Ranking by Productivity

The most basic exercise is to rank our major language pairs by productivity. The hourly word numbers below result from dividing cumulative MT words by cumulative post-editing time between August 2017 and December 2019.

The hourly throughput does not account for elapsed time, such as research done while segments in the CAT tool are inactive and the time tracker is not running. These productivity numbers are therefore somewhat theoretical and do not mean that our PT-BR linguists post-edit 2,147 x 8 = 17,176 words per day. But they surely suggest that the historic translation output of 2,000 words per day is outdated and that actual productivity is significantly higher thanks to translation technology.

While the language ranking is roughly as expected, there are surprises: ZH-CN ranks relatively high compared to JA-JP and KO-KR, and ES-MX (Latin American Spanish) ranks low compared to FR-FR, IT-IT and PT-BR.

Productivity clearly varies by quality tier, being higher for Good enough than for High quality content. This may be due to varying levels of linguistic complexity and expectations, or to the simple fact that Good enough jobs are bigger than High quality jobs – think of product documentation vs. marketing material. The more volume in a given job, the easier it is for linguists to pick up speed.
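The throughput metric used for this ranking – cumulative MT words divided by cumulative post-editing time per language pair – can be sketched in a few lines. The record layout and the sample figures below are invented for illustration; only the metric itself comes from the text:

```python
# Hourly throughput per language pair: cumulative MT words divided by
# cumulative post-editing time, ranked highest first. The tuple layout
# (language_pair, mt_words, edit_seconds) is an assumed schema.
from collections import defaultdict

def hourly_throughput(segments):
    """segments: iterable of (language_pair, mt_words, edit_seconds)."""
    words = defaultdict(int)
    seconds = defaultdict(int)
    for pair, mt_words, edit_seconds in segments:
        words[pair] += mt_words
        seconds[pair] += edit_seconds
    # Convert to words per hour and rank descending.
    return sorted(
        ((pair, words[pair] / (seconds[pair] / 3600.0)) for pair in words),
        key=lambda item: item[1],
        reverse=True,
    )

# Invented sample data: two PT-BR segments and one JA-JP segment.
sample = [
    ("PT-BR", 120, 200),
    ("PT-BR", 90, 150),
    ("JA-JP", 80, 400),
]
print(hourly_throughput(sample))
# → [('PT-BR', 2160.0), ('JA-JP', 720.0)]
```

Note that, as discussed above, `edit_seconds` here is tracked active time in the CAT tool, not elapsed wall-clock time, which is why the resulting words-per-hour figures overstate real daily output.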
4 https://docs.microsoft.com/en-us/azure/cognitive-services/translator/custom-translator/overview
For EN-US to FR-FR, the average PED for MT segments is 17 characters vs. 21 characters for 75%–84% fuzzy matches. While MT segments require 19% fewer post-edits, they take 9% longer to edit. It has been noted before that post-edits and post-editing time – i.e., technical and temporal effort – do not necessarily correlate (Krings, 2001).

Sorting segments by post-edit time, we noticed outliers that were active much longer than typically necessary for editing; the biggest outlier was active in the CAT tool for 16 minutes. When considering only segments active for up to 60 seconds – which covers 81% of all MT segments – the editing time gap shrinks from 76% to 25%.
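A character-level PED of this kind is conventionally computed as a Levenshtein distance (Levenshtein, 1966) between the raw MT output and the final post-edited segment. A minimal sketch – the example strings are invented, not taken from our data:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: the minimum number of insertions,
    deletions and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

# Hypothetical MT segment and its post-edited version.
mt_output = "The server reboots automatic."
post_edit = "The server reboots automatically."
print(levenshtein(mt_output, post_edit))  # → 4
```

Measuring effort this way counts edited characters only; it says nothing about how long those edits took, which is precisely the gap between technical and temporal effort noted above.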
While we couldn’t demonstrate that our MT
segments can be edited faster than our fuzzies,
we did gain a couple of useful insights. For one,
References
Koponen, Maarit. 2016. Is Machine Translation Post-editing Worth the Effort? A Survey of Research into Post-editing and Effort. The Journal of Specialised Translation.

Krings, Hans P. 2001. Repairing Texts: Empirical Investigations of Machine Translation Post-Editing Processes. Kent State University Press, Kent, Ohio.

Levenshtein, Vladimir Iosifovich. 1966. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady.

Zaretskaya, Anna. 2019. Raising the TM Threshold in Neural MT Post-Editing: A Case Study on Two Datasets. Proceedings of MT Summit XVII, volume 2, 213–218.