
SDL Certification

Post-Editing Certification

May 2017

1. Introduction
1.1. MT Development in the Last Century
1.2. A Brief History of MT at SDL
2. PEMT and Translation
2.1. Global Developments and the Localization Industry
2.2. The Right Solution for the Right Content
2.3. Content Evaluation and the Post-Editing Process
3. MT Technologies
3.1. Rules-Based Machine Translation
3.2. Statistical Machine Translation (also known as data-driven machine translation)
3.3. Hybrid MT Systems
4. How the MT Output Is Created
4.1. MT Engine Creation
4.1.1. Baselines
4.1.2. Verticals
4.1.3. Customizations
4.2. MT Engine Training
4.3. MT Output Evaluation: Testing Methodologies
5. Using the MT Output: The Basics of Post-Editing
5.1. Introduction to Post-Editing
5.2. Degrees of Post-Editing
5.3. The Quality Check Process
6. How to Get the Most out of MT
6.1. What Makes an Effective Post-Editor?
6.2. Post-Editing Quality Expectations
6.3. Under-Editing
6.4. Over-Editing
7. Expected Statistical MT Behavior
7.1. Common Patterns to Watch for When Post-Editing
7.2. How to Provide Feedback to Improve the MT Output
8. Using SDL Language Cloud in SDL Trados Studio
8.1. How to Add SDL Language Cloud as a Machine Translation Provider in SDL Trados Studio
8.1.1. Applying MT Segment-by-Segment
8.1.2. Applying MT to Whole Files/Projects Using the “Pre-translate Files” Batch Task
8.1.3. Using the Studio AutoSuggest Feature to Retrieve MT in Studio
8.2. How to Use Dictionaries in Language Cloud to Improve MT Output
8.2.1. Best Practices for LC Dictionary Creation
8.2.2. How to Upload and Maintain a Dictionary in LC
8.2.3. Testing That the Dictionary Has Been Applied
9. AdaptiveMT Engines Powered by Language Cloud
10. The Future of Post-Editing
11. Summary
12. More Resources on MT and Post-Editing
Appendix 1: References
Appendix 2: Post-Editing Examples

CONFIDENTIAL 3 SDL Certification | Post-Editing Certification


1. Introduction

The volume of content requiring translation is growing faster than ever before and cannot be
handled any longer by purely conventional means. The data-driven world calls for speed and agility,
and Post-Editing Machine Translation (PEMT) has proven to be the most effective way to meet the
quality and communication demands of our time.

In response to the growing demand for post-editing, SDL introduced its Post-Editing Machine
Translation (PEMT) Certification in 2014 to enable translators to gain a foothold in the emerging
post-editing market. The ready availability of MT means that more material is earmarked for
translation, which in turn fuels the need for skilled and knowledgeable post-editors. PEMT is now
very much a mainstream skill for translators, and one of the main aims of the PEMT Certification
program is to give them the right skills to deal with the ever-increasing demand for post-editing.

Our certification program is geared toward anyone who is impacted by post-editing and wants to
gain a better understanding of the history and theory behind MT as well as the practical applications
of MT. The interest in and uptake of the PEMT Certification program is testament to the continuous
need for MT training, and this updated training guide aims to capture the latest MT developments
and best practice guidelines.

This guide covers the following areas:

• History and development of MT
• MT technologies
• Post-editing machine translation
• Best practices
• SDL Language Cloud

Machine Translation (MT) is automated translation that uses software to translate text from one
natural language to another. It is one of the oldest applications of artificial intelligence and both
facilitates and accelerates the creation of translated materials.

Why is PEMT so important in today’s market? A structured and controlled PEMT workflow allows
companies to join the fast-track MT highway and successfully deliver their brand message to a global
audience.

With this in mind, it is important to remember that MT does not replace the need for human
translation and human translators. PEMT is the process of allowing machines to do the heavy lifting
of translation, with editing and quality assurance being performed by a trained translator. MT is an
effective tool to assist translators in their everyday work.

1.1. MT Development in the Last Century
Following on from the efforts made in cryptography during World War II, MT is generally considered
to have started in the 1950s. In 1954, the successful execution of the Georgetown experiment—the
fully automated translation of approximately sixty Russian sentences into English—ushered in an era
of significant funding for MT research in the USA. Researchers believed they could produce a fully
automated MT system within three to five years. This endeavor proved more difficult than expected,
however, and 10 years later funding was cut when it became clear that the development of MT had
not progressed as far as originally hoped.

Early attempts at MT typically failed due to a lack of coverage. The models functioned by encoding a
limited selection of transformational rules, which simply did not provide for the diversity of natural
language translation. Consequently, the first attempts to commercialize MT in the 1970s and 1980s
operated by drastically increasing the number of encoded transformational rules. This produced
Rules-Based Machine Translation (RBMT), which functioned relatively successfully with targeted
human feedback over a particular domain. However, this led to the further problem of how to make
the huge number of transformational rules needed to encode language pairs cooperate with each
other. The answer was a statistical approach to MT.

In the late 1980s, computational power increased and became less expensive. As a result, interest
picked up in Statistical Machine Translation (SMT). From the 1990s, statistical learning approaches
came to the fore, led by cutting-edge work from the research team at IBM. SMT systems no longer
required the same human effort to encode transformational rules and update lexicons and
terminology lists, but instead exploited the wealth of existing translations, covering numerous
language pairs, to extract rules based on statistical probability.

Since the 1990s, SMT has been pushed forward through intensive research and training as well as
support from industry, the US Defense Advanced Research Projects Agency (DARPA) and the
European Commission’s FP7 program. Statistical MT has been deployed in real-world, commercial
contexts by SDL, Google, Microsoft and IBM, alongside ongoing research and new developments in
the field of statistical and hybrid MT. In 2011, SMT was boosted with Google’s announcement that it
would charge for access to the Google Translate API. Shortly afterwards, Microsoft also announced
that it would start charging for use of the Microsoft Translator API. These two events can be viewed
as a key milestone for the machine translation industry and the localization industry as a whole. The
progression to a paid API model for machine translation was a clear sign that the use, spread and
quality of MT have matured to a level where enterprises and developers see sufficient value in MT to
invest in it.

After many decades, it appears that the models used in MT are more in line with our understanding
of how human language cognition and processing operate.

This does not mean that the MT output is of an equal standard to anything the human brain can
produce, but it reinforces that MT is an essential tool in the translation lifecycle.

MT accuracy is improving constantly, and new research and developments put MT at the forefront
of linguistic improvements. MT is now a truly interdisciplinary field, drawing from computer science,
linguistics, probability theory, algorithm design, automata theory and engineering.

The application of deep learning and neural networks to machine translation is the most recent
innovation taking place in the field of MT, and we will most likely see its impact on the industry in
the near future.

Some Facts about MT Today

1.2. A Brief History of MT at SDL
As a company, SDL has over 20 years of experience as a Language Service Provider (LSP) in the
localization industry.

Our MT journey started back in 2000 when we acquired a Rules-Based Machine Translation (RBMT)
engine from Transparent Language, which became SDL Enterprise Translation Server (ETS). In 2004,
the Knowledge-based Translation System (KbTS) group was set up to use the MT system as part of a
high-quality translation process and offer post-editing as a service to our customers.

In 2009, Statistical Machine Translation (SMT) was starting to establish itself firmly as a contender in
the localization industry, following rapid development. SDL forged a strategic partnership with a
leading SMT developer, Language Weaver, allowing MT usage to grow exponentially.

In 2010, SDL acquired Language Weaver and has continued to invest heavily in the development and
deployment of SMT technology. KbTS was rebranded as iMT (intelligent Machine Translation) in
2011, underlining the importance of machine translation within the translation productivity
workflow. The same year also saw the execution of a radical MT growth strategy by scaling MT
projects through our in-country language offices.

Brief history of MT in SDL

Today, the MT team is a truly interdisciplinary team encompassing computational linguists, project
managers, data specialists and post-editors who educate, promote and support MT usage
throughout SDL. Best results can only be achieved through an integrated effort, and we are
supported by two research labs that focus on MT technology and SDL’s language offices, where MT
is fully integrated in the production lifecycle.

Recent improvements to machine translation technology at SDL include the implementation of XMT
technology.

SDL MT New Platform

Developed by the SDL Language Research Group based in Los Angeles, California, XMT is a complete
rewrite of MT technology by a team who understands the limitations of previous generations of
machine translation.

While previous technologies apply a single translation algorithm universally to all language pairs,
XMT allows different translation algorithms to be used for different language pairs, providing much
higher-quality output. SDL XMT incorporates all previous innovations and algorithms, but its modular
and robust design also enables the rapid transition of new technologies and innovations.

Another recent release introduced real-time MT learning mechanisms under the name AdaptiveMT. This
technology will allow XMT engines to learn and remember the translation preferences of individual
users.

This development will form the basis of a recalibration of the relationship between MT and
translation experts, and will put the post-editor firmly at the heart of the MT improvement cycle.
Section 9 of this manual focuses on the latest developments in AdaptiveMT at SDL.

With regard to the application of neural network technology to machine translation, SDL has been
experimenting with this approach for some time. Information about developments on this front will
follow in the near future.

2. PEMT and Translation

2.1. Global Developments and the Localization Industry

Why is post-editing so important? Now more than ever, language is a business requirement. An
estimated three-quarters of web users take advantage of free translation tools thanks to the greater
accessibility and integration of MT solutions. Over 90% of non-native English speakers use MT
software to translate English websites they visit. Nowadays, businesses need their websites to cover
14 languages to reach 90% of the most economically active people. Overall, enormous growth in
digital content is fueling localization volumes.

Furthermore, the importance of English as a global lingua franca is slowly decreasing. In recent
years, the two languages with the greatest growth on the Internet were Arabic and Mandarin
Chinese — both of which grew twentyfold. In contrast, content in English only trebled.
Proportionally, English is declining in importance. It is estimated that by 2020, English will have lost
its status as lingua franca altogether. However, rather than being replaced by another natural
language, linguistic diversity will be the new status quo and translation will be key to
communication.

How can MT and post-editing help respond to these trends? The only answer to the digital content
explosion is an automated solution with the ability to accelerate content availability and scale to
enable a faster time to market. Structured PEMT solutions create the framework to accommodate
the ever-increasing volume of data, which is set to outstrip the current capacity of both conventional
translators and post-editors.

2.2. The Right Solution for the Right Content

The last few years have seen significant investment and progress in MT technology. SDL remains at
the forefront of MT development and uses a structured PEMT process to increase efficiency while
delivering a high-quality end product. Our MT technology is integrated with SDL’s translation
environments across the translation workflow: SDL Trados Studio, TMS and WorldServer.

It is worth noting that language service providers such as SDL are not the only ones implementing an
MT roadmap — many of our clients have mature MT strategies and rely on MT as an integral part of
their localization process.

With so many of SDL’s clients and partners looking to MT as a standard, there has been a shift in
terms of which domains and content types can be considered for machine translation. Content types
that were considered unsuitable a couple of years ago can now be handled productively using
machine translation.

In today’s localization market, there are four main translation use cases. Three of them involve
machine translation and two of them involve some type of post-editing of the MT output:

• Human translation
• Post-editing to publishable level: MT output from MT engines is post-edited by professional
  linguists to a quality level equivalent to conventional translation. Post-editing MT content is
  the preferred solution for publishable documents. It is used as part of a high-quality
  translation process
• Light post-editing or post-editing to understandable quality: MT output is post-edited to a
  level suitable for an acceptable and actionable translation, not with perfect grammar and
  style. This is a viable option for perishable and low-visibility content
• Raw MT or FAUT (Fully Automated Useful Translation): MT is generated by baseline engines
  or customized engines and the output is used directly, with no human intervention. This
  solution is used mostly for content such as emails, support content or instant messages,
  where the user wants to have an idea of the content, without the need for high quality

Deciding which content can successfully be translated using MT is a commercial and financial
decision, and it is important to follow best practices when taking it.

Many companies now consider MT the only viable option to process the volume of content they
need to localize. MT also allows the translation of content that would previously have remained
untranslated. An example of this is UGC (User-Generated Content). Most clients have some UGC in
their content portfolio and are facing increasing demands to make it available to a global audience.
This cannot usually be achieved with human translation; however, MT is ideally suited to translating
this high-volume, perishable content.

Post-editing machine translation is no longer a marginal translation activity but has become a
mainstream way to satisfy the translation productivity needs of our data-driven world. Post-editing
is now an established practice, widely used by translators and routinely taught at universities.

Translators and linguists are an integral part of any post-editing process, and their expertise is
needed throughout. As a language services and machine translation provider, SDL offers its
customers over 600 MT engines for a variety of language pairs. This is only possible with the right
linguistic support — SDL’s linguists and in-house and freelance translators are involved in every stage
of the MT lifecycle, from rigorously testing MT solutions before deployment to providing the
feedback needed for engine improvements.

2.3. Content Evaluation and the Post-Editing Process

A careful content evaluation should be at the start of every successful post-editing implementation
process. Some types of content are simply not suitable for an MT process as they require a more
nuanced approach harnessing skills that can only be provided by human translators.

A content audit will help determine whether PEMT is the right solution for the content. The current
and future needs of a project should be evaluated on the basis of:

• General suitability based on language/domain/content
• Quality of the evaluated source data (issues deriving from language errors, formatting and
  authoring)
• Volume requirements that are too capital and time intensive for human translators to
  manage
• The technical integration required to produce a production workflow (CAT tools/CMS and
  workflow systems)
• Chances of MT translation success (based on previous experiences)

Companies can achieve the most significant gains through PEMT by targeting content that is both
strategic and high volume with tight turnaround times.

As part of the wider process, it is critical for the evaluator to identify and record not just the content
types where PEMT has been successful but also where PEMT has not met project goals. Collecting
feedback from editors and users is critical for this to happen. A continual feedback loop will
eventually increase the speed at which evaluations take place and help predict translation outcomes
more accurately.

3. MT Technologies

In order to post-edit effectively, it is important to know which MT technology is being used, as this
will help avoid mistakes. The three main, established MT technologies are:

• Rules-Based Machine Translation (RBMT)
• Statistical Machine Translation (SMT)
• Hybrid machine translation

3.1. Rules-Based Machine Translation
Chronologically speaking, Rules-Based Machine Translation (RBMT) was the first approach to
automated translation. RBMT uses a linguistics-based set of rules in combination with a dictionary.
This is also known as knowledge-driven MT. A language pair is built by looking at the construction of
both source and target, taking into account source and target grammar and vocabulary. This is a
transfer-based approach with three translation phases: analysis, transfer and generation. It involves
parsing a source sentence, analyzing the structure, converting this to a machine-readable code and
then transforming it into the target, as shown below:

1) The source sentence is analyzed and a syntactic analysis is built
2) The source parse tree is converted into a target parse tree
3) The target parse tree is converted into an output sentence

The core system is based on a set of grammatical rules for each language pair, combined with a
dictionary. The dictionary contains source words and phrases, their translation and detailed
grammatical information, such as the part of speech and inflection. It provides the modules with the
linguistic knowledge they need.

The rules are the “linguistic processor” of the system, responsible for analysis and generation. They
use linguistic information stored in the dictionary. These rules are intended to represent the
grammatical knowledge of speakers and specify inherent agreement and relational information.

Examples:
• Determiner and noun need to agree in number and gender
• Subject and finite verb need to agree in number

At the translation stage, the MT engine analyzes each source sentence and tags the words and
phrases with their part of speech to identify grammatical components (for example, the subject,
object and verb). The MT system then looks up the translations of these grammatically tagged words
and phrases in the machine dictionary and combines them using the coded language rules for the
target language. This builds the translated sentence.
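
The analysis, transfer and generation phases described above can be sketched in a few lines of Python. This is a toy illustration only, not SDL's engine: the four-word English-to-Spanish dictionary, the single reordering rule and the function names (`analyze`, `transfer`, `generate`) are all assumptions made for the example, and words missing from the dictionary are not handled.

```python
# Toy English-to-Spanish rules-based translator (illustrative only).
# Dictionary entries carry the part of speech and, for nouns, the gender.
DICTIONARY = {
    "the":   {"pos": "DET",  "forms": {"m": "el", "f": "la"}},
    "red":   {"pos": "ADJ",  "forms": {"m": "rojo", "f": "roja"}},
    "house": {"pos": "NOUN", "lemma": "casa", "gender": "f"},
    "car":   {"pos": "NOUN", "lemma": "coche", "gender": "m"},
}


def analyze(sentence):
    """Analysis: tag each source word with its dictionary entry."""
    return [(word, DICTIONARY[word]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: reorder ADJ NOUN into NOUN ADJ, the usual Spanish order."""
    result = list(tagged)
    for i in range(len(result) - 1):
        if result[i][1]["pos"] == "ADJ" and result[i + 1][1]["pos"] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
    return result


def generate(tagged):
    """Generation: make determiner and adjective agree with the noun's gender."""
    gender = next(entry["gender"] for _, entry in tagged if entry["pos"] == "NOUN")
    words = []
    for _, entry in tagged:
        if entry["pos"] == "NOUN":
            words.append(entry["lemma"])
        else:
            words.append(entry["forms"][gender])
    return " ".join(words)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("the red house"))  # la casa roja
print(translate("the red car"))    # el coche rojo
```

Even this small example shows why RBMT output is systematic: the same rule and dictionary entry always produce the same result, whether or not it suits the context.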

A large core dictionary provides the translations for everyday words and phrases. For translations
that use specific terminology, an RBMT system can use customized dictionaries in conjunction with
the baseline to improve translation accuracy.

How to Recognize RBMT Output


RBMT output is based on three factors:

• Rules for the language pair
• General settings that can be customized (such as quotation marks, verb tense, accents and
  decimal points)
• Project dictionary where specific terminology is entered; this is key to improving MT
  quality

Pros and Cons of RBMT

Advantages:
• Rules and terminology are highly controlled
• Once a set of grammatical rules for any given language pair is established, new projects can
  be created relatively quickly
• Consistent use of terminology

Disadvantages:
• It is very time consuming to develop a set of rules and a dictionary for any given language
  pair
• This approach is limited in terms of applications, as source content needs to be well written
  to generate good output
• Translations are often literal
• Rules-based systems do not allow for context sensitivity in terms of style or terminology

Challenges of RBMT
RBMT allows for excellent terminology control. There is no need for pre-existing TMs, as project
dictionaries can be created from scratch and the output is systematic (rightly or wrongly), allowing
experienced post-editors to work quickly and reliably. However, it can take a number of years to
develop a new language pair and the source must be well written to generate good output.

As explained previously, RBMT uses a linguistics-based set of rules together with a dictionary.
Translation rules are encoded manually. Issues arise from the number of rules that need to be
created and the fact that rules conflict with each other. In addition, both words and grammar can be
ambiguous. One word can have several meanings (for example, “bank”) and different grammatical
rules apply only in certain contexts.

RBMT output is often not very fluent or sensitive to context, providing a single translation per term.
The creation and maintenance of project dictionaries can also be time consuming.

3.2. Statistical Machine Translation (also known as data-driven machine translation)

A Statistical Machine Translation (SMT) system learns to translate by analyzing large volumes of
previously translated content. How does this work? A large database of aligned source and target
texts is entered into a statistical learning system. On the basis of these examples, the system creates
an engine for automated translation. In essence, the system learns how to translate by analyzing the
statistical relationships between source and target data.

The starting point for training an engine is an aligned corpus of source and translated sentences
containing hundreds of millions of words. The training process subdivides each of the source
sentences into words and series of words (n-grams) and analyzes the associated translated
sentences. In this way, the training process determines the most likely set of translations for each n-
gram in the source. By analyzing just the translated content, the training process learns the order in
which the translated words are most likely to occur. The more training data and the more
consistency there is in the data, the more accurate the process becomes.
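
As a rough sketch of the training step, the Python below splits a sentence into n-grams and estimates translation probabilities by relative frequency. It rests on simplifying assumptions: the `phrase_pairs` list is invented and stands in for phrase pairs that a real system would first have to extract from the aligned corpus, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(tokens, max_n=2):
    """Return every word sequence of length 1..max_n, shortest first."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


# Hypothetical (source phrase, target phrase) pairs, as if already
# extracted from an aligned Spanish-English corpus.
phrase_pairs = [
    ("banco", "bank"), ("banco", "bank"), ("banco", "bench"),
    ("el banco", "the bank"), ("el banco", "the bank"),
    ("el banco", "the bench"),
]

# Count how often each source phrase was translated by each target phrase.
counts = defaultdict(Counter)
for source, target in phrase_pairs:
    counts[source][target] += 1


def translation_probabilities(source):
    """Relative frequency of each observed translation of a source phrase."""
    total = sum(counts[source].values())
    return {target: n / total for target, n in counts[source].items()}


print(ngrams("el banco central".split()))
print(translation_probabilities("banco"))
```

With more and more consistent phrase pairs, the relative frequencies settle toward stable values, which is the intuition behind the statement that more consistent training data gives a more accurate engine.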

In the next stage of the process, the system compiles all of the learned data into the runtime MT
engine. The runtime MT engine subdivides each sentence into smaller chunks and looks up the
possible translations in the compiled database. For a given source sentence, this process results in
many possible translated sentences. The MT engine uses the statistical data on the probability of a
translation and the word order to determine the best candidate for the MT output.

The following graphic illustrates how SMT learns from translations rather than using language rules.
It shows very simply how the system uses statistical analysis to decide how to translate the Spanish
word “banco”.
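
The choice between candidate translations of “banco” can also be sketched in code. In this minimal example, the score of each candidate is its translation probability multiplied by a probability of seeing it near a given context word. All figures, the `CONTEXT_PROB` table and the function name `best_translation` are invented for illustration; a real engine derives such statistics from its training data and scores whole sentences rather than single words.

```python
# Hypothetical statistics; a real engine learns these from training data.
TRANSLATION_PROB = {"banco": {"bank": 0.7, "bench": 0.3}}

# Hypothetical probability of a candidate given a nearby target-side word.
CONTEXT_PROB = {
    ("money", "bank"): 0.9, ("money", "bench"): 0.1,
    ("sat", "bank"): 0.2, ("sat", "bench"): 0.8,
}


def best_translation(source_word, context_word):
    """Score each candidate and return the best one with all scores."""
    scores = {
        target: prob * CONTEXT_PROB.get((context_word, target), 0.01)
        for target, prob in TRANSLATION_PROB[source_word].items()
    }
    return max(scores, key=scores.get), scores


print(best_translation("banco", "money")[0])  # bank
print(best_translation("banco", "sat")[0])    # bench
```

Although “bank” is the more frequent translation overall, the context statistics outweigh it when the sentence is about sitting, which is the kind of context sensitivity that a single dictionary entry cannot provide.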

For general-purpose translation, the system uses a baseline language engine that is trained with a
large corpus of broad-spectrum content — hundreds of millions of words. To enhance performance
for applications that use specific terminology, an SMT system can be trained using a corpus that only
or mostly contains data that is close to the content that is to be translated. An ideal corpus for this
would be a large Translation Memory (TM) that contains the translations of previous projects. The
recommended volume of data required is 1 to 5 million words, although it is possible to work with
fewer than 1 million. This is known as customization or engine training.

The quality of the MT output depends on both the linguistic and the technical quality of the material
included.

Pros and Cons of SMT

Advantages:
• Once the learning system is in place, developing new engines is a quick process
• Translations are relatively fluent and show some context sensitivity
• The application is broader as source content only needs to be of acceptable quality, with
  higher tolerance for mistakes

Disadvantages:
• Large databases of sufficient quality are required
• Engine cannot be influenced directly
• Little control over terminology

Compared with RBMT, statistical machine translation can offer a larger number of languages for
post-editing, as engines cost less and are faster to train, as well as being easier to maintain. Because
SMT is trained using “real” sentences and phrases, the direct output tends to be more fluent than
RBMT output.

The following table summarizes the key differences between SMT and RBMT.

Attribute SMT RBMT

Does not need a large volume of aligned data for training/customization ✓
Number of languages supported
Setup time for new language
Terminology control ✓
Software UI term handling ✓
Raw fluency
Raw accuracy ✓
Level of research activity and performance improvement predicted ✓

3.3. Hybrid MT Systems
A third MT technology option is the “hybrid system,” where MT developers aim to combine the best
features offered by each system (for example RBMT’s grammatical correctness or SMT’s lexical
selection). SMT systems are inherently more robust and always produce output, their language
models ensure fluency, and lexical selection is better. However, they have more difficulty in coping
with phenomena that require linguistic knowledge, such as morphology, syntactic functions and
word order. RBMT vendors may, for example, use a hybrid approach to improve systems without the
high cost of dictionary creation.

Hybrid MT can be any combination of SMT and RBMT technology. There are several forms of hybrid
MT, which are based on different approaches:

• Coupling two or more existing systems in series or in parallel without modifying either
• Using either RBMT or SMT as the basic architecture and extending it with either
  knowledge-driven or data-driven components
• Using a new hybrid architecture that combines knowledge-driven and data-driven
  components
• A common form of hybrid MT is RBMT post-processed with SMT (also known as statistical
  smoothing). RBMT is used as a starting point for translation and SMT is added as a
  post-process to correct the output and improve the fluency of the RBMT system

Pros and Cons of Hybrid MT


Advantages:
• More predictable results than purely statistical MT
• More fluent than RBMT due to the beneficial influence of statistics

Disadvantages:
• Risk of introducing errors with SMT

4.1. MT Engine Creation
Statistical machine translation is the technology of choice at SDL and this course will focus on SMT
technology and its applications.

SDL takes a three-pronged approach to SMT and uses the following engine types, matching the
solution to the particular use case:

Baselines

Verticals

Customized engines

We will now explore the characteristics of each solution in more detail. It is helpful for post-editors
to know which type of engine they are working with in order to determine the correct approach for
post-editing.

4.1.1. Baselines
The baselines are the core generic engines developed by SDL for any given language pair and contain
hundreds of millions of words of bilingual data. The baseline systems are used as the starting point
for new language directions. They use existing translation databases to build up language pairs. The
data used is mined from reliable sources available in the public domain, such as news, IT
documentation, technical manuals and publicly available government material, and covers a variety
of subjects, for example IT, automotive, news, sports and electronics.

Baselines can be used as powerful backup engines for customizations and verticals. This means that
if a word, phrase or grammatical structure is not found in the training data for a vertical or
customized engine, the engine may still be able to produce a translation if the word is found in the
baseline.

Baselines are continuously monitored and updated at regular intervals.

This solution produces good results for clients who require immediate access to MT or do not have
sufficient data volumes for a custom solution. Baselines work particularly well if the content for
translation is general and varied.

4.1.2. Verticals
Verticals are trained statistical engines that are exclusive to a specific domain or subject area, for
example automotive, IT, electronics or travel. The data used for creating a vertical is selected from
sources within a domain or an industry. The MT output is more likely to follow domain-specific
technical terminology.

This solution can be used if there is not enough project-specific data to create a customized engine.
These domain-specific engines therefore provide a point of entry for projects with small project TMs.
Verticals are off-the-shelf solutions that also prove useful in cases where there is not enough time to
create a project-specific solution before the project starts.

Because a vertical is trained on a higher volume of data than a customization, the engine
is less likely to take translations from the baseline and is therefore more likely to produce a specific
technical translation instead of a general one. However, as the data used to create a vertical will
come from different sources within a domain, the post-editor will need to look out for
inconsistencies in terms of style and terminology.

SDL verticals are available for the following domains in a wide range of languages:

These engines are reviewed on a regular basis and are retrained when new data or new technical
features become available.

4.1.3. Customized Engines
When creating a customized engine, the MT is optimized for particular client projects. The MT
engine training is based on client-specific bilingual data (usually translation memories). The
recommended requirement for a successful customization is an aligned corpus containing a
minimum of one million words of relevant customer data, although this may vary per project and
language pair, and it is possible to create a customization with lower volumes of customer data. As a
general rule, more data usually has a positive effect on the quality of an engine and, in turn, the MT
output. However, for trained engines, the quality and consistency of data is also paramount.

One of the biggest advantages of a customized engine is the adherence to client-specific terminology
and style. As the machine translation output is fully based on the bilingual corpus, with no
syntactical or lexical data added, the quality of the output can only be as good as the quality of the
corpus. If the corpus data contains inconsistent terminology or style, the resulting MT output may
also show inconsistencies.

Key Features of SMT Solutions


The graphic below shows the key features of SMT solutions at a glance.

Baselines work well for general and varied content. Verticals are an off-the-shelf solution when no or
not enough client data is available for a customization. Customized engines are trained with
customer-specific data and are the recommended solution for specific client projects.

The type of MT solution chosen for a particular client or project not only depends on the type of
content to be translated, but also on the MT use case (for example raw MT versus post-editing).
Choosing the right solution within this context serves to improve the usefulness and efficiency of MT
and post-editing.

4.2.
The MT engine and the way it is trained are among the most critical parts of the PEMT process. A
mistake in engine training could result in repeated omissions or inaccurate translations,
forcing editors to make the same edits over and over again. This will have a negative impact
on the financial ROI of the PEMT process.

Building an engine to handle large data sets across several file formats requires a structured
process. Specialist tools at the data intake stage are required to optimize the data and
ensure that it is understood, cleaned and prioritized. Engine design must also include a view
of future translation requirements to ensure the ability to process data quickly and provide
insightful and relevant analytics further down the road.

Successful engine training is based on fully understanding the quality expectations for a
particular project. This includes having access to important project assets such as
terminology and style guides. These must be incorporated into the MT engine training and
used to measure the success of the output.

Engine Creation Process


Components and Workflow of Engine Training

Once the training data corpus (a large collection of words, typically a translation
memory) has been selected, it is the role of the computational linguists to decide
how to optimize the data, with the key objective of ensuring high-quality MT
performance. Computational linguists, working with data specialists and engineers,
need to view the data from a high-level vantage point but also analyze the details to
understand the eventual translation output.

Data cleaning can also take place at this stage. This is a process applied to the
training corpus in order to make it compatible with the platform where the SMT
engines are created. This process improves the quality of the data by removing
content that could adversely affect the MT output, such as tags, entities, misaligned
segments and corruptions. These elements could appear in the output and provoke a
drop in productivity. Some parts are also harmonized to achieve MT output that will
be faster to post-edit, as fewer changes will be required.
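
As an illustration only, the cleaning step described above might be sketched as follows. This is not SDL's actual tooling: the function names, the tag pattern and the length-ratio threshold of 3.0 are all assumptions chosen for the example.

```python
import html
import re

# Assumption: inline tags look like <b>, </b>, <ph id="1"/> and similar.
TAG_RE = re.compile(r"<[^>]+>")


def clean_segment(text):
    """Strip inline tags, decode HTML entities and normalize whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


def clean_corpus(pairs, max_ratio=3.0):
    """Return cleaned (source, target) pairs, dropping suspicious segments.

    max_ratio is a hypothetical threshold: pairs whose word counts differ
    by more than this factor are treated as likely misalignments.
    """
    cleaned = []
    for source, target in pairs:
        src, tgt = clean_segment(source), clean_segment(target)
        if not src or not tgt:
            continue  # nothing left after cleaning
        if "\ufffd" in src or "\ufffd" in tgt:
            continue  # replacement character signals an encoding corruption
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue  # length mismatch suggests a misaligned segment
        cleaned.append((src, tgt))
    return cleaned
```

A real pipeline would add format-specific handling (for example for TMX or XLIFF tags) and harmonization rules agreed for the project.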

There is a close relationship between engine data and future client content. A
combination of automated processes and human validation ensures that tests
remain relevant for use with future data. Being able to spot obvious issues such as
corruption in a million-word data set (impossible for the human eye) is critical when
assessing content. Recording the results in a structured way will ensure that lessons
are learned for future efforts.

It is likely that several attempts are needed (possibly simultaneously) in order to find
the optimal engine. The choice of engine design must be agile, based on experience,
incorporate knowledge of previously successful configurations and draw on an
inherent understanding of statistical machine translation behavior. Trainers must be
able to incorporate industry-standard automated evaluations to narrow down the
best engine candidates for human evaluation.

4.3.

Once an MT solution has been proposed for a specific use case, it is important to measure how well
it performs. To achieve this, a number of tests can be performed. The key considerations to be taken
into account are:

• MT evaluations should be relevant to the content being translated. It is important to test
representative files of a sufficient size
• MT evaluations should be relevant to the specific use case
• MT evaluations should take production scenarios into account, as predictions should stand
up in day-to-day commercial reality
• Test sentences need to be unleveraged against corpus material used for engine creation
• Experienced or future project translators should carry out the tests if possible
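
The requirement that test sentences be unleveraged against the training corpus can be illustrated with a minimal sketch. It assumes that "unleveraged" means the test segment does not already occur in the training data, and it only catches exact matches after case and whitespace normalization; the function names are hypothetical.

```python
def normalize(segment):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(segment.lower().split())


def unleveraged(test_segments, training_segments):
    """Keep only test segments that do not occur in the training corpus."""
    seen = {normalize(s) for s in training_segments}
    return [s for s in test_segments if normalize(s) not in seen]
```

Near-duplicates (fuzzy matches) would need an additional similarity check, which is left out here.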

We can distinguish between human evaluations and automated evaluations.

When done well, a human evaluation is still often considered to be more reliable than automated
measures, and has the added advantage that a human translator can provide useful comments on
the issues found in the MT output.

Human evaluation of MT quality normally uses Likert-based scales. With this method,
resources are asked to score aspects of the MT output by following a list of parameters associated
with a numerical scale. For example, “score 5 if the output is entirely correct, score 4 if the output is
understandable but has grammatical errors,” and so on. This kind of assessment mainly focuses on
the understandability, utility and actionability of the MT output, although some vendors have
started looking into Likert-based scales that could help assess the post-editing effort.

Human evaluation can also be used to compare two or more MT engines or systems in order to
select the best one, and is based on the evaluator stating their preference between two or more MT
outputs generated for the same source sentences.

Human evaluation testing is also used to assess the level of productivity enhancement when working
with MT compared to human translation. In this case, the tests should try to match the real-life
production environment of the user while still focusing on the quality of the MT output. Some tests
mimic the features of a standard production environment while recording detailed analytics. Most
productivity tests in the industry are based on a combination of measuring post-editing speed, and
post-editing effort and distance, or comparing post-editing speed with conventional translation
speed. This makes it possible to correctly predict the increased productivity that post-editing can
offer.
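
The speed comparison mentioned above comes down to simple arithmetic. The sketch below is only an illustration: the function names and the figures in the usage note are hypothetical and do not represent SDL's productivity test.

```python
def words_per_hour(word_count, seconds):
    """Throughput for a timed task, in words per hour."""
    return word_count * 3600 / seconds


def productivity_gain(pe_speed, translation_speed):
    """Percentage increase of post-editing speed over conventional translation."""
    return (pe_speed - translation_speed) / translation_speed * 100
```

For example, a tester who post-edits 750 words in an hour but translates 500 words from scratch in the same time would show a 50% productivity gain.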

The main disadvantages inherent in human evaluation are higher costs and the fact that humans are
prone to subjectivity. However, as the use case becomes more complex (such as in the case of
PEMT), so does the requirement to test in a way that is relevant to the particular project. Quality and
productivity evaluations that involve human testers are naturally more costly, but provide conclusive
results overall.

In the last few decades, many methods for automated evaluation have been proposed. For some use
cases, these offer a quick, recognized and cost-effective way to analyze the potential quality of an MT
engine.

Most automated measures assess the quality of the machine translation compared to a reference
translation that is deemed to be high quality. Some of the most widely used approaches are detailed
below.

BLEU (BiLingual Evaluation Understudy) score: This algorithm aims to evaluate the quality of text
that has been machine translated. The central idea behind BLEU is “the closer a machine translation
is to a professional human translation, the better it is.” To assess this, scores are calculated for
individual translated segments—generally sentences—by comparing them with a set of good-quality
reference translations. Those scores are then averaged over the whole corpus to reach an estimate
of the translation’s overall quality. Even though BLEU has become a standard in the industry, it has
its limitations. For instance, intelligibility and grammatical correctness are not taken into account
explicitly, as they are assumed to be reflected in the reference translations.
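
To make the idea concrete, here is a simplified sketch of a BLEU-style score for a single segment against a single reference. It assumes whitespace tokenization, n-grams up to length 4 and no smoothing. Established implementations aggregate n-gram counts over the whole test set and support several references, so this should be read as an illustration of the principle, not as a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-segment, single-reference BLEU (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matched == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

An identical candidate and reference score 1.0, and a candidate sharing no words with the reference scores 0.0.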

NIST: The name of this metric comes from the US National Institute of Standards and Technology.
This measure is based on the BLEU score, but it differs from the BLEU algorithm in several ways.

While BLEU simply calculates how many n-grams match both in the reference translation and in the
MT output and gives these n-grams the same weight, NIST also calculates how “informative” a
particular n-gram is. When a correct n-gram is found, the algorithm measures whether that
combination is a common sequence in the corpus material or whether that fragment is not that
common in the data. Depending on the result, an n-gram will be given more or less weight. To give
an example, if the bigram “on the” is correctly matched, it will receive a lower weight than the
correct matching of the bigram “interesting calculations,” as this is less likely to occur.
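
The weighting idea can be sketched as follows. In the commonly cited definition of the NIST metric, the information weight of an n-gram is the base-2 logarithm of the count of its first n-1 words divided by the count of the full n-gram, with counts taken from the reference data (for single words, the numerator is the total number of words). The function names and the toy reference data in the usage note are assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(sentences, n):
    """Count all n-grams of length n across whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def information_weight(ngram, references):
    """NIST-style information weight of an n-gram (a tuple of words)."""
    n = len(ngram)
    full_count = ngram_counts(references, n)[ngram]
    if full_count == 0:
        raise ValueError("n-gram does not occur in the reference data")
    if n == 1:
        prefix_count = sum(len(s.split()) for s in references)
    else:
        prefix_count = ngram_counts(references, n - 1)[ngram[:-1]]
    return math.log2(prefix_count / full_count)
```

In a toy reference set where "on" is always followed by "the" but "interesting" is followed by four different words, the bigram "on the" receives a weight of 0 and "interesting calculations" a weight of 2.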

NIST also differs from BLEU in terms of how some penalties are calculated. For example, small
variations in translation length do not impact the overall NIST score as much as in BLEU.

METEOR (Metric for Evaluation of Translation with Explicit ORdering): This metric was designed to
address some of the problems found in the more popular BLEU metric, and also produces a good
correlation with human judgment at the sentence or segment level (this differs from the BLEU
metric in that BLEU seeks correlation at the corpus level). With this system, several features that had
not been part of any other metrics at the time were introduced. Matches in METEOR are made by
following the parameters below, among others:

• Exact words: As with other metrics, a match is made if two words are identical in the
machine translation output and the reference translation
• Stem: Words are reduced to their stem form. If two words have the same stem, a match is
also made
• Synonymy: Words are matched if they are synonyms of one another. Words are considered
synonymous if they share any synonym sets according to an external database
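
The three matching stages listed above can be illustrated with a toy sketch. The suffix-stripping stemmer and the two synonym sets are hypothetical stand-ins for the stemmer and the external synonym database that a real implementation would use; they are only there to make the example self-contained.

```python
# Hypothetical synonym sets standing in for an external lexical database.
SYNONYM_SETS = [{"car", "automobile"}, {"quick", "fast"}]


def stem(word):
    """Very naive stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def match_type(mt_word, ref_word):
    """Return 'exact', 'stem' or 'synonym' for a match, or None otherwise."""
    a, b = mt_word.lower(), ref_word.lower()
    if a == b:
        return "exact"
    if stem(a) == stem(b):
        return "stem"
    if any(a in synset and b in synset for synset in SYNONYM_SETS):
        return "synonym"
    return None
```

The stages are tried in order, so a pair of words is only checked for synonymy if neither the exact nor the stem comparison succeeds.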

Levenshtein distance: This metric measures the similarity or dissimilarity (“distance”) between two
text strings by calculating the minimum number of single-character edits (insertions, deletions and
substitutions) required to change one word into another. In the field of machine translation, this can
be done by comparing the raw MT output to the human translation.

Let’s look at a couple of examples:


The Levenshtein distance between “sport” and “short” is 1, because one edit is required to convert
one word into the other (replace “p” with “h”).

The Levenshtein distance between “dog” and “frog” is 2, as it is not possible to convert the first
word into the second with fewer edits (replace “d” with “f” and add “r”).

The Levenshtein distance can never exceed the length of the longer input string. If two words have
no characters in common, the minimum number of edits is exactly the number of characters in the
longer string.

Example: For “computer” and “alibi”, the Levenshtein distance is 8, the length of the longer word:
Replace “c” with “a”
Replace “o” with “l”
Replace “m” with “i”
Replace “p” with “b”
Replace “u” with “i”
Delete “t”
Delete “e”
Delete “r”
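
The examples above can be checked with a short dynamic-programming sketch of the Levenshtein distance. The function name is an assumption; the algorithm itself is the standard row-by-row formulation.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

Because the function only compares elements for equality, it also works on lists of words, which gives a word-level distance between two sentences.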

As with other automated measures, the results of the Levenshtein distance are not set in stone. As
mentioned before, there can be many correct translations for a single source. The Levenshtein
distance will not be able to measure quality on its own. Results will vary, for example, if clauses are
positioned differently in the MT output and in the human reference translation.

Example:
MT: “If I go home after 10pm, I will let you know”
Reference human translation: “I will let you know if I go home after 10 pm”

In this case, the MT output is correct and no changes would be necessary during a post-editing stage.
However, the Levenshtein distance will be quite high, as many changes would be required to turn
the first sentence into the second one.

This demonstrates once again the importance of selecting large test beds to run any of these
automated evaluations on, as that will allow us to obtain more reliable results.

TERp: This is a word-based metric that calculates the minimum number of edits required to match
an MT output to a correct reference translation, normalized by the length of the reference.

$$\mathrm{TER} = \frac{\text{number of edits}}{\text{average number of reference words}}$$

TERp (TER-Plus) is an extension of Translation Edit Rate (TER) and builds on the success of TER as an
evaluation metric and alignment tool. At the same time, it addresses several of TER’s weaknesses
through the use of paraphrases, morphological stemming and synonyms, as well as edit costs that
are optimized to correlate more closely with various types of human judgments.

Put simply, TERp measures the number of edits that are necessary to go from the raw MT output to
a final edited version. As such, it is a helpful metric to measure typing and editing effort.
The TERp score is usually reported as a number from 0 to 100; the higher the number, the more
editing was required.
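
As a rough illustration of the TER ratio, the sketch below divides a word-level edit distance by the average reference length. It is deliberately simplified: full TER also counts block shifts as single edits, and TERp adds stem, synonym and paraphrase matching with tuned edit costs, none of which is modelled here. The function names are assumptions.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Word-level insertions, deletions and substitutions (no shifts)."""
    previous = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def ter(hypothesis, references):
    """Simplified TER: fewest edits to any reference / average reference length."""
    hyp_tokens = hypothesis.split()
    ref_token_lists = [reference.split() for reference in references]
    edits = min(word_edit_distance(hyp_tokens, ref) for ref in ref_token_lists)
    average_length = sum(len(ref) for ref in ref_token_lists) / len(ref_token_lists)
    return edits / average_length
```

Multiplying the result by 100 gives the percentage-style figure that is usually reported.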

Another alternative present in the industry is TAUS’s Dynamic Quality Evaluation Framework (DQF),
which was developed to tackle the general problem of evaluating translation quality.
The framework allows users to profile their content and receive guidance on best-fit evaluation
methods. DQF provides a vendor-independent environment for evaluating translated content.

A knowledge base documenting best practices provides detailed practical information on how to
carry out different types of quality evaluation. By establishing best practices, metrics and
benchmarks within a dynamic framework, best-fit evaluation approaches are applied depending on
content type and usage.

In conclusion, automatic measures do present some limitations. Especially in the case of assessing
productivity, the cognitive effort made by the post-editor to deliver a high-quality end product is not
really accounted for. For example:

– Read the source and MT output
– Consider and apply necessary edits to the MT output
– Validate terminology and check against relevant reference materials
– Ensure consistency with bordering segments
– Ensure proper text flow
– Research new concepts and raise queries
– Run the required quality assurance checks

The task of post-editing really amounts to much more than purely typing and editing. For that
reason, automated metrics are more suited to engine training development and comparison, but are
not necessarily practical in a production scenario. Human evaluations are difficult to replace in this
sense.

However, we also need to see different testing methodologies that respond to the demands of the
MT industry. In particular, the potential to use Translation Edit Rate as a measure of MT productivity
has attracted considerable interest for some time, including among MT clients.

For us, it is important that our engines are fit for purpose, and our current testing methodology
ensures that the engine quality is good enough for post-editing. However, we know that our testing
methods also need to evolve in line with MT developments.

5.1.
Post-editing is the task that replaces conventional translation for MT projects. Within a professional
environment, the working tools, applications and reference materials will remain the same as for a
conventional translation approach. Machine translation is a new component in the process that
provides human translators with more leverage, in addition to using CAT tools and TMs. Post-editors
focus on refining the pre-translated content to the required quality level.

Translators new to post-editing will develop this skill with time. Post-editors will not be fully
productive from day one as they need to learn their trade. Industry research has shown that
experience is the single most important factor in translation productivity, and this becomes even
more influential in post-editing. Over time, translators learn strategies that help them adapt their
working practices to use the MT output to their advantage.

Integrating Post-Editing into a Production Environment

When preparing a file for post-editing, the translation memory is applied as usual to obtain the 100%
and fuzzy matches. Machine translation is applied to any untranslated text left after the TM is
applied.
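
The order of operations described above (TM first, MT only for what is left) can be sketched as follows. This is a simplified illustration, not the behavior of any particular CAT tool: the TM is modelled as a plain dictionary, fuzzy matching uses Python's `difflib.SequenceMatcher` instead of a real TM matching algorithm, and `machine_translate` and the 0.75 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def pretranslate(segments, tm, machine_translate, fuzzy_threshold=0.75):
    """Return (source, target, origin) tuples: TM matches first, MT for the rest.

    tm maps source segments to stored translations; machine_translate is any
    callable that returns a translation for a source string.
    """
    results = []
    for source in segments:
        if source in tm:
            results.append((source, tm[source], "100% match"))
            continue
        # Look for the most similar TM source segment.
        best_source, best_score = None, 0.0
        for tm_source in tm:
            score = SequenceMatcher(None, source, tm_source).ratio()
            if score > best_score:
                best_source, best_score = tm_source, score
        if best_score >= fuzzy_threshold:
            results.append((source, tm[best_source], "fuzzy match"))
        else:
            # Nothing usable in the TM: fall back to machine translation.
            results.append((source, machine_translate(source), "MT"))
    return results
```

The origin label matters to the post-editor, because fuzzy matches and MT output call for different editing strategies.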

The post-editing phase itself involves a number of key stages. Since the post-editor is attempting to
be as efficient and productive as possible, preparation is key. Do not rush ahead without taking time
to consider the source and MT output. Determine the usable parts and then build around these.
Focus on accuracy, without under-editing or over-editing, and finally check over the grammar and
terminology.

1. Read the source segment first and then the MT output
2. Determine the usable elements (single words and phrases) and make them the basis for your
translation
3. Build from the MT output and use every part of the MT output that can speed up your work
4. Take care not to over-edit (unnecessary rephrasing) or under-edit (wrong prepositions,
inflections, compounds etc.) the MT output. The adjustment of style (such as “may” versus
“might”) can be optional, but grammatical correctness in the target is not
5. Correct any grammatical errors and make sure that the terminology of the MT output is
compliant with glossaries and termbases. These aspects will always need to be checked, as any
inconsistencies in the training material will be reproduced in the output
6. Run the compulsory checks (spelling, grammar, terminology check)
7. Finally, after post-editing each segment, reread your translation and make sure that no
details are missing and you have not left any words that are not needed

5.2.
The market makes a distinction between two degrees of post-editing: post-editing to publishable
quality and post-editing to an understandable level or light PE. Quality expectations need to be
determined in conjunction with the customer — end quality is dependent on client requirements.

While the present workbook focuses mainly on publishable-quality post-editing (see Section 6.2
“Post-Editing Quality Expectations”), it is important to be aware of light PE too.

Post-editing to publishable level is the highest quality standard. This is in line with the expectations
of the majority of SDL’s clients. After post-editing, files undergo a quality check to ensure that the
translation is correct and fluent. The final quality should be comparable to conventional translation.

Post-editing to understandable quality, or light post-editing, is normally required for low-visibility
text, or texts that would not otherwise be translated as it would be too expensive and time
consuming. A client might decide to opt for understandable-quality texts in order to reduce the
number of support requests for a product or to provide an extra service to the user, for example.
The typical purposes of understandable-quality texts include offering users a quick answer on how to
fix an issue or providing a translation solution for low-visibility content, such as FAQs, blogs and
knowledge bases.

The basic principles of light post-editing can be summarized as follows:


• Focus is on meaning, not grammar or style
• Style is basic, not fluent
• Register and tone are basic, not adapted to text type/target audience
• Sentence structures are readable, not perfect in terms of spelling and grammar

When post-editing a text to understandable quality, the user should not expect a grammatically or
stylistically perfect translation. Grammar and spelling mistakes will only be corrected if the meaning
is affected.
• Instructions in the text must be actionable, not perfectly worded
  The translation is not expected to be perfect but should be actionable. As an example, if
  a knowledge base article or FAQ is post-edited to understandable quality, the user
  should be able to understand how to fix a particular problem.
• Terminology is appropriate in context, not client-specific
  For light post-editing, the post-editor will be asked to fix critical terminology errors that
  prevent the end-user from understanding the concept (NB: Automatic checks for
  client-preferred terms can only be run if a glossary in MultiTerm format is provided).
• Text is understandable, not consistent
  Light post-editing does not allow the post-editor to check consistency across the text.
• Information is accurate, not localized to in-country standards
• Wrong punctuation is not corrected

Overall, the post-editor is expected to deliver an acceptable quality level in line with the guidelines
set out above. The following example of a quality matrix could be used as a guideline.

Error Category | Specific Issue | Light PE | Comment
Mistranslation | | Yes | Critical errors only
Terminology | In-context terminology choices | Yes | Automated checks for client-preferred terms at end of project possible only if glossary provided in MultiTerm format
Accuracy | Omissions/additions | Yes | Critical errors only
Language | Grammar | No | Unless meaning is affected
Language | Spelling | No | Unless meaning is affected
Language | Punctuation | No |
Consistency | | No |
Style | General style | No |
Country | Country standards | No |
Country | Register and tone | No |

The following table lays out the differences in quality requirements for different use cases.

To provide an idea of the quality level required, please see some examples of the light PE use case.

Examples of Light Post-Editing


LP | SOURCE | EN MT | EN PE | COMMENTS
IT-EN | Attrezzo di compressione per misurare la sporgenza delle canne dei cilindri (da utilizzare con 380000364 e piastre specifiche) | Tools for compression to measure cylinder liner protrusion ( use with 380000364 and specific plates) | Tool for compression to measure cylinder liner protrusion ( use with 380000364 and specific plates) | The plural needs to be edited because “attrezzo” is singular in the Italian source, but there is no need to remove the space after the bracket.
IT-EN | Prima di iniziare qualsiasi lavoro in quest'area, spegnere il motore ed estrarre la chiave di accensione. | Always stop the engine and remove the Key before working in this area. | Always stop the engine and remove the Key before working in this area. | There is no need to change the uppercase to lowercase.
FR-EN | Si la valeur souhaitée n’est pas obtenue, répéter les instructions 3 à 5. | If the desired pressure has not been reached, repeat instructions 3 to 5. | If the desired pressure has not been reached, repeat instructions 3 to 5. | “Required” would be better than “desired,” but since this is perfectly understandable, there is no need to change it.
EN-DE | To remove the 3D diffuser: | Zum Entfernen des 3D Refraktionstechnik: | Zum Entfernen des 3D Refraktionstechnik: | The MT has the wrong case “des” instead of “der”. But the MT sentence is perfectly understandable as it is.
EN-FR | The pressure is reduced to pilot pressure. | La pression est réduit à la pression pilote. | La pression est réduit à la pression pilote. | The gender agreement is wrong, should be “réduite” instead of “réduit”, but the sentence is understandable as it is and that does not need to be corrected.

5.3.
It is recommended that the post-editing process is followed by a quality check, which is the
equivalent of conventional review.
As part of SDL’s workflow, the quality check is performed as a separate step by a reviewer and
guarantees that the translation has achieved the required level of quality depending on the degree
of post-editing applied. To achieve this, quality at source is key — in the case of post-editing to
publishable quality, the post-edited file should already be of this level. To facilitate this, ensure that
the post-editor receives clear instructions and has access to all of the most up-to-date reference
materials. The required QA checks need to be run and can be used as an indication of the post-
editing quality.

When quality checking, always bear the MT in mind and understand the initial MT output. Identify
known problems in advance (see Section 7) and make sure to include them in your checks (e.g.
wrong prepositions, terminology, known issues with MT). It is important to learn to distinguish
between what needs to be changed and what can remain untouched. Note that there are some
items that always need to be amended by the post-editor. Examples include date formats, spacing,
wrong prepositions or terminology issues caused by several possible translations of the same word.

When quality checking post-edited material, focus on over-editing and under-editing (depending on
style and client requirements) (see Sections 6.3 and 6.4). Over-editing will lead to lower productivity
and needs to be avoided during both the PE and the QA check phase. Under-editing may result in
quality issues and will impact negatively on the time needed for quality checking.

Before starting a quality check, make sure that all the content has been translated. Then check that
the post-edited text reads well from a user’s point of view. The post-edited text must match the
source. Be careful to look for mistranslations, words left out of the translation or additional words
that are not in the source text. Check that there are no typos. Scrolling down the file will enable you
to spot spelling mistakes and inconsistencies. Terminology, especially product names, should be
consistent with the master glossary. Sometimes, terminology is not consistent in the TMs and there
are additional lists and guidelines for terminology. Finally, check that the overall style is consistent
with the rest of the files and complies with the style guide from the client.

6.1.
In order to post-edit effectively, it is essential to use the machine translation output as much as
possible. Do not ignore the machine translation output and do not translate segments from scratch.
In almost all cases, some parts of the automated translation output can be used and help speed up
work.

The following guidelines will help you identify usable parts and achieve the maximum post-editing
productivity. The translator needs to achieve publishable quality at the post-editing stage without
sacrificing translation speed. Once you have learnt to identify usable parts and to use them, you will
find post-editing easier and faster than translating from scratch. Like any new skill, however, there is
a learning curve with MT post-editing — but the more you practice, the faster and easier it gets.

Post-Editing Tips

• Do not ignore or erase the MT output
• Maximize usage of the MT output
• Use the appropriate style and terminology
• Follow the project/client style guidelines
• Do not make alterations for the sake of variation alone
• Do not replace words with synonyms
• Do not spend time researching terminology unless the MT is clearly wrong
• If the MT meets the project requirements, do not modify it
• If formatting is an issue, restore the original source format and paste the useful MT parts
instead
• If there are many tags, an alternative is to delete them, edit the text, then insert the tags
again
• At the end, reread the segment and compare it to the source for accuracy

Apart from this, account knowledge matters for post-editing as well. As with any translation
project, conventional or MT, a solid knowledge of the project requirements with regard to style
guidelines, terminology, the TM and client expectations will help you achieve good post-editing
productivity.



So What Makes a Good Post-Editor?

• Excellent linguistic skills
• Domain and subject knowledge
• Proficiency with CAT tools and automated text-checking
• Positive attitude towards MT
• Practice!

6.2.
The quality expectations will vary according to the degree of post-editing and the client
requirements. However, certain general principles apply. The aim is to deliver a high-quality
translation more quickly than a conventional translation. Translation speed is a key factor when
post-editing. Therefore, the machine translation needs to be corrected with a view to maintaining
efficiency.

When post-editing to publishable quality, there should be no difference in quality between a human
translation and a post-edited translation. However, there may be a slight shift in style. Style should
be correct and appropriate to the project, but may need to be less refined to allow for a more
efficient use of the MT output. Where a client specifically asks for MT to be used on their project,
the client needs to be made aware of this possible shift in style, and expectations need to be
managed accordingly.
There will, of course, be a certain amount of variation — but this is a feature of conventional
translation as well. So long as the quality criteria are adhered to, a post-edited text will be
considered to have met the quality expectations.



Post-Editing Quality Criteria

• The translation must be a correct reflection of the source.
• Spelling and punctuation must be correct.
• The translation must be grammatically and syntactically correct and reflect the conventions of
the target language.
• The correct terminology must be applied and used consistently (including preferred translations
for frequently occurring terms).
• Cultural references (date and time formats, units of measurement, number formats, currency etc.)
must be correctly adapted.
• The style and register of the target must be appropriate for the document type.
• The original formatting must be reproduced.
• Project guidelines must be followed.
• The translation must read well and be suitable for the end user.

There are two main issues that post-editors often face when attempting to fulfill the highest-
possible quality criteria in the shortest amount of time. These are under-editing and over-editing.



6.3.
If a post-editor has under-edited the MT output, they may have missed important errors that
needed to be corrected and this may reflect badly on the quality of the translation. Under-editing is
generally characterized by the following features:

• Errors (spelling, typos)
• Mistranslations (target does not match source)
• Inconsistent terminology
• Inaccuracy
• Inconsistency in figures, units of measurement etc.
• Incorrect formatting
• Not following project-specific instructions
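
Inconsistencies in figures are among the easier under-editing symptoms to screen for automatically. The sketch below is a crude illustration in Python, assuming source and target are plain strings. The helper names `extract_numbers` and `numbers_match` are hypothetical. Decimal and thousands separators are simply stripped so that "1,3" and "1.3" compare as equal, which is an assumption that will not suit every locale or project.

```python
import re
from collections import Counter

# A run of digits, optionally followed by groups such as ".5" or ",300".
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text):
    """Return a multiset of the numbers in text, with '.' and ',' removed."""
    return Counter(re.sub(r"[.,]", "", number) for number in NUMBER_PATTERN.findall(text))


def numbers_match(source, target):
    """True if both texts contain the same digit sequences equally often."""
    return extract_numbers(source) == extract_numbers(target)


ok = numbers_match("Fotos mit 1,3 Megapixeln", "Photos with 1.3 megapixels")
bad = numbers_match("Lock after 10 seconds.", "Sperre nach 100 Sekunden.")
print(ok, bad)
```

Numbers written out as words and converted units of measurement are not covered by this check and still need to be compared by eye.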

Below are some examples of under-editing.

Language pair: EN-ES
Source: On its walls you'll discover the figures of a puma and a snake.
MT: En sus murallas, descubrirá la cifras de un puma y una serpiente.
PE: En sus murallas descubrirá la figuras de un puma y una serpiente.
Reviewer: En sus murallas descubrirá las figuras de un puma y una serpiente.
Comment: The term “cifras” has been correctly post-edited and replaced with “figuras”, but the
article “la” has not been changed to the plural form.

Language pair: EN-ES
Source: Inside you can see a sacrificial altar made of a huge stone.
MT: En su interior se puede ver una altar de sacrificios de una enorme piedra.
PE: En su interior se puede ver una altar de sacrificios hecho con una enorme piedra.
Reviewer: En su interior se puede ver un altar de sacrificios hecho con una enorme piedra.
Comment: The preposition “de” has been correctly post-edited, but the article “una” does not
correspond to the gender of the noun “altar” (“una” is feminine whilst “altar” is masculine).

Language pair: EN-FR
Source: How long will the battery last using interactive features (such as games) on my phone?
MT: Combien de temps dure l'autonomie à partir d'interactive fonctions (comme les jeux) sur mon
téléphone ?
PE: Combien de temps dure l'autonomie de la batterie lorsque j'utilise les fonctions interactives
(comme les jeux) sur mon telephone ?
Reviewer: Quelle est l'autonomie de la batterie lorsque j'utilise les fonctions interactives
(comme les jeux) de mon telephone ?
Comment: “Combien de temps dure” should not be combined with the word “autonomie.” The literal
translation of “How long does XXX last” is not appropriate in this context. The correct version is
“Quelle est l’autonomie.” The preposition “sur” is not appropriate in this context.



6.4.
If a post-editor has over-edited the MT output, they may be taking extra time, which may affect their
overall productivity and reduce the benefits of post-editing. Over-editing is typically characterized by
preferential rather than necessary changes.

To avoid over-editing:

• Do not rewrite the translation unless unavoidable
• Do not change correct and understandable translations, even if they could be phrased more
naturally or fluently
• If the MT output style meets the project requirements, do not change it
• Reduce changes to a minimum and focus on actual mistakes

There is always room to allow for stylistic changes and creativity with post-editing, and stylistic
features that do not meet the client style guidelines should certainly be amended. The important
thing to remember is not to let preferential changes distract from necessary amendments and not to
let these changes have a negative impact on the overall productivity.

Below are some examples of over-editing.

Language pair: DE-EN
Source: Fotos mit 1,3 Megapixeln
MT: Photos with 1.3 megapixels
PE with over-editing: 1.3 megapixel photos
PE without over-editing: Photos with 1.3 megapixels
Comment on over-edited version: Unnecessary reordering of elements.

Language pair: DE-EN
Source: Zudem stehen verschieden SATA-Typen zur Auswahl, wie z.B. Micro SATA oder Slimline-SATA.
MT: In addition there are different SATA-types are available, such as micro SATA or Slimline SATA.
PE with over-editing: There are various types of SATA available for this, such as micro SATA or
slimline SATA.
PE without over-editing: In addition, there are different SATA types available, such as micro SATA
or slimline SATA.
Comment on over-edited version: Unnecessary rephrasing and change of syntax.

Language pair: DE-EN
Source: Mit der 1 Meter langen Tischantenne können Sie Ihren WLAN-Empfang deutlich optimieren.
MT: With the 1 meter long Tischantenne you can significantly optimize your WLAN-reception.
PE with over-editing: You can optimise your WLAN reception significantly using the 1-m table-top
antenna.
PE without over-editing: With the 1-m table-top antenna you can significantly optimise your WLAN
reception.
Comment on over-edited version: Unnecessary reordering of elements; more of the MT can be left
unchanged if syntax is kept as is.

Language pair: EN-DE
Source: Make sure that the brake pedal is depressed while you perform this procedure.
MT: Sicherstellen, dass das Bremspedal niedergedrückt wird während Sie dieses Verfahren
durchführen.
PE with over-editing: Währenddessen muss das Bremspedal weiterhin gedrückt werden!
PE without over-editing: Das Bremspedal muss niedergedrückt sein, während Sie dieses Verfahren
durchführen.
Comment on over-edited version: Unnecessary rewrite; usable parts of the MT were ignored in
over-edited version.

Language pair: EN-DE
Source: Install the Bluetooth printer on your computer and set it as the default printer.
MT: Installieren Sie die Bluetooth Drucker auf Ihrem Computer, und richten Sie ihn als
Standarddrucker.
PE with over-editing: Installieren Sie den Bluetooth-Drucker auf Ihrem Computer, und legen Sie ihn
als Standarddrucker fest.
PE without over-editing: Installieren Sie den Bluetooth-Drucker auf Ihrem Computer, und richten Sie
ihn als Standarddrucker ein.
Comment on over-edited version: Unnecessary use of synonyms; verb “einrichten” was unnecessarily
replaced by “festlegen.”

Language pair: EN-DE
Source: Allow the computer to lock automatically after 10 seconds.
MT: Warten Sie, bis der Computer die Sperre automatisch nach 10 Sekunden.
PE with over-editing: Gestatten Sie, dass der Computer nach 10 Sekunden automatisch gesperrt wird.
PE without over-editing: Warten Sie, bis der Computer die Sperre nach 10 Sekunden automatisch
aktiviert.
Comment on over-edited version: Unnecessary use of synonyms; verb “warten” was unnecessarily
replaced by “gestatten” (“warten” conveyed the same meaning in this context).

Language pair: EN-FR
Source: After disconnecting the high voltage terminals, busbars, etc., insulate the parts with
insulating tape.
MT: Après avoir débranché les bornes haute tension, jeux, etc., isoler les pièces avec de la bande
adhésive isolante.
PE with over-editing: Après le débranchement des bornes, barres collectrices, etc. haute tension,
isoler les pièces avec du ruban isolant.
PE without over-editing: Après avoir débranché les bornes, barres collectrices, etc. haute tension,
isoler les pièces avec du ruban isolant.
Comment on over-edited version: Unnecessary change of syntax.

Language pair: EN-FR
Source: Alternator is found to be noisy
MT: L'alternateur est bruyant
PE with over-editing: Le client trouve que l’alternateur est bruyant
PE without over-editing: L'alternateur est bruyant
Comment on over-edited version: Correct expression in MT; not needing any editing.

Language pair: EN-FR
Source: The oil in these passages is trapped and the blade does not move.
MT: L'huile dans ces passages est piégée et la lame ne bouge pas.
PE with over-editing: La lame ne bouge pas car l'huile de ces conduits est piégée.
PE without over-editing: L'huile dans ces passages est piégée et la lame ne bouge pas.
Comment on over-edited version: Unnecessary rephrasing.

Language pair: EN-IT
Source: Be sure that the hydraulic hose is free of abrasion.
MT: Accertarsi che il flessibile idraulico sia privo di abrasioni.
PE with over-editing: Assicurarsi che il flessibile idraulico sia privo di abrasioni.
PE without over-editing: Accertarsi che il flessibile idraulico sia privo di abrasioni.
Comment on over-edited version: Unnecessary use of a synonym.

Language pair: EN-IT
Source: Adjust the angle by raising the rear of the vehicle to ensure water covers the joints.
MT: Regolare l'angolo sollevando la parte posteriore del veicolo per assicurarsi che l'acqua copre
i giunti.
PE with over-editing: Sollevando la parte posteriore del veicolo, regolare l'angolo per assicurarsi
che l'acqua copra i giunti.
PE without over-editing: Regolare l'angolo sollevando la parte posteriore del veicolo per
assicurarsi che l'acqua copra i giunti.
Comment on over-edited version: Unnecessary reordering of phrases.

Language pair: EN-IT
Source: The only way to allow the device to validate a self-signed certificate is to install the
certificate on the device.
MT: L'unico modo per consentire il dispositivo per convalidare un certificato autofirmato per
installare il certificato sul dispositivo.
PE with over-editing: Per permettere al dispositivo di convalidare un certificato autofirmato,
l'unico modo è quello di installare il certificato sul dispositivo.
PE without over-editing: L'unico modo per consentire al dispositivo di convalidare un certificato
autofirmato è quello di installare il certificato sul dispositivo.
Comment on over-edited version: Unnecessary use of synonyms and reordering of phrases.
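
One way to spot candidates for over-editing is to measure how much of the MT output survived in the post-edited segment. The following Python sketch uses the standard library's `difflib.SequenceMatcher` on word tokens. It is only an illustration: the function name `word_similarity` is hypothetical, and the score is not an official SDL metric.

```python
from difflib import SequenceMatcher


def word_similarity(mt_output, post_edit):
    """Similarity between 0.0 and 1.0, computed over whitespace-separated words."""
    return SequenceMatcher(None, mt_output.split(), post_edit.split()).ratio()


mt = "L'huile dans ces passages est piégée et la lame ne bouge pas."
over_edited = "La lame ne bouge pas car l'huile de ces conduits est piégée."
minimal = "L'huile dans ces passages est piégée et la lame ne bouge pas."

over_edited_score = word_similarity(mt, over_edited)
minimal_score = word_similarity(mt, minimal)
print(f"over-edited: {over_edited_score:.2f}, minimal: {minimal_score:.2f}")
```

A low score does not prove over-editing, since poor MT output legitimately needs heavy changes. It only flags segments where a reviewer may want to check whether the changes were necessary or preferential.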

7.1.

Post-editing is not a light review of machine-translated content. MT output is rarely flawless and
post-editors need to notice the issues and evaluate which parts of the MT output can be used for the
final translation. Post-editing is most efficient when the translator knows what kind of issues to
expect. MT output issues depend to some extent on the project and language combinations as well
as the type of MT technology used.

Common Errors in SMT


Statistical MT output is usually more fluent in style than rules-based output but post-editors need to
look out for the following issues.

• Extra words in the target
• Missing words in the target
• Mistranslations/antonyms
• Noun compounds in the wrong order/not compounded
• Syntax and word order errors
• Terminology inconsistencies
• Vocabulary left in source language
• Incorrect capitalization
• Formatting issues
• Incorrect punctuation
• Incorrect or missing articles and prepositions
• Wrong gender, number, agreement, inflection

Since statistical processes do not actively involve linguistic information in the translation process
(excluding any rules that are extracted from statistical regularities), the opposite meaning may occur
in the MT output (i.e. positive sentences instead of negative sentences and vice versa). This is
relatively easy to post-edit but should always be checked during the post-editing stage as it is a
major quality concern.



Such mistranslations and opposites are perceived as being far more serious than any other potential
errors related to SMT as there is a high potential for post-editors to miss them. This is also the case
with fuzzy matches and incorrect editing. We know that these kinds of errors are found across
language pairs and projects, but they are only reported sporadically, perhaps on account of how
easy they are to correct.

For some examples of typical MT behavior, refer to Appendix 2 “Post-Editing Examples.”

7.2. How to Provide Feedback to Improve the MT Output
For MT systems, as for all tools and services, feedback from users is important to find out whether
they work as intended, highlight particular problems or determine whether they can generally be
enhanced and improved. At SDL, all feedback received on MT systems is analyzed and either dealt
with within the department or passed on to our technical team. In addition to this, all feedback is
logged using a dedicated tracking system to ensure that nothing is lost or forgotten, even if it may
not be possible to act upon it straight away.

Post-editor feedback is an important part of integrating post-editing into a workflow and is
invaluable to help maintain and improve MT output quality.

These are some of the key benefits of implementing a feedback process:


o Ability to fix technical issues
o Possibility to gain useful hints for retraining the engines
o Opportunity to gather experience and improve MT in the long run through continuous
research
o Engagement with post-editors as an integral part of the MT process

Some general guidelines for post-editors who would like to provide actionable feedback are
presented below:
• Feedback needs to be precise and specific in order to be useful and actionable
• It is important to report structural issues, not only individual mistranslations. Structural
issues are recurring errors with an immediate impact on productivity, e.g.:
○ Same particle wrongly appears at the start of sentences
○ Incorrect punctuation and additional spaces
• Indicating the frequency of errors helps determine possible error patterns
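
To report frequencies rather than isolated examples, it can help to keep a simple log of the errors encountered while post-editing and to summarize it before sending feedback. The sketch below assumes a hypothetical log of (segment ID, error category) pairs; the category names are illustrative and are not a prescribed SDL taxonomy.

```python
from collections import Counter

# Each entry: (segment_id, error_category); the categories are illustrative.
logged_errors = [
    (12, "incorrect punctuation"),
    (15, "particle at sentence start"),
    (18, "incorrect punctuation"),
    (27, "particle at sentence start"),
    (31, "particle at sentence start"),
    (40, "mistranslation"),
]


def summarize_feedback(errors):
    """Return (category, count) pairs, most frequent category first."""
    return Counter(category for _, category in errors).most_common()


summary = summarize_feedback(logged_errors)
for category, count in summary:
    print(f"{category}: {count} occurrence(s)")
```

A summary of this kind makes recurring, structural issues stand out from one-off mistranslations.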

Feedback for Statistical MT Engines


The MT output of a statistical engine is based on:
• A corpus of bilingual data (mostly TMs)
• Statistical algorithms

Source and translation units are matched up based on statistical probability. As opposed to rules-
based engines, no grammatical rules are applied and no grammatical information is stored for
individual words or terms.



The SMT output quality relates directly to the quality of the input data (i.e. the bilingual data/TMs
used to build the engine). Improvements to the MT output with regard to terminology and style are
achieved by improving the input data. The more consistent the style and terminology are in the TMs,
the better the MT output.

When evaluating style and terminology for an SMT engine, it is also important to take into account
whether the engine in question is a customized engine (i.e. specifically created for one project) or a
vertical engine (i.e. created for a domain). Vertical engines combine data from many different
projects within a domain and can therefore be expected to produce less consistent output than
customized engines, which use project-specific data for a single project/client only.

Unless you are working with adaptive MT technology of some kind (see Section 9), SMT engines are
static once they have been created and can only be changed by retraining the engine, which involves
reassembling and re-uploading the data corpus. Post-editor feedback becomes very relevant and can
be implemented at that stage.

Retraining of Engines

Any unusual issues in the MT output that do not relate back directly to the data corpus should
always be reported immediately. Examples of such issues include corrupt characters, the random
insertion of words or numbers, or a high number of untranslated words in the target. These
occurrences may indicate a general technical problem that needs to be fixed urgently.

At the same time, many of the issues reported for SMT engines are in fact expected SMT behaviors.
Post-editors and project leads should therefore familiarize themselves with the issues they are likely
to encounter when post-editing statistical machine translation output. These were detailed in the
previous section, “Common Patterns to Watch for When Post-Editing.”

In all cases, post-editors can still influence the quality of MT by making sure that project dictionaries
are updated when working with rules-based MT, and by carefully maintaining TMs for statistical MT
in order to build up a clean data corpus for future use.

SDL Language Cloud (previously BeGlobal Community) offers a range of free-to-access baseline
engines. These are the core MT engines developed by SDL. Industry-specific MT engines that are
trained using content specific to a particular domain (e.g. automotive, travel or electronics) are also
available.

SDL Language Cloud machine translation can be applied to the files you receive from your clients and
the output can be post-edited to full publishable quality. Generally, output from the baseline
engines might require additional work for projects with very specific and technical terminology.
Industry-specific or domain MT engines can generate translation results that require less post-
editing.

Through a series of available subscription packages, the Language Cloud online platform also allows
users to create their own custom MT engines from Translation Memory eXchange (TMX) files. By
using custom MT engines, future machine translation outputs will be closer to customers’ previous
content.

8.1. How to Add SDL Language Cloud as a Machine Translation Provider in SDL Trados Studio
In Studio, MT providers are added in a similar way to TMs. You can therefore add SDL Language
Cloud either when you are creating a new project (e.g. using files downloaded from your workflow
system) or to an existing project (e.g. a Studio package received from a client).

During the project creation process, or via Project Settings for an existing project, go to “Language
Pairs” -> “Translation Memory and Automated Translation”, click “Use…” and then select “SDL
Language Cloud Machine Translation” from the dropdown menu:



If you have not already created an SDL Language Cloud account, you will need to do so now. Please
use your SDL My Account credentials to create an account (also known as an oos.sdl.com account).
If you have not used SDL Language Cloud before, a free 30-day trial will automatically start.

After the 30-day trial is over, you will be able to choose from a selection of packages available on the
SDL Language Cloud website, including a free package for free machine translation usage.



8.1.1. Applying MT Segment-by-Segment
Once you have added the MT provider to your project, you will immediately be able to see MT
output in the Editor for new segments in the project.

Segments where MT has been applied appear as “AT” (Automated Translation) in the Editor.

8.1.2.

To apply MT to whole files or projects, you can use the “Pre-translate Files” batch task.

In the settings for this batch task (accessible via Project Settings or when you run the batch task
itself), choose “Apply automated translation” for the “When no match found” option. Also note that
this batch task will reapply your TMs, so ensure that the minimum match value is set appropriately
(usually 75):



Your TMs will be applied first and any remaining new segments will have machine translation
applied.
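
The order of operations (TM first, MT only for the remaining segments) can be pictured with the following Python sketch. It is a conceptual illustration only: `tm_score` uses `difflib` as a crude stand-in for a fuzzy match value and is not Studio's matching algorithm, and `fake_mt` is a placeholder for a real MT provider.

```python
from difflib import SequenceMatcher

MIN_MATCH = 75  # minimum TM match value in percent, as suggested above


def tm_score(segment, tm_source):
    """Crude stand-in for a fuzzy match score (0-100); not Studio's algorithm."""
    ratio = SequenceMatcher(None, segment.lower(), tm_source.lower()).ratio()
    return round(ratio * 100)


def pretranslate(segment, tm, machine_translate):
    """Return (translation, origin): the best TM match first, MT as the fallback."""
    best = max(tm, key=lambda unit: tm_score(segment, unit[0]), default=None)
    if best is not None and tm_score(segment, best[0]) >= MIN_MATCH:
        return best[1], "TM"
    return machine_translate(segment), "AT"


def fake_mt(text):
    """Placeholder MT provider that only labels its input."""
    return f"<MT output for: {text}>"


tm = [("Install the printer.", "Installieren Sie den Drucker.")]
print(pretranslate("Install the printer.", tm, fake_mt))
print(pretranslate("Restart the computer now.", tm, fake_mt))
```

Raising the minimum match value sends more segments to MT, while lowering it lets weaker fuzzy matches take precedence over the MT output.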

Please note that machine-translated words are listed as “New” in Analysis reports.

8.1.3.

Retrieving MT output using AutoSuggest in Studio is another excellent way to increase productivity
through clever leverage of machine translation.

AutoSuggest editing is an optional feature of SDL Trados Studio that can be used to speed up
conventional translation. AutoSuggest monitors what you type and, after you have typed the first
few characters of a word, presents you with a dropdown list of suggested words and phrases in the
target language that start with the same characters. If one of the words or phrases matches what
you were about to type, you can automatically complete the word or phrase by selecting it from the
list. As you continue to type, the list of suggested words is continuously updated.

MT AutoSuggest offers alternatives of different lengths, which can be useful.



For example, instead of accepting the MT suggestion for the full segment, you can use smaller, more
suited chunks. The feature can even be used as a kind of background dictionary to avoid having to
type long words.
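
The idea of offering chunks of different lengths that start with the typed characters can be sketched as follows. This is a simplified Python illustration under stated assumptions, not how AutoSuggest is implemented: the function names `mt_chunks` and `suggest` are hypothetical, and chunks are plain word n-grams of the MT output.

```python
def mt_chunks(mt_output, max_words=4):
    """Return all word n-grams of the MT output, up to max_words long."""
    words = mt_output.split()
    chunks = []
    for start in range(len(words)):
        for length in range(1, max_words + 1):
            if start + length <= len(words):
                chunks.append(" ".join(words[start:start + length]))
    return chunks


def suggest(typed_prefix, mt_output, max_words=4):
    """Return the chunks that start with the typed characters, longest first."""
    prefix = typed_prefix.lower()
    matches = {
        chunk for chunk in mt_chunks(mt_output, max_words)
        if chunk.lower().startswith(prefix)
    }
    return sorted(matches, key=lambda chunk: (-len(chunk), chunk))


suggestions = suggest("Sta", "Richten Sie ihn als Standarddrucker ein.")
print(suggestions)
```

In this toy example, typing "Sta" offers both the single long word and a longer chunk, so the translator can pick whichever part of the MT output is usable.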

Example of longer chunks that are not perfect, but using a shorter chunk can still provide
productivity increases through assisted typing.

Another example of longer chunks that are not perfect, but using a shorter chunk can still provide
productivity increases through assisted typing.

When the MT output is good enough, you can use longer chunks, or even fully translate the segment
with the suggested MT output.

Example of good MT where a large chunk can be inserted, improving productivity effectively.

Example of good MT where the whole sentence can be translated instantly, with no post-editing
required.

As the suggested MT output is entirely optional when using this feature, the user can simply ignore
the suggestion and continue as normal if a suggestion is not good enough for a specific section.

You can set up AutoSuggest to provide suggestions from sources other than MT. All
suggestions will be displayed in the same dropdown list.

As you can see in the following screenshots, an icon will help you identify the source of the
suggestions. These can include termbase (green can icon), MT suggestions (blue AT icon), TM
suggestions (orange “F” icon for fuzzy matches and green “100” icon for 100% matches), and upLIFT
fragment recall (green and yellow square icon).


8.2.

A Language Cloud subscription allows customers to enforce specific terminology by using Language
Cloud (LC) dictionaries. Dictionaries use the same underlying technology, and are relatively easy to
set up, select and maintain. Dictionaries can be used in combination with any type of engine
available in LC (baseline, domain or customized engines). Other advantages include the fact that
more than one dictionary can be applied per project and that exact term matches will be translated
correctly.

The main disadvantage is that the fluency of translations may suffer: If a term is found in a source
segment, the engines will translate the part of the sentence before the term, add the translation
from the dictionary, and then translate the part of the sentence after the term. If a sentence has one
or more terms, this may have a negative effect on the quality of the MT output. It is recommended
to carry out some tests using the actual data to check whether the quality loss is acceptable as a
trade-off for correct terminology.
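
The behavior described above, where the text before and after a term is translated separately, can be modeled with a short Python sketch. It only mirrors the description in this section; the actual Language Cloud implementation is not shown here. `translate_with_dictionary` and `fake_engine` are hypothetical names, and the sketch handles only the first matching term.

```python
import re


def translate_with_dictionary(sentence, dictionary, translate):
    """Translate the text around the first dictionary term and insert the term.

    dictionary: maps a source term to its enforced target term.
    translate: callable standing in for the MT engine (hypothetical).
    """
    for source_term, target_term in dictionary.items():
        # Case-insensitive source search, as described for LC dictionaries.
        match = re.search(re.escape(source_term), sentence, flags=re.IGNORECASE)
        if match:
            before = sentence[:match.start()].strip()
            after = sentence[match.end():].strip()
            parts = [
                translate(before) if before else "",
                target_term,
                translate(after) if after else "",
            ]
            return " ".join(part for part in parts if part)
    return translate(sentence)


def fake_engine(text):
    """Placeholder engine: brackets mark each fragment translated in isolation."""
    return f"[{text}]"


result = translate_with_dictionary(
    "Set it as the Default Printer now",
    {"default printer": "Standarddrucker"},
    fake_engine,
)
print(result)
```

The brackets in the output show that the engine sees each fragment without its surrounding context, which is why word order and agreement in the final sentence may suffer.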

LC requires dictionaries in TBX format. Lists of terms in Excel or MultiTerm termbase (SDLTB) format
can easily be converted to TBX using tools such as the Glossary Converter tool, which is available for
free in the SDL AppStore. The tool includes a help file with easy-to-follow instructions.



Glossary Converter interface

8.2.1.

Due to the high impact that enforcing terminology has on the MT output in LC, it is important to
follow the criteria below when creating a dictionary for this purpose:

• Enter structures or terms that are frequently mistranslated in your MT output
• If possible, avoid commonly used nouns and verbs
• To avoid introducing mistranslations, only add unambiguous entries to your list of terms
• The use of one-word entries should be limited; they will break up the sentence and reduce
the overall accuracy
• Aim to compile terms containing at least two words to facilitate a more accurate rendering
of their meaning
• Searches and replacements will be preserved as listed:
o Pay extra attention to capitalization in the target entry of your dictionary to prevent
incorrect enforcements
o For changes in gender/number, separate entries need to be added to cover all
variants (feminine and masculine, for instance)
o For best results, dictionaries should include all the variants for words that can be
spelled in several ways
• The source search performed by the engine is not case sensitive, so do not include terms that
are identical to brand names. If they appear frequently in the data set, you might introduce
mistranslations
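
Some of these criteria can be checked on a term list before it is converted and uploaded. The following Python sketch assumes the list is available as (source term, target term) pairs. The function name `review_dictionary`, the warning texts and the sample entries are illustrative assumptions.

```python
def review_dictionary(entries, brand_names=()):
    """Return (source_term, warning) pairs for entries that break the criteria.

    entries: list of (source_term, target_term) pairs (assumed layout).
    brand_names: names that a dictionary term must not be identical to.
    """
    warnings = []
    brands = {name.lower() for name in brand_names}
    seen = {}
    for source_term, target_term in entries:
        key = source_term.lower()  # the source search is not case sensitive
        if len(source_term.split()) < 2:
            warnings.append((source_term, "one-word entry"))
        if key in brands:
            warnings.append((source_term, "identical to a brand name"))
        if key in seen and seen[key] != target_term:
            warnings.append((source_term, "ambiguous: several target terms"))
        seen.setdefault(key, target_term)
    return warnings


entries = [
    ("brake pedal", "Bremspedal"),
    ("windows", "Fenster"),
    ("hydraulic hose", "Hydraulikschlauch"),
    ("Hydraulic Hose", "Hydraulikleitung"),
]
warnings = review_dictionary(entries, brand_names=["Windows"])
for term, warning in warnings:
    print(f"{term}: {warning}")
```

Whether a flagged entry should be removed remains a judgment call; a one-word entry may still be justified if the term is unambiguous and frequently mistranslated.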



8.2.2.

Dictionaries are managed in a browser through your online LC account, which allows you to create
and maintain dictionaries. Let us look at how to get there. Once on the account page, use the menu
on the left to go to Machine Translation, and click on the Dictionaries tab.

Click on New Dictionary (the blue button at the top right of the tab), and enter a name and
description. It is strongly recommended to include the language(s) in the name, for example
Automotive Project 1_Dictionary_EN-ES.

Click on Add languages to open a language list. Select all the languages contained in your termbase,
and click Save.



Clicking Save in the main tab will save the (still empty) dictionary and take you to the dictionary
overview. Click the Import button and then the Choose file button to load a TBX file.

Check the status on the details page, and wait for it to switch to Done.

If you upload the same file twice or have overlapping content, no duplicates are generated, so you
can upload a later version with additional terms and it will overwrite an existing one.

The details page of a dictionary is accessible by clicking the Edit button next to the name.



8.2.3.

In your Language Cloud account, go to the Translator tab. Choose the source and target language,
and the dictionary you uploaded. If your dictionary does not appear in the list, you may have to
refresh the page or log out and log back in.

Translate a segment that contains a word from your term list. The translation should contain the
translation from the dictionary. Go to the Add a dictionary dropdown again, select None, and
translate the same sentence. You are likely to get a different translation. Of course, the translation
might have used the correct term anyway, so it is worth testing with several different terms and
segments.

AdaptiveMT, an SDL innovation that builds on SDL’s current statistical machine translation system,
opens up new and innovative ways of working for translators and post-editors. Research shows that
machine translation enters almost every discussion in the localization industry these days, and is not
only seen as an essential tool in scaling translation operations but is often hailed as the future of
translation productivity. However, despite the added value that MT offers, it is still not universally
welcomed and used. Post-editing suffers from an image problem, in particular when it comes to MT
output quality and ways to improve it. This has a direct impact on user experience, which is not
always as positive as it could be.

AdaptiveMT addresses the balance between MT technology and user experience by giving the post-
editor a much greater degree of control over machine-translated output and engine quality. The
machine learns from the corrections the translator makes, significantly reducing errors over time
and thus alleviating the frustration of having to correct the same mistakes again and again.

AdaptiveMT is self-learning machine translation where the MT engine adapts in real time to the
terminology and style used by the translator, based on each individual post-edited segment.

AdaptiveMT is available in Trados Studio 2017 for the following language pairs on the SDL baseline:
English > French, German, Spanish, Italian and Dutch. Reversed language directions and other
language pairs (such as Japanese <> English) are expected to follow in 2017.
Adaptive engines are personal at present; however, plans to allow multiple translators to share
adaptive engines are in the pipeline.

How Does AdaptiveMT Work?


Currently, statistical machine translation engines learn during the engine training process by
analyzing the statistical relationships between large amounts of source and target data. Once an
engine is created, it is static. Any changes made during post-editing will populate the updated
translation memory but not the MT engine. MT engines are updated through a regular retraining
process. This means that without AdaptiveMT, the machine repeats the same errors — the
translator provides feedback during the post-editing process but the machine cannot benefit from
the feedback.



SMT Systems are Static

With AdaptiveMT, by contrast, the machine learns seamlessly and continuously from user feedback,
in real time, during the post-editing process. Any post-edits are fed back to the machine translation
engine, hosted by SDL, and this engine is completely personal to the translator. The post-edits are
exclusive to the translator and are not used to train any of SDL’s own engines. Translators will be
able to benefit from improved MT output from job to job which is adapted to their specific tone,
terminology and content. AdaptiveMT has evolved from a traditional static MT system to a highly
scalable and responsive dynamic system capable of self-learning and self-improvement.

SDL MT Innovation – AdaptiveMT

The core benefits of this new technology are highlighted below:


– Reduced frustration — reduces the need to edit the same incorrect MT output
– Interactive — works as the translator post-edits
– Improved productivity — increases translation productivity over time
– Cumulative learning — continues to learn from job to job
– Personalized — creates a personal adaptive MT engine for the user and allows the user to
create multiple personalized engines from a baseline MT engine


How Will AdaptiveMT Change the Way We Work?

MT technology and MT innovation add the most value when embedded in a well-designed and
structured process.

When put into the context of post-editing, AdaptiveMT is a new and powerful tool for translators
and post-editors which needs to be integrated into the existing MT lifecycle. This innovation is
currently targeted at an individual level, giving control to post-editors, but also opens up new MT
solutions and scenarios that will need to be delivered and integrated for translation teams, LSPs and
organizations. This will create the need to set up projects efficiently and share adaptive engines in
the same way as other translation resources such as termbases or TMs.


The machine translation industry is constantly evolving with the development of interactive and
personalized machine learning capabilities such as AdaptiveMT and the arrival of neural machine
translation, to name just two major trends. Add to this the importance that large organizations place
on MT as their localization tool of choice and it is easy to see why machine translation providers such
as SDL continue to lead innovation and invest heavily in this field.

The impact of machine translation on the translation industry is only set to grow, and technological
improvements pave the way for language pairs or content types previously considered unsuitable for
machine translation.

Interactivity will be key, not only during the post-editing process but also when training and
maintaining engines and measuring their quality. The role of the post-editor will become increasingly
strategic, and linguists and translators will be able to influence and drive MT output on several levels
— both through adaptive technology and through highly evolved tools to measure and predict
engine quality and human effort. The human element is one of the key influencing factors for
machine translation development and SDL remains committed to working with the translation
community to enhance technology progress and user experience.

Our objective is to harness the power of these technologies to connect people and help
organizations go global faster.


This training workbook outlines the basic principles involved in post-editing MT output, from
creating MT engines to quality checking. Chapter 1 gives readers a brief history of MT. Chapter 2
discusses the benefits of post-editing when addressing emerging global business trends, as well as
the limitations of MT.

Readers are subsequently introduced to the technology behind RBMT and SMT systems in Chapter 3
and are then guided through the engine creation options currently available in Chapter 4. This
includes information on preparation, testing and engine training, which is central to the MT process
at SDL and ensures the highest-quality output.

Readers are given a series of examples and tips on how to post-edit the raw MT output to different
degrees of understandability and to a high-quality, publishable standard in Chapters 5 and 6. The
latter chapter explains how post-editing as a skill can be integrated into the translator’s workflow
and how best to improve productivity without compromising on quality. Common pitfalls of MT
technology and how to provide feedback on the issues observed are discussed in Chapter 7.

The training workbook explains how to access MT through SDL Language Cloud in SDL Trados Studio
and why this can be advantageous to translators, both as a means of increasing efficiency and as an
introduction to the practice of post-editing.

Finally, there is an introduction to the recent development of AdaptiveMT, which allows for an
unprecedented level of interactivity between the post-editor and the MT output, bringing the
technology closer to the user.

The aim of the training workbook has been to cover the material above, while reassuring aspiring
post-editors that MT does not remove the need for human translation or the creative input of the
translator, but simply facilitates and accelerates their task. MT provides a means to an end rather
than an end solution in itself.

Over the last few years, MT has been moving out of the research lab into a bigger, more critical
playing field, powered by multidisciplinary teams, end-user satisfaction and a more rewarding post-
editing experience. The combination of structured user feedback loops and the latest, best-of-breed
MT technology has put the post-editor firmly at the heart of the MT lifecycle and improvement
process.


Ultimately, the objective of this workbook is to provide translators and anyone interested in
post-editing with the background and practical knowledge necessary to meet the demands of this
changing MT market, enabling them to future-proof their offerings.


Websites & White Papers
• SDL: http://www.sdl.com/services/language-services/intelligent-machine-translation.html
• SDL Best Practices for Enterprise Scale Post-Editing – White Paper:
  https://community.sdl.com/solutions/language/machine_translation/m/document-library/1740
• TAUS: http://translationautomation.com/
• ProZ: http://www.proz.com/about/overview/education/
• GALA: https://www.gala-global.org/

Webinars
• SDL General: http://www.sdl.com/event/webinars.html
• SDL “Go Global Faster with PEMT” webinar series:
  • Part 1: MT Evaluation & Engine Creation
  • Part 2: MT Output Testing
  • Part 3: PEMT & Quality Assurance
• TAUS: http://www.translationautomation.com/multimedia/multimedia#webinars
• GALA: https://www.gala-global.org/resources/videos-downloads
• Program for the TranslatingEurope Forum 2016, featuring video recordings:
  https://ec.europa.eu/info/sites/info/files/tef2016_programme_with_links_en.pdf


Forums
• ProZ: http://www.proz.com/forum/machine_translation_mt-844.html
• TranslatorsCafé: http://www.translatorscafe.com/cafe/MegaBBS/forum-view.asp?forumid=44

Blogs & Other
• SDL: http://www.sdl.com/community/blog/
• SDL AppStore: http://appstore.sdl.com/
• Paul Filkin’s Blog (CAT tools, Studio): https://multifarious.filkin.com/


Appendix 1: References
BLEU score available at http://www.cs.columbia.edu/nlp/sgd/bleu.pdf

METEOR available at http://www.cs.cmu.edu/~alavie/METEOR/pdf/meteor-mtj-2009.pdf

TER available at http://www.cs.umd.edu/~snover/pub/amta06/ter_amta.pdf

TAUS Dynamic Quality Framework White Paper available at
https://www.taus.net/think-tank/reports/evaluate-reports/taus-quality-dashboard-white-paper

Common Sense Advisory available at http://www.commonsenseadvisory.com/


Appendix 2: Post-Editing Examples
Language Pair | Source | MT Output | Post-Edited | Comment
EN-DA
  Source: The Loader has a ROPS decal showing the certification of the ROPS, gross weight, approval, regulation, and model number of the machine.
  MT Output: Frontlæsserens har et førerværn mærkat viser bekræftelse af syrtbøjlen, bruttovægt, godkendelse, regulering, og modelnummer af maskinen.
  Post-Edited: Frontlæsseren har et mærkat på styrtbøjlen, der viser certificeringen af styrtbøjlen, bruttovægt, godkendelse, direktiv og modelnummer for maskinen.
  Comment: Delete extra word (“førerværn”); adjust terminology; adjust preposition (“af” -> “for”).
EN-DA
  Source: If you are routing the cables through the mounting surface, mark the location directly in the middle of the three pilot holes (optional).
  MT Output: Hvis du fører kabler igennem monteringsoverfladen markere placeringen direkte i midten af de tre forboringshuller (valgfrit).
  Post-Edited: Hvis du fører kabler igennem monteringsoverfladen, skal du markere placeringen direkte i midten af de tre forboringshuller (valgfrit).
  Comment: Insert comma after “monteringsoverfladen,” insert the words “skal du”.
EN-DA
  Source: See your Owner’s Manual for detailed explanation about the ENGINE START/STOP button.
  MT Output: Se indeholdt din’s for at få nærmere forklaring til START/STOP-knappen.
  Post-Edited: Se i instruktionsbogen for at få nærmere forklaring om knappen MOTORSTART/STOP.
  Comment: Terminology: Replace “din’s” with “instruktionsbogen,” add “MOTOR” to the word “START”.
EN-DA
  Source: When a jam occurs, a message indicating the jam location and information to clear the jam appears on the printer display.
  MT Output: Hvis der opstår papirstop, en meddelelse om papirstoppets placering og oplysninger for at afhjælpe papirstoppet vises på printerens display.
  Post-Edited: Hvis der opstår papirstop, vises en meddelelse om papirstoppets placering og oplysninger til afhjælpning af papirstoppet på printerdisplayet.
  Comment: Rephrase the sentence to comply with Danish grammar rules: The verb “vises” should be placed at the beginning of the sentence. Replace the preposition “for” with the correct one (“til”).
EN-NO
  Source: The additional forces can lead to faster cycle times when digging in tightly compacted materials or soil.
  MT Output: Den ekstra krefter kan føre til raskere syklustider ved graving i tett sammenpakkede materialer eller jord.
  Post-Edited: Den ekstra kraften kan gi raskere syklustider ved graving i tett sammenpakkede materialer eller jord.
  Comment: Adjust grammar (noun from plural to singular), make phrase more idiomatic.
EN-NO
  Source: Autoguidance disengages when the operator is not seated.
  MT Output: Automatisk styring kobles ut når føreren ikke sitter godt.
  Post-Edited: Automatisk veiledning kobles ut når føreren ikke sitter.
  Comment: Adjust terminology, adjust meaning.
EN-NO
  Source: Open the rear door of the printer and the rear duplex area, and then remove the jammed paper.
  MT Output: Åpne bakdekselet på skriveren og den bakre tosidigenhet og fjern det fastkjørte papiret.
  Post-Edited: Åpne bakdekselet på skriveren og tosidigenheten, og fjern fastkjørt papir.
  Comment: Adjust meaning.
EN-NO
  Source: Can I log in to Twitter with the Facebook account? This camcorder only supports Twitter login with the Twitter account.
  MT Output: Kan jeg logge på diskanthøyttaler med Facebook konto? Dette videokameraet støtter bare diskanthøyttaler pålogging med diskanthøyttaler-konto.
  Post-Edited: Kan jeg logge på Twitter med Facebook-kontoen? Dette videokameraet støtter bare Twitter-pålogging med Twitter-konto.
  Comment: Adjust meaning (“diskanthøyttaler” > “Twitter”), adjust grammar.
EN-NO
  Source: Why do I fail to upload my recordings online? There is size limitation to online sharing.
  MT Output: Hvorfor hører jeg ikke klarer å laste opp min opptak online? Det er størrelse begrensning til nettbaserte deling.
  Post-Edited: Hvorfor klarer jeg ikke å laste opp opptakene mine på Internett? Det finnes størrelsesbegrensninger for nettbasert deling.
  Comment: Delete extra word and adjust grammar. Adjust terminology (“online” > “på Internett”). Adjust grammar and change preposition.
EN-SV
  Source: Excess tension can lead to final drive or idler breakage as well as undercarriage frame bending.
  MT Output: Överflödig spänning kan leda till slutväxel eller ledarhjul brott liksom underredesram böjning.
  Post-Edited: Överdriven spänning kan leda till att slutväxeln eller ledarhjulet går sönder, liksom till att underredsramen böjs.
  Comment: Adjust terminology (“Överdriven” instead of “Överflödig”); adjust grammar.
EN-SV
  Source: If you are routing the cables through the mounting surface, mark the location directly in the middle of the three pilot holes (optional).
  MT Output: Om du vidarebefordrar kablarna genom monteringsytan och märk ut direkt i mitten av de tre rikthålen (valfritt).
  Post-Edited: Om du drar kablarna genom monteringsytan bör du märka ut platsen precis i mitten av de tre rikthålen (valfritt).
  Comment: Adjust terminology (“drar” instead of “vidarebefordrar”); delete extra word (“och”); adjust grammar.
EN-SV
  Source: This section describes how to operate the audio and visual system.
  MT Output: Det här avsnittet beskriver hur in ljud-och Visual system.
  Post-Edited: Det här avsnittet beskriver användningen av ljud- och bildsystemet.
  Comment: Translate untranslated term (“Visual”); translate missing word (“operate”); adjust grammar.
EN-SV
  Source: At the moment fluoroscopy is possible, all functions of the system are operable.
  MT Output: I det ögonblick då genomlysning är möjlig, alla funktioner i de systemet fungerar.
  Post-Edited: I det ögonblick då genomlysning är möjlig fungerar alla funktioner i systemet.
  Comment: Adjust grammar.
EN-DE
  Source: Adjusting the LED Tilt Angle
  MT Output: Justieren der LED Neigewinkel
  Post-Edited: Justieren des LED-Neigewinkels
  Comment: Adjust gender/agreement; correct noun compound/hyphenation.
EN-DE
  Source: <1/>Select the protocol that the selected time server supports.
  MT Output: <1/>Wählen Sie das Protokoll, dass die ausgewählten Zeit-Server unterstützt.
  Post-Edited: <1/>Wählen Sie das Protokoll aus, das der ausgewählte Zeitserver unterstützt.
  Comment: Add verb prefix; adjust grammar; correct noun compound/hyphenation.
EN-DE
  Source: The image file can be stored on a local computer, in the local network, or at an Internet address.
  MT Output: Die Bilddatei kann werden auf einem lokalen Computer, im lokalen Netzwerk oder unter einer Internet-Adresse.
  Post-Edited: Die Bilddatei kann auf einem lokalen Computer, im lokalen Netzwerk oder unter einer Internet-Adresse gespeichert werden.
  Comment: Add missing verb.
EN-DE
  Source: Replace the 10 O-ring seals on the exposed remote.
  MT Output: Ersetzen Sie den 10 O-Ring-Dichtungen auf die freiliegenden Zusatzsteuergerät betätigt wird.
  Post-Edited: Ersetzen Sie die 10 O-Ring-Dichtungen am freiliegenden Zusatzsteuergerätblock.
  Comment: Adjust grammar/preposition; adjust terminology; delete extra word (verb) not in source.
EN-DE
  Source: The following basic principles still apply as with conventional translation:
  MT Output: Die folgenden Grundsätze gelten immer noch als mit herkömmlichen Übersetzung:
  Post-Edited: Die folgenden Grundsätze gelten ebenso wie für herkömmliche Übersetzungen:
  Comment: Adjust word choice; adjust number agreement (sing. -> plural).
EN-ES
  Source: Remove the sealing plug on the bottom left corner.
  MT Output: Retire el tapón hermético conector en la esquina inferior izquierda.
  Post-Edited: Retire el tapón hermético de la esquina inferior izquierda.
  Comment: Delete extra words not in source.
EN-ES
  Source: Now you can adjust the focus, focal length and LED tilt controls (see below).
  MT Output: Ahora se puede ajustar el enfoque, la distancia focal, y el LED inclinación controles (consulte la a continuación).
  Post-Edited: Ahora se puede ajustar el enfoque, distancia focal, y los controles de inclinación de los LED (consulte la a continuación).
  Comment: Add missing article; adjust gender/agreement for article; rearrange word order.
EN-ES
  Source: Click “Connect”, then reprogram the 4 files <1>in the order below,</1> following the instructions (switch the TopDock off and back on again):
  MT Output: Haga clic en “Conectar” y, a continuación, volver a programar el 4 archivos <1>en el orden siguiente</1> , siguiendo las instrucciones (interruptor la unidad TopDock desactive y vuelva a):
  Post-Edited: Haga clic en “Conectar” y, a continuación, vuelva a programar los 4 archivos <1>en el orden siguiente</1> , siguiendo las instrucciones (desactive el interruptor la unidad TopDock y vuelva a activarlo):
  Comment: Adjust verb form; adjust number agreement (“el” -> “los”); adjust word order; add missing information (“switch on” = “activarlo”).
EN-ES
  Source: NOT to thresh harder, faster or more than you need to.
  MT Output: No trille, más rápido o más de lo que resulta necesario.
  Post-Edited: NO trille con más fuerza, más rápido o más de lo que resulte necesario.
  Comment: Adjust capitalization; add missing information (“harder” = “con más fuerza”); adjust verb form (“resulta” -> “resulte”).
EN-ES
  Source: Overthreshed wheat
  MT Output: Overthreshed trigo
  Post-Edited: Trigo excesivamente trillado
  Comment: Translate word left as source.
EN-ES
  Source: Check for abnormal noise on the hydraulic components.
  MT Output: Comprobar si hay anormal en los componentes des sistema hidráulico.
  Post-Edited: Compruebe si hay ruido anormal en los componentes del sistema hidráulico.
  Comment: Add missing word (“ruido”); adjust verb form (“comprobar” -> “compruebe”).


EN-FR
  Source: On the other hand, if the Exchange server is configured to require autolock, the synchronisation will not succeed if the autolock feature is disabled in the user's device.
  MT Output: D'un autre côté, si le serveur Exchange est configuré pour exiger verrouillage automatique, la synchronisation ne sera pas réussir si le verrouillage automatique fonction est désactivé de l'utilisateur.
  Post-Edited: D'un autre côté, si le serveur Exchange est configuré pour exiger le verrouillage automatique, la synchronisation ne sera pas effectuée si la fonction de verrouillage automatique est désactivée sur l'appareil de l'utilisateur.
  Comment: Add missing word (“le”); adjust word choice and verb form (“réussir”); adjust word order (“automatique fonction”); add missing information (“device”).
EN-FR
  Source: No license activation needed in this release.
  MT Output: Aucune activation de la licence nécessaire dans cette version.
  Post-Edited: Aucune activation de la licence n’est nécessaire dans cette version.
  Comment: Add negation and auxiliary verb.
EN-FR
  Source: It is from these 3 factors you must run the combine.
  MT Output: Il est de ces 3 facteurs vous devez utiliser la moissonneuse-batteuse.
  Post-Edited: C’est en fonction de ces 3 facteurs que vous devez utiliser la moissonneuse-batteuse.
  Comment: Adjust word order.
EN-FR
  Source: If the release control function is not selected, area <1>E</1> is not taken into account!
  MT Output: Si la régulation de délestage fonction n’est pas sélectionné, <1>zone E</1> n’est pas pris en compte !
  Post-Edited: Si la fonction de régulation de délestage n’est pas sélectionnée, zone <1>E</1> n’est pas pris en compte !
  Comment: Adjust word order; adjust gender agreement; adjust formatting tag.
EN-FR
  Source: Right and left fenders are the same.
  MT Output: Ailes droite et gauche sont bien les mêmes.
  Post-Edited: Les ailes droites et gauches sont les mêmes.
  Comment: Add missing article (“les”); adjust number agreement (“droites,” “gauches”); delete extra word (“bien”).
EN-FR
  Source: Using the hardware removed earlier, attach the new center link.
  MT Output: À l’aide du matériel de fixation déposé plus tôt, fixer la nouvelle liaison centrale.
  Post-Edited: A l’aide du matériel de fixation déposé plus tôt, fixez la nouvelle articulation centrale.
  Comment: Adjust accent in upper case; adjust verb form (“fixer” -> “fixez”); adjust terminology (“liaison” -> “articulation”).
EN-IT
  Source: Transport function (front power lift moves to upper limit position) is performed.
  MT Output: Funzione die trasporto (sollevatore anteriore si sposta nella posizione limite superiore) viene eseguito
  Post-Edited: Viene eseguita la funzione di trasporto (il sollevatore anteriore si sposta nella posizione limite superiore)
  Comment: Adjust word order; adjust gender agreement (“eseguito” -> “eseguita”); add missing articles (“la,” “il”).
EN-IT
  Source: Selective Catalytic Reduction (SCR) basic instructions
  MT Output: Sistema di riduzione catalitica selettiva (RCS) istruzioni di base
  Post-Edited: Istruzioni di base sulla Riduzione catalitica selettiva (RCS)
  Comment: Delete extra word (“sistema”); adjust word order; adjust lowercase.
EN-IT
  Source: A server’s Certificate can be of two types:
  MT Output: Un certificato del server possono essere di due tipi:
  Post-Edited: Un certificato del server può essere di due tipi:
  Comment: Adjust verb form.
EN-IT
  Source: The on-screen keypad can be used for pasting the data in various input fields, for example in the input fields of the web pages.
  MT Output: La tastiera su schermo può essere utilizzata per incollare i dati in varie campi di immissione dati, ad esempio nei campi di immissione dati di Web pagine.
  Post-Edited: La tastiera su schermo può essere utilizzata per incollare i dati in vari campi di immissione, ad esempio nei campi di immissione delle pagine Web.
  Comment: Adjust gender agreement (“varie” -> “vari”); delete extra words (“dati”); adjust word order.
EN-NL
  Source: After the engine starts, check the instruments to make sure the indications are correct.
  MT Output: Zodra de motor start, controleert de instrumenten om er zeker van te zijn dat de aanwijzingen die erop staan door zijn juist.
  Post-Edited: Zodra de motor start, controleert u de instrumenten om er zeker van te zijn dat de aanwijzingen die erop staan, juist zijn.
  Comment: Add preposition; delete extra word (“door”); adjust word order.
EN-NL
  Source: Check that no tools or other items have been left on the machine or in the operator’s compartment.
  MT Output: Controleer of er geen wektuigen of andere items hebben zijn op de machine of in de cabine.
  Post-Edited: Controleer of er geen gereedschap of andere voorwerpen zijn achtergebleven op de machine of in de cabine.
  Comment: Adjust terminology; add missing verb.
EN-NL
  Source: The message REFERENCE CYCLE may appear in the display.
  MT Output: De melding REF. BEWEGING kunnen op het display verschijnen.
  Post-Edited: De melding REF. BEWEGING kan op het display verschijnen.
  Comment: Adjust verb form.
EN-NL
  Source: The accelerator pedal can be used to switch off the truck depending on which function type is programmed:
  MT Output: Het rijpedaal kan worden gebruikt de machine uit te schakelen afhankelijk van deze functie type is geprogrammeerd:
  Post-Edited: Het rijpedaal kan worden gebruikt om de machine uit te schakelen, afhankelijk van welk type functie is geprogrammeerd:
  Comment: Add missing preposition; adjust punctuation; adjust word order.
EN-NL
  Source: The plug connectors can become stiff and dirt can enter the hydraulic system.
  MT Output: De steekkoppelingen kunnen stijf en vuil in het hydraulisch systeem kunnen.
  Post-Edited: De steekkoppelingen kunnen stijf worden en er kan vuil in het hydraulisch systeem binnendringen.
  Comment: Add missing auxiliary verb; add pronoun and missing verb.
EN-PT-BR
  Source: The windows are primarily intended for developers to test the compatibility of their software with the Virtual Terminal application.
  MT Output: As janelas se destinam principalmente para os criadores para testar a compatibilidade do seu software com o Terminal virtual aplicação.
  Post-Edited: As janelas se destinam principalmente aos programadores para testar a compatibilidade do seu software com o aplicativo Virtual Terminal.
  Comment: Adjust grammar (“para os” -> “aos”); adjust terminology (“criadores” -> “programadores”); adjust word order and product name.
EN-PT-BR
  Source: This chapter helps you quickly begin using the analyzer.
  MT Output: Este capítulo ajuda a verificar rapidamente começar a usar o analisador.
  Post-Edited: Este capítulo o ajuda a começar a usar rapidamente o analisador.
  Comment: Add pronoun, delete additional word (“verificar”), change word order.
EN-PT-BR
  Source: User Tests are tests that you create to test specific functionality of your network.
  MT Output: Os testes de usuário são testes que é criar para testar funcionalidades específicas da rede.
  Post-Edited: Os testes de usuário são testes que você cria para testar as funcionalidades específicas de sua rede.
  Comment: Adjust grammar, add pronoun.
EN-PT-BR
  Source: Your search could also be as specific as comparing a single object in one database with a single object in another database.
  MT Output: Sua pesquisa também pode ser tão específico como comparar um único objeto em um banco de dados com um único objeto em outro banco de dados.
  Post-Edited: Sua pesquisa também pode ser tão específica como comparar um único objeto em um banco de dados com um único objeto em outro banco de dados.
  Comment: Correct gender.
EN-PT-BR
  Source: In order to avoid mutual influence, only one parameter should be changed at a time as a remedy.
  MT Output: Para evitar influência mútua, somente um parâmetro deve ser substituído a um tempo em que uma solução.
  Post-Edited: Para evitar influência mútua, somente um parâmetro deve ser alterado por vez como solução.
  Comment: Change “substituído” for “alterado”, although that could be considered a preferential change and not necessary. Correct translation of “at a time…”
EN-ZH-CN
  Source: Keep the pinion shaft from turning and turn the flywheel three revolutions.
  MT Output: 防止小齿轮轴转动,然后转动飞轮旋转三圈。
  Post-Edited: 防止小齿轮轴转动,并转动飞轮三圈。
  Comment: Redundant translation (旋转) is deleted, and logic of the sentence is improved.
EN-ZH-CN
  Source: If the gap is correct, continue on with the remainder of the procedure.
  MT Output: 如果间隙正确,继续在与其它的程序。
  Post-Edited: 如果间隙正确,继续进行其他的步骤。
  Comment: Translation of phrase “continue on with” is corrected, and translation of term “procedure” is modified.
EN-ZH-CN
  Source: Use the optional right angled DC power cable.
  MT Output: 使用可选的右角的直流电源电缆。
  Post-Edited: 使用可选的直角直流电源线。
  Comment: Translation of term “right angled” is corrected, and translation of term “power cable” is modified.
EN-ZH-CN
  Source: The card holder is located on the right side of the VC70 under the Service door.
  MT Output: 卡架位于 vc 70 右侧的服务门下。
  Post-Edited: 卡架位于 VC70 右侧的维修盖下。
  Comment: Translation of model number “VC70” and term “Service door” are modified.
EN-ZH-CN
  Source: To adjust keyboard tray position, loosen the right and left locking knobs two full turns and rotate the keyboard tray to the desired position.
  MT Output: 调整键盘托盘的位置,松动的左向和右锁定旋钮装满两转和键盘托盘旋转至所需位置。
  Post-Edited: 要调整键盘托架的位置,请将左/右锁定旋钮松开两整圈,然后将键盘托架旋转至所需位置。
  Comment: Logic of the sentence and translation of terms are modified.
EN-ZH-CN
  Source: Click Mail in Navigation Pane on the left, and click New E-mail.
  MT Output: 单击左侧导航窗格”中的“邮件”,然后单击“新建电子邮件”。
  Post-Edited: 在左侧“导航窗格”中单击“邮件”,然后单击“新建电子邮件”。
  Comment: Modifications are made to the single-byte comma and inconsistent treatment of UI elements.
DE-EN
  Source: Mit der 1 Meter langen Tischantenne können Sie Ihren WLAN-Empfang deutlich optimieren.
  MT Output: With the 1 meter long Tischantenne you can significantly optimize your WLAN-reception.
  Post-Edited: With the 1-metre long table-top antenna you can significantly optimise your WLAN reception.
  Comment: Adjust spelling (American English -> British English); translate word left untranslated (“Tischantenne”).
DE-EN
  Source: Drücken des Fußschalters aktiviert einen geringen Druck in den Pneumatikzylindern.
  MT Output: Press the foot switch activates a low pressure in the pneumatic cylinders.
  Post-Edited: Press the foot switch to activate low pressure in the pneumatic cylinders.
  Comment: Adjust verb form; delete extra article.
DE-EN
  Source: Hauptspindel nicht in Betrieb nehmen, wenn die Versorgung mit Kühlmittel nicht gewährleistet ist.
  MT Output: Do not operate main spindle if the supply with coolant is not ensured.
  Post-Edited: Do not operate the main spindle unless the coolant supply is ensured.
  Comment: Adjust word order.
FR-EN
  Source: Élaborer une introduction à l’aide des éléments clés d’une entrée en matière efficace
  MT Output: Develop an introduction to using the key elements of an entry in the effective
  Post-Edited: Prepare an introduction using key elements to provide an effective overview of the subject area
  Comment: Adjust verb meaning, adjust verb form, adjust word order, add in necessary text.
FR-EN
  Source: Voici quelques techniques pouvant vous aider à faciliter l’apprentissage selon les styles d’apprentissages
  MT Output: Here are some techniques that can help you to facilitate learning, based on the styles of programming
  Post-Edited: Here are some techniques to facilitate learning, based on the different learning styles
  Comment: Adjust meaning; delete unnecessary text; adjust word order.
FR-EN
  Source: Passez le matériel en revue, lisez à voix haute, enregistrez-vous
  MT Output: Skip the hardware review, read aloud, save-you
  Post-Edited: Review the material, read aloud, record yourself
  Comment: Adjust meaning.


SDL (LSE: SDL) is the global innovator in language translation
technology, services and content management. For more than 20
years, SDL has transformed business results by enabling nuanced
digital experiences with customers across the globe so they can create
personalized connections anywhere and on any device. Are you in the
know? Find out more at SDL.com.

Copyright © 2017 SDL plc. All Rights Reserved. All company product or service
names referenced herein are properties of their respective owners.
