On another level of assessment practice, the more recent Dynamic Quality Framework (DQF) introduces practical methods specifically intended for evaluating translations, yet in a different manner from MQM. Although MQM and DQF share a functional perspective that aims "to select the most appropriate translation quality evaluation model(s) and metrics depending on specific quality requirements" (Görög, 2014, p.157), they differ in design and features. Whereas MQM harmonized existing metrics and compiled them into a single assessment framework, the DQF developers approached translation providers and clients to ask about their quality-measurement practices and needs. This feedback has been used extensively to help users of the DQF platform select a more flexible error typology based on the actual needs and evaluation methodologies of both the supplier and the user of the translation (Lommel, 2018, p.123). In this sense, DQF can be judged a more practical and market-oriented tool for measuring translation quality. Unlike MQM, DQF also offers room for both evaluating and profiling translated texts submitted by volunteers from academic institutions and translation agencies (Görög, 2014, p.155). Görög praises DQF for transcending traditional static metrics that disregard "variables such as content type, communicative function (purpose), end user requirements (audience) [and] context" of assessment (p.156) and for offering multiple functions to the assessor. Accordingly, DQF's rich database of assessment tools and assessed translations constitutes an informative, user-friendly teaching aid for training evaluators on quantitative metrics.

Actually, the newly evolving DQF possesses some distinctive features that contribute to its value. Through the "Content profile feature" of the tool (Görög, 2014, pp.157-158), it provides the user with assessment practices via a collection of human and machine evaluation tools, templates, guidelines, evaluated samples and evaluators' reports. Another main feature of DQF is the "benchmarking and reporting function" (p.158), in which different assessors evaluate a translated text through a scorecard; their scores and performance rates are then compiled in a report. Such a report reflects the quality problems and errors frequently encountered in particular language pairs and text types. Meanwhile, through the "Translation ranking and comparison" feature (p.159), evaluators compare and rank three translations of the same segment according to the quality of the output, whether machine or human translation. Assessors can decide which version provides the better translation for specific language pairs or text types and identify the common errors therein. Moreover, the "productivity testing" feature of DQF proves important, as it tests the post-editing activity required to improve the evaluated translation in terms of effort or time (p.160) and accordingly marks the quality of the translation with the tag of light or heavy post-editing.
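
To make the productivity-testing idea more concrete, the following is a minimal sketch in Python, assuming hypothetical thresholds (DQF does not prescribe fixed cut-offs here): it tags a post-edited segment as light or heavy based on how much of the machine output had to be changed and how long the editing took.

```python
# Minimal sketch of DQF-style productivity testing: tag a segment as "light"
# or "heavy" post-editing from edit effort and editing time.
# The threshold values are illustrative assumptions, not published DQF figures.
from difflib import SequenceMatcher

def post_editing_tag(mt_output, post_edited, seconds_spent,
                     max_edit_ratio=0.15, max_seconds_per_word=3.0):
    """Return 'light' or 'heavy' depending on edit effort and time spent."""
    # Share of the segment that had to be changed (1 - string similarity).
    edit_ratio = 1.0 - SequenceMatcher(None, mt_output, post_edited).ratio()
    seconds_per_word = seconds_spent / max(len(mt_output.split()), 1)
    if edit_ratio <= max_edit_ratio and seconds_per_word <= max_seconds_per_word:
        return "light"
    return "heavy"

# Few edits, done quickly -> tagged "light" under these illustrative thresholds.
print(post_editing_tag("The cat sit on mat.", "The cat sat on the mat.", 6.0))
```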

The merits of DQF are not limited to mapping the translated text type to the relevant assessment tool; they also extend to standardizing assessment criteria and error typologies in a free online feature available to both practitioners and academic researchers. Lommel (2018, p.123) highlights six major error types: "Accuracy" in the transfer of meaning from the ST, "Language" errors in the TL, "Terminology" or specialist vocabulary, "Style", that is, the in-house transfer strategies of the translation agency, "Country standards" for locale-specific, culture-bound references, and finally "Layout", or the non-textual elements of formatting and spacing. It is noteworthy that DQF has retained MQM's four severity levels of errors, critical, major, minor and null (ibid.), as well as its method of calculating the overall weight of errors to produce a final quality score. As a result, a shared language and understanding is ensured among translation providers, clients and evaluators in the professional context. Likewise, DQF's error typology feature offers a remarkably standardized way to categorize, count and measure translation errors via the tools used in the industry. In addition, Görög (2014, p.158) asserts that the DQF error types mirror the educational assessment criteria, such as accuracy, terminology, cultural communication and style, used in most academic institutions. Accordingly, DQF can facilitate collaboration between industry and academia to develop more reliable common criteria catering to both fields and hence bridge the gap between theory and practice.
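
As a rough illustration of the calculation described above, here is a minimal sketch assuming illustrative severity weights and a per-word normalization; actual DQF/MQM deployments configure these values themselves.

```python
# Minimal sketch of an MQM/DQF-style quality score: errors are logged by
# category and severity, weighted, normalized against text length and mapped
# to a 0-100 score. The weights and normalization are illustrative assumptions.
SEVERITY_WEIGHTS = {"critical": 10, "major": 5, "minor": 1, "null": 0}

ERROR_CATEGORIES = {"Accuracy", "Language", "Terminology",
                    "Style", "Country standards", "Layout"}

def quality_score(errors, word_count):
    """errors: list of (category, severity) pairs, e.g. ("Accuracy", "major")."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for category, severity in errors
                  if category in ERROR_CATEGORIES)
    # Normalize the penalty per word and express the result on a 0-100 scale.
    return max(0.0, 100.0 - 100.0 * penalty / word_count)

sample = [("Accuracy", "major"), ("Language", "minor"), ("Layout", "null")]
print(quality_score(sample, word_count=250))  # 97.6
```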

This trend of harmonization is further implemented through the integration of the DQF and MQM metrics into a shared metric that unifies the definitions and specifications of both tools. Despite its success, DQF was criticized for its time demands and for the subjectivity involved in differentiating the translator's preferences from errors (Görög, 2014, p.162). To avoid such drawbacks, especially the time factor, DQF and MQM were integrated into one evaluation metric. According to Lommel (2018, p.124), the main assessment domains of DQF, such as accuracy, fluency and local conventions, are adopted as the shared top-level evaluation criteria, whereas the underlying subcategories of issue types and errors are borrowed from MQM. For example, the errors of punctuation, spelling and grammar are selected from the MQM issue types and listed under the DQF domain of Fluency. This simplified the unified metric and reduced the number of error types "to less than one third the size of the full MQM hierarchy" (p.125). Besides, the new assessment metric is user-friendly, as it allows raters to pick only the major top-level evaluation criteria unless there is a need to choose more detailed subcategories of issue types in the evaluation.
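
A minimal data-structure sketch of the harmonized hierarchy as described here: DQF's top-level domains with MQM issue types nested beneath them. Only the categories named in the text are listed; the names and the lookup helper are illustrative, not the official specification.

```python
# Sketch of the harmonized MQM/DQF metric: DQF domains on top, with MQM issue
# types nested beneath them. Only categories named in the text are included;
# real deployments define a fuller (though still reduced) hierarchy.
HARMONIZED_METRIC = {
    "Accuracy": [],                                     # transfer of ST meaning
    "Fluency": ["Punctuation", "Spelling", "Grammar"],  # MQM issue types
    "Local conventions": [],                            # locale-specific standards
}

def issue_label(domain, issue=None):
    """Raters pick a top-level domain and drill down only when necessary."""
    if issue is None:
        return domain
    if issue not in HARMONIZED_METRIC.get(domain, []):
        raise ValueError("issue type is not listed under this domain")
    return domain + " > " + issue

print(issue_label("Fluency", "Grammar"))  # Fluency > Grammar
print(issue_label("Accuracy"))            # Accuracy
```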

Such merits put the new integrated MQM/DQF metric to practical use in the translation industry and hence paved the way for an evolving trend of harmonizing different TQA approaches to set universal evaluation parameters. Since 2016, the new metric has been embraced by well-known computer-assisted translation tools such as Trados, by organizations such as Mozilla, and by commercial websites such as eBay (Lommel, 2018, p.126). Aiming at a sound and comprehensive evaluation of their translations before publication, these bodies endorsed the new integrated metric. This attempt fully voices the recent developments in Translation Studies geared towards reducing the gap among TQA approaches and achieving integration through a mixed, eclectic evaluation approach that combines qualitative and quantitative elements. In this sense, theory would support translation practice and help solve the problems of defining and measuring translation quality in professional translation.

The main target of DQF is to help evaluators perform translation comparisons, judge their accuracy and fluency, measure the post-editing time and effort needed to fix mistakes, and accordingly assess translated samples based on an error typology suited to the text type and communicative context of the translation. In this sense, the DQF user selects the most appropriate assessment method from an ample variety of methods.
Retrieved from https://www.taus.net/think-tank/articles/event-articles/dynamic-quality-and-datafication

Dynamic Quality and Datafication


Nick Lambson
10 Nov 2015
As the needs of content producers have evolved, the need for a more flexible understanding of
what constitutes quality has become more apparent. Take for example an agile software
development project, with localization drops spread throughout the production cycle. The most
relevant set of parameters to measure might be translation adequacy, technical functionality (no
errors in the code), and on-time delivery (to support the agile development cycle). On the other
hand, consider the example of a creative text accompanying a marketing campaign. The most
relevant set of parameters here would likely be different from those of the software project. They
might include suitability for the target locale, fluency, style, or creativity – all of these to
maximize the impact of the message on the target.
“What gets measured gets managed.”
Peter Drucker

In light of the statement by Peter Drucker, could we say that what gets measured in the same way
gets managed in the same way? Suppose there existed a one-size-fits-all quality evaluation
metric that measured quality across a range of 100+ parameters. If a content producer were to
apply this single metric to their entire spectrum of content types, they would be wasting
resources on measuring parameters that are unnecessary to certain types of content. Inevitably,
whenever a single metric is applied across differing content types, resources are wasted.
Thus, we accept the fact that different companies will use different metrics to evaluate different
content types in different situations. At the same time, parameters between differing metrics may
overlap. Industry buyers and providers would benefit greatly from comparing these overlapping
metrics in an apples-to-apples fashion. So how do we ensure that, when comparing overlapping parameters across metrics, the comparison is between apples and apples instead of between apples and oranges?
Two industry standard frameworks that provide the environment for such quality evaluation
comparison were recently harmonized – the Multidimensional Quality Metrics (MQM), and the
TAUS Dynamic Quality Framework (DQF). DQF’s analytic method and the MQM hierarchy of
translation quality issues have both been modified to share the same basic structure. DQF will
use a subset of the full MQM hierarchy based on the experience of TAUS members, while MQM
will continue to maintain a broader set of issue types designed to capture and describe the full
range of quality assessment metrics currently in use.[1]

At the TAUS 2015 Annual Conference in Silicon Valley and at the TAUS 2015 QE Summit at
eBay, several aspects of dynamic quality were discussed, including QE of non-conventional
content, treating translation quality metrics as business intelligence, and managing different
levels of quality. Here is a snapshot of how several companies and organizations are dealing with
dynamic quality in their globalization efforts.
Over the past two years, the Church of Jesus Christ of Latter-day Saints (LDS) has been
implementing standard quality evaluation systems across 70 of its regular 90 language pairs. In
fact, the church is the first organization to implement the TAUS DQF inside of an instance of
SDL WorldServer. Quality evaluation is focused on measuring three types of errors: content
errors, editorial errors, and language errors.
The LDS Church has also set up a tiered system of categorizing content of varying levels of
sensitivity. The highest level would be reserved for scripture, whereas the lowest level might be
reserved for routine publications of a non-doctrinal nature. Between these two extremes,
different production methods and quality evaluation strategies are used.
Autodesk also categorizes its content, and adjusts the weight of two evaluation parameters based
on that categorization. The two parameters are “concrete errors,” which are based on an error
typology, and “global language attributes,” which are based on fluency, suitability, adequacy,
style, and creativity. The final quality score is calculated by weighting the two parameters
differently based on whether the content is software documentation or marketing material.
Autodesk is currently collecting data on content type, word count, raw MT usefulness, review
score, ratings per criteria, and completeness. The company would like to see the following data
measured in the future: timestamps, levels of TM/MT quality, and frequency of changes.
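
A hypothetical sketch of the weighting described for Autodesk: a final score blends an error-typology score ("concrete errors") with a holistic score ("global language attributes"), and the weights depend on the content category. The weight values below are assumptions for illustration, not Autodesk's actual figures.

```python
# Hypothetical sketch of content-dependent weighting: combine a concrete-error
# score with a global-language-attribute score using weights chosen per content
# type. The weight values below are illustrative assumptions.
CONTENT_WEIGHTS = {
    # content type: (weight for concrete errors, weight for global attributes)
    "software documentation": (0.7, 0.3),
    "marketing": (0.3, 0.7),
}

def final_quality_score(content_type, concrete_error_score, global_attribute_score):
    w_errors, w_attributes = CONTENT_WEIGHTS[content_type]
    return w_errors * concrete_error_score + w_attributes * global_attribute_score

# Marketing content leans on fluency, style and creativity rather than error counts.
print(round(final_quality_score("marketing", concrete_error_score=92.0,
                                global_attribute_score=78.0), 1))  # 82.2
```
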
At EMC, data collection is focused on viewing a high-level aggregate of data through a dashboard. Behind the scenes, scoring is done in a feedback form that is sent to a database. That
database allows decision makers to run statistics and metrics. Although the company didn’t
discuss specifically which aspects of translation quality were being measured, the company
stated that “there is a lot of data going into aggregate quality.”
The main goal of EMC’s high-level approach to quality evaluation is to be able to see trends in
quality from quarter to quarter, while keeping an eye on languages that are falling behind. If
there are any concerns, decision makers can take a deeper look into what exactly is causing the
aggregate score in a certain language pair of a certain vendor to fall. That information can then
be a starting point for discussion during quarterly vendor reviews (QVRs).
Reality Squared Games, a video game company that focuses on localizing Chinese games for the
Western market, represents the perspective of QE for unconventional content. For their users, it’s
all about the experience. Shaun Newcomer, the company’s co-founder, remarked, “honestly, our
players don’t really read the quest text. They just want to get out and slay some monsters.” For
that reason, the translation is allowed to be more flexible and creative, as long as it conveys the
essence of the message and contributes to the overall enjoyability of the gaming experience.
For game developer Zynga, the focus is the same. Quality is critical, but player experience
counts the most. When it comes to player experience, Zynga narrows in on several factors such
as platform support, load time, performance, payment methods, pricing, geo-targeted features,
and the list goes on. When it comes to linguistic evaluation, tone and style are evaluated by how engaging, cool, and fun they are. Terminology must be consistent throughout the game and the
grammar must be coherent.
In general, for Zynga, quality means entertainment, freshness, fun factor, and new content daily.
Quality is evaluated using a tiered system, measuring defects as a percent of the total, with
consideration to severity. The challenge remains in selecting what to log or fix versus what not to
log or fix.
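
A small sketch of the tiered defect measurement described here, assuming illustrative severity multipliers: defects are expressed as a share of total strings, with more severe defects counting more heavily.

```python
# Sketch of a severity-aware defect rate: defects as a percentage of the total,
# with each defect scaled by a tier multiplier. Multiplier values are
# illustrative assumptions, not Zynga's actual configuration.
SEVERITY_MULTIPLIER = {"critical": 3.0, "major": 1.0, "minor": 0.25}

def weighted_defect_rate(defect_severities, total_strings):
    """defect_severities: e.g. ["major", "minor", "minor"]."""
    weighted = sum(SEVERITY_MULTIPLIER[s] for s in defect_severities)
    return 100.0 * weighted / total_strings

print(weighted_defect_rate(["major", "minor", "minor"], total_strings=500))  # 0.3
```
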
Netflix encountered an interesting challenge localizing the title of the Netflix original
series, Orange is the New Black. The title makes sense only to audiences who associate orange
outfits with the prison system. After all, not every country’s inmates wear orange. This challenge
was not unique, as Netflix localizes thousands of such movie titles.
Netflix relies on unconventional metrics to measure quality of unconventional content. For
example, Netflix users in the target locale discussing a movie on social media by referencing the
English title instead of the localized title could be an indicator of unsuccessful localization. Thus,
social impact can be measured as a function of localization quality.
At LinkedIn, unconventional content includes search queries, support chat sessions, user-to-user
and user-to-company communication, social media, and user generated content. Evaluating this
content using traditional QE methods doesn’t drive the user toward the desired outcome of a
better user experience. With this in mind, LinkedIn uses the definition of quality proposed by Dr.
Alan Melby:
“Translation is the creation of target content that corresponds to source content according to agreed-upon
specifications.”

Mike Dillinger of LinkedIn gave an analogy: asserting that translation quality is a function of
“correctness” or adequacy and fluency is akin to saying that the quality of a gourmet meal is a
function of salt and pepper. Just as there is much more to a quality meal than salt and pepper,
there is much more to translation quality than just “correctness.”
With this broad definition of quality, companies face the challenge of how to measure the impact
of translation quality on broader aspects like user engagement, ROI, and site traffic. These
parameters are of course measurable, but how does a company attribute decreased page clicks
to a defective localization? How much of these parameters can be attributed to localization
versus the marketing effort overall? Companies will likely need to address this issue internally.
Companies face a second challenge. Once the data is in and has been analyzed, how can that data
be compared to industry benchmarks? How is a company to know if the metrics they have used
find concurrence with other metrics in the industry? And where would the data for such industry
benchmarks come from? Several companies at the TAUS Annual Conference indicated their
willingness to contribute their QE data to TAUS for benchmarks, under the condition that it
remain anonymous and aggregated as part of a whole. This challenge is not one of language or
technology, but of organizational policy.
The TAUS DQF is the framework for quality evaluation that includes more parameters than just
“salt and pepper.” It was created to meet the need for a flexible framework that can adapt to the
dynamically evolving nature of quality in today’s world of tiered content with varying
specifications. Now, with the recent harmonization between DQF and MQM, users have
complete flexibility to evaluate quality as they see fit.
The TAUS Quality Dashboard is the tool that allows users to connect their translation tools and
workflow systems of choice with the TAUS DQF. Through the Quality Dashboard, users can
collect data on post-editing productivity, adequacy and fluency of target sentences, and also
count errors based on error-typology. Users are given suggestions on the most appropriate
quality evaluation model for specific requirements through the DQF Content Profiling feature.
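
As a rough illustration of the kind of per-segment data such an integration might pass along, here is a hypothetical record; the field names are assumptions for illustration and not the actual DQF API schema.

```python
# Hypothetical per-segment evaluation record combining the three kinds of data
# mentioned above: post-editing productivity, adequacy/fluency ratings, and
# error counts by type. Field names are illustrative, not the real DQF schema.
from dataclasses import dataclass, field

@dataclass
class SegmentEvaluation:
    source: str
    target: str
    post_editing_seconds: float                 # productivity measurement
    adequacy: int                               # e.g. 1-4 rating of meaning transfer
    fluency: int                                # e.g. 1-4 rating of target-language quality
    errors: dict = field(default_factory=dict)  # error type -> count

record = SegmentEvaluation(
    source="Haga clic en Guardar.",
    target="Click Save.",
    post_editing_seconds=4.2,
    adequacy=4,
    fluency=4,
    errors={"Terminology": 0, "Spelling": 0},
)
print(record.adequacy, record.fluency)  # 4 4
```
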
The upcoming year’s development roadmap for the Quality Dashboard and DQF includes:

- End of October 2015: QD productivity
- End of December 2015: QD productivity + quality
- Q1 2016: Sampling
- Q2 2016: Automatic Evaluation

TAUS Quality Dashboard and DQF users are encouraged to join the TAUS DQF User Group on
LinkedIn (open group) as well as the subgroups dedicated to specific tools with DQF integration
such as Memsource, SDL TMS, and XTM.

[1] Van der Meer, Anne-Maj. DQF and MQM Harmonized to Create an Industry-Wide Quality Standard. https://www.taus.net/think-tank/news/press-release/dqf-and-mqm-harmonized-to-create-an-industry-wide-quality-standard
