
Stakeholders in memoQ

Server Projects

A Quick Overview
The Scary Bit

Regular Expression
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}

Matching Text
202ca4c2-749d-4f54-ae02-fdf19939ef10
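The expression above can be tried outside memoQ. The sketch below uses Python's re module purely for illustration (memoQ itself uses the .NET engine, so this is an assumption about tooling, not part of the original slides); for a simple pattern like this one, both engines behave the same way.

```python
import re

# The pattern from the slide: five blocks of lower-case hex digits separated by hyphens.
GUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

sample = "202ca4c2-749d-4f54-ae02-fdf19939ef10"

# fullmatch succeeds only if the whole string fits the pattern.
is_match = GUID_PATTERN.fullmatch(sample) is not None
print(is_match)

# The character sets only list lower-case letters, so an upper-case copy does not match.
upper_match = GUID_PATTERN.fullmatch(sample.upper()) is not None
print(upper_match)
```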
What Are Regular Expressions?
• They are not a programming language
• Symbols that describe a text pattern
• Used to match, search and manipulate text
• A more powerful “Search and replace”
• Called “regex” for short
• There are several regex engines or “flavours”
• memoQ uses Microsoft .NET
How Long Does It Take to Learn a New Language?

*http://www.effectivelanguagelearning.com/language-guide/language-difficulty
How Long Does It Take to Learn Regex?

You can start creating your own basic expressions within a few minutes.
SIGH OF RELIEF
What Are They Used For?
• Search and match:
– Email addresses
– URLs
– Tags and placeholders
– Phone number formats
– Alternate spellings
– Consistency checks (e.g. lower case v. upper case)
– Trailing spaces
– Other repetitive text
Where in memoQ?
• Source and target filtering
• Find and replace
• Auto-translation rules
• Segmentation
• Filters:
– Regex Tagger
– Regex Text Filter
Search
Two Types of Regex Text
Literal characters:
bomb
bomb
bomber
A-bomb
The bomb went off.
Bombs off.
bomb

Metacharacters:
\ - . | * () ? {} + $ [] ^
Metacharacters
.    Any character
*    Preceding item zero or more times
?    Preceding item zero or one time
+    Preceding item one or more times
[    Begin character set
]    End character set
-    Separator in ranges
|    Either or
{}   Bean counting
^    Start of segment, or negate a character set
$    End of segment
(    Begin group
)    End group
Character Sets
Will match any one of the characters in the set
but only once, unless otherwise specified by
bean counting {}

[a-z]        Lower case
[A-Z]        Upper case
[A-Za-z]     Any case
[0-9]        Digits
[0-9A-Za-z]  Digits + letters
\p{Ll}       Lower + special letters
\p{Lu}       Upper + special letters
\p{L}        Any case + special letters

Can be negated using ^
[^0-9]       Any character except a digit

Can be combined
[0-9a-e ,]
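A short check of how ranges and negation behave. This is a sketch in Python's re module, used here only for illustration (memoQ uses .NET); the sample string is made up. It also shows why a range written as [A-z] is wider than "any letter": it runs from "A" to "z" in the character table and picks up the punctuation that sits between the two alphabets.

```python
import re

sample = "Az_^9"  # hypothetical sample text

# [A-z] covers every code point from "A" to "z", including [ \ ] ^ _ and `
wide_range = re.findall(r"[A-z]", sample)

# [A-Za-z] is limited to the unaccented letters of either case
letters = re.findall(r"[A-Za-z]", sample)

# [^0-9] is a negated set: any single character except a digit
non_digits = re.findall(r"[^0-9]", sample)

print(wide_range)
print(letters)
print(non_digits)
```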
Shorthand Character Sets
\d    Digit
\w    Digit, letter or underscore
\s    Whitespace
\b    Word boundary (beginning OR end of word)
\t    Tab
\r    Carriage return
\n    New line
\D    Not a digit
\W    Not a digit, letter or underscore
\S    Not a whitespace
\tag  memoQ tag
“Escaping” Metacharacters
If you need to match a special character in the text, you will have to
“escape” it, or mark it for its literal meaning. This is achieved by
putting a backslash in front of it.

\.  \?  \*  \+  \[  \]  \-  \|
\(  \)  \{  \}  \$  \^  \!  \\
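The effect of escaping can be seen by running the same search with and without the backslash. A minimal sketch in Python's re module (an illustration only; memoQ uses .NET, but escaping works the same way for these characters), with made-up sample text:

```python
import re

# Unescaped, the dot means "any character", so it also matches the 4 in 245.
unescaped_dot = re.findall(r"2.5", "2.5 and 245")
# Escaped, the dot only matches a literal full stop.
escaped_dot = re.findall(r"2\.5", "2.5 and 245")

label = "Red Chilli (11%)"
# Unescaped brackets form a group and are not part of the matched text.
unescaped_brackets = re.search(r"(11%)", label).group()
# Escaped brackets are matched as literal characters.
escaped_brackets = re.search(r"\(11%\)", label).group()

print(unescaped_dot, escaped_dot)
print(unescaped_brackets, escaped_brackets)
```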
Find and Replace
Replace expressions allow you to choose which
parts of the text to replace and which parts to
keep as they are. This is achieved via groups ().

Search: (\d{1,3})\s{1,}[mM][gG]
Replace: $1 mg
Finds: 225 mG
Replaces with: 225 mg
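The same search and replace can be reproduced in a script. The sketch below uses Python's re module as a stand-in for illustration; note the one syntax difference: .NET (and therefore memoQ) refers to the first group as $1, while Python writes it as \1.

```python
import re

search = r"(\d{1,3})\s{1,}[mM][gG]"
# Python spelling of the memoQ replacement "$1 mg"
replace = r"\1 mg"

# Group 1 keeps the number; the spacing and the unit are normalised.
result = re.sub(search, replace, "225 mG")
print(result)  # 225 mg
```

Any mix of upper and lower case in the unit, and any run of whitespace before it, ends up as a single space followed by "mg".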
Greedy v. Lazy
Dangers of Greediness
By default, regular expressions are greedy, so it is a good habit
to limit your expressions as much as possible to avoid
matching more text than you intend to.

Use the non-greedy marker ? after * and +.

Example:
“All purées contains at least 10% of the main ingredient,
unless otherwise specified in the purée description.”
pur.*\b will match everything from “purées” to “description”,
because .* takes as much text as it can before the last word boundary.
pur.*?\b will match only “purées” (and, as a separate match, “purée”),
because .*? stops at the first word boundary it reaches.
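The difference between the two expressions can be checked with a few lines of code. This is a sketch in Python's re module, used only for illustration (memoQ uses .NET; greedy and lazy quantifiers behave the same way in both).

```python
import re

text = ("All purées contains at least 10% of the main ingredient, "
        "unless otherwise specified in the purée description.")

# Greedy: .* runs to the end of the text, then backs up to the last word boundary.
greedy = re.search(r"pur.*\b", text).group()

# Lazy: .*? grows one character at a time and stops at the first word boundary.
lazy = re.search(r"pur.*?\b", text).group()

print(len(greedy), "characters matched by the greedy expression")
print(len(lazy), "characters matched by the lazy expression")
print(lazy)
```

The greedy match runs from the first "pur" to the word "description"; the lazy match ends with the first word.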
Auto-Translation: Practical Cases
To have memoQ display certain patterns of text as auto-translation
results, you can use expressions such as the ones below. Insert them in the
rules section and use a replacement rule in the replace order section.
If you enclose the full match expression in brackets and use $1
for replacement, you will achieve an identical match in
auto-translation, but other types of manipulation are possible.

• Email addresses
(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)
• URLs
((https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?)
• Phone numbers
(\d{5}\s\d{6})                    matches 01908 443300
(\d{5}-\d{6})                     matches 01908-443300
(\+\d{2}\s\(0\)\s\d{4}\s\d{6})    matches +44 (0) 1908 443300
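Before adding rules like these to memoQ, it can help to test them against sample text. The sketch below does so with Python's re module, which is an assumption made for illustration (memoQ uses .NET). The email sample is made up, and the third phone pattern is written without any space before its closing bracket, since a space there would also require a trailing space in the text.

```python
import re

EMAIL = r"(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)"

PHONE_PATTERNS = [
    r"(\d{5}\s\d{6})",
    r"(\d{5}-\d{6})",
    r"(\+\d{2}\s\(0\)\s\d{4}\s\d{6})",
]
PHONE_SAMPLES = ["01908 443300", "01908-443300", "+44 (0) 1908 443300"]

# Group 1 wraps the whole expression, so it holds the text that $1 would insert.
matched = [
    re.fullmatch(pattern, sample).group(1)
    for pattern, sample in zip(PHONE_PATTERNS, PHONE_SAMPLES)
]
print(matched)

email_match = re.fullmatch(EMAIL, "first.last@example.com")  # hypothetical address
print(email_match.group(1))
```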
Auto-Translation: Where in mQ?
Segmentation: Practical Case
SOURCE: “Manufactured in China (PRC) for the UK market.
Ingredients: Lemon Grass Purée (15%), Red Chilli Purée
(11%), Onion, Water, Coconut Milk, Red Pepper, Galangal
(5%), Sugar (Sulphites), Lime Juice From Concentrate
(Sulphites), Salt, Rapeseed Oil, Garlic Purée, Rice Wine
Vinegar (Sulphites), Lime Leaves (2.5%), Yeast Extract,
Chilli Flakes, Cornflour, Tamarind Paste, Coriander,
Cayenne Pepper, Paprika Extract.”

REQUIREMENT: Split ingredients after comma and also split
percentages and subingredients between brackets.
Segmentation: Practical Case
SOLUTION
Step 1: Duplicate the default segmentation rule
set. Add comma to memoQ’s #end# custom
list.
Step 2: Add a rule to split segment before opening
bracket if ending bracket is followed by a
comma, a space and an upper case letter. The
point where the segment will be split is
marked by #!#

[\s]+#!#\([\s]*[\p{L}0-9]*\.?\d*\s*%?\),\s+\p{Lu}
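The rule in step 2 can be approximated outside memoQ to see where the splits would fall. The sketch below is an emulation in Python's re module, not memoQ's own mechanism, and rests on several assumptions: the part of the rule after #!# is turned into a lookahead, [^\W_] stands in for [\p{L}0-9], [A-Z] stands in for \p{Lu} (Python's re has no \p{...} classes), and the sample is a shortened version of the source text. It covers step 2 only, not the comma splitting from step 1.

```python
import re

# memoQ rule: [\s]+#!#\([\s]*[\p{L}0-9]*\.?\d*\s*%?\),\s+\p{Lu}
# Emulation: whitespace before #!# is matched; everything after #!# is only looked at.
SPLIT_BEFORE_BRACKET = re.compile(r"\s+(?=\(\s*[^\W_]*\.?\d*\s*%?\),\s+[A-Z])")

text = ("Lemon Grass Purée (15%), Red Chilli Purée (11%), Onion, "
        "Sugar (Sulphites), Lime Leaves (2.5%), Yeast Extract")

# split() drops the matched whitespace; memoQ would leave it with the preceding segment.
segments = SPLIT_BEFORE_BRACKET.split(text)
for segment in segments:
    print(segment)
```

Each bracketed percentage or subingredient that is followed by a comma, a space and an upper case letter starts a new piece, which is the behaviour the rule is meant to produce.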
Regex Tagger: Practical Case
SOURCE: “Dear [%$FIRSTNAME%] [%$LASTNAME%], Your
online order placed on [%$WEBSITE%] on [%$DATE%] and
processed as the authorized vendor of [%$RANGE%]
products, has been successfully completed (order
number: [%$REFNO%]). Please note that [%if $ORDER !=
""%][%$ORDER%][%else%] [%$COMPANY%] will appear
on your bank statement, instead of [%$RANGE%].”

REQUIREMENT: Convert all placeholders into memoQ tags
to ensure their integrity on export. File format is txt.
Regex Tagger: Practical Case
SOLUTION
Create a cascading filter
(Plain text + Regex tagger)
and add the expression
below to tagger.
\[%.*?%\]
OR, if you want to be more
strict, add three rules as
below.
\[%[a-z]+?%\]
\[%\$[A-Z]+?%\]
\[%if .*\!\=.*?%\]
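To see which placeholders each set of rules would catch, the expressions can be run over a sample first. The sketch below uses Python's re module for illustration (memoQ uses .NET), a shortened version of the source text, and joins the three strict rules with | so that they can be applied in one pass; that joining is a convenience of this sketch, not something the tagger requires.

```python
import re

source = ('Dear [%$FIRSTNAME%] [%$LASTNAME%], please note that '
          '[%if $ORDER != ""%][%$ORDER%][%else%] [%$COMPANY%] '
          'will appear on your bank statement.')

# Single loose rule: anything between [% and the nearest %]
loose = re.findall(r"\[%.*?%\]", source)

# The three strict rules, combined with "either or"
STRICT_RULES = [
    r"\[%[a-z]+?%\]",
    r"\[%\$[A-Z]+?%\]",
    r"\[%if .*\!\=.*?%\]",
]
strict = re.findall("|".join(STRICT_RULES), source)

for placeholder in strict:
    print(placeholder)
```

On this sample both approaches find the same six placeholders. Note that the first .* in the third rule is greedy, so if a segment contained two "if" placeholders it could span from the first to the second; writing .*? there as well would be the safer choice.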
Resources
• Regex Pal
http://www.regexpal.com/
Online regular expression tester. It is fully compatible with memoQ but it is basic and its learning tools are not
as good as those of Regex 101. It allows you to save expressions, so it is a good place to keep fully tested
and working regex.
• Regex 101
https://regex101.com/
Regex 101 does not support the .NET flavour but it is a good place to practise and learn basic regex for two
reasons: firstly, it explains expressions as you type them and, secondly, it includes a very useful reference
section with the most commonly used symbols. If you create expressions with it for memoQ, please test
them in Regex Pal as well to ensure compatibility.
• Using regular expressions in memoQ (Basic level), by Miklós Urbán
https://www.memoq.com/recorded-webinars
• “Do the magic: Regular Expressions in FrameMaker”, by Marek Pawelec
https://blogs.adobe.com/techcomm/2016/03/framemaker-regular-expressions.html
• memoQ Yahoo Group
https://groups.yahoo.com/neo/groups/
• Regex Hero
http://regexhero.net/reference/
• Regex Cheat Sheet
https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
Queries and Feedback

Please send any comments, questions or feedback to:

angela.madrid@k-international.com
