Roland Schock
ARS Computer und Consulting GmbH
Session code: IBM Data Tech Summit, Toronto
24.09.2019 Db2 for Distributed
IDUG & IBM Data Tech Summit
Toronto, Canada | September 23 – 24, 2019
Agenda
Character Sets
A, B, C, ... ᇹぁゆ ㌹ ㌺
agpx
A b c d ㍻㋿亹怔떟떥
Character Encoding
ASCII
Double Byte Char Sets (DBCS) and EUC (Extended Unix Code)
• DBCS
• Expansion of SBCS from one byte to two bytes per character
• Mainly for Asian languages with more than 256 characters to encode
• Latin text is expanded to twice the size it has in SBCS
• Code points below 256 carry an additional zero byte in the stream
• EUC
• Multi Byte Char Set (MBCS): 1 to 4 bytes/char, depending on the code set
• Used for Japanese, Korean, Traditional and Simplified Chinese on Unix platforms
• Uses single shift characters to switch to another code group to build a
multi-byte character
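The single-shift mechanism can be made visible with a small sketch. It uses Python's built-in `euc_jp` codec purely as an illustration of EUC byte sequences; the sample characters and the helper name `euc_jp_hex` are assumptions chosen for this example and are not part of the original slides.

```python
# Minimal sketch: inspect EUC-JP byte sequences with Python's built-in codec.
# The sample characters are chosen for illustration only.
samples = {
    "A": "ASCII letter",
    "\u3042": "hiragana A (two-byte code group)",
    "\uff71": "half-width katakana A (reached via single shift)",
}


def euc_jp_hex(ch: str) -> str:
    """Return the EUC-JP byte sequence of one character as upper-case hex."""
    return ch.encode("euc_jp").hex().upper()


for ch, label in samples.items():
    # A leading byte 8E is the single-shift character SS2 in EUC-JP.
    print(f"{ch!r:10} {label:50} {euc_jp_hex(ch)}")
```

The ASCII letter stays at one byte, the hiragana character takes two bytes, and the half-width katakana is prefixed with the single-shift byte that selects another code group.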
Unicode Encodings
UTF-8
• Pitfalls
• Truncation
• CODEUNITS32
• Making an educated guess
• Selecting a collation sequence
• IDENTITY
• SYSTEM and SYSTEM_xxx
• UCA collations
• Making Db2 case-insensitive
• Not an issue for plain English text: ASCII characters do not expand in UTF-8
• Non-English characters expand to multiple bytes in UTF-8 and
use up the space reserved by the string definition
• By default Db2 uses bytes and not characters in definitions
CHAR(20) is actually 20 bytes long and not 20 characters!
• Example:
The German word for apples, 'Äpfel', is stored in 6 bytes
db2 "select 'Äpfel' as text, hex('Äpfel') as hex from sysibm.sysdummy1"
TEXT  HEX
Äpfel C3847066656C
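The byte count above can be reproduced outside the database. The following is a minimal sketch, assuming the value is stored as UTF-8; the helper `fits_in_bytes` is a hypothetical name introduced here to mimic a byte-based length check and is not a Db2 function.

```python
# Minimal sketch: character count versus UTF-8 byte count.
# fits_in_bytes is a hypothetical helper, not a Db2 function.


def fits_in_bytes(text: str, byte_limit: int) -> bool:
    """Return True if the UTF-8 encoding of text fits into byte_limit bytes."""
    return len(text.encode("utf-8")) <= byte_limit


word = "\u00c4pfel"              # 'Äpfel', written with an escape for the umlaut
encoded = word.encode("utf-8")

print(len(word), "characters")   # 5 characters
print(len(encoded), "bytes")     # 6 bytes
print(encoded.hex().upper())     # C3847066656C, as in the HEX column above
print(fits_in_bytes(word, 5))    # False: a 5-byte column would be too small
print(fits_in_bytes("Apfel", 5)) # True: pure ASCII does not expand
```

A column sized by counting characters would therefore be one byte too short for this five-letter word.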
• Unless specified otherwise, Db2 measures all string lengths in bytes (OCTETS)
• The global variable NLS_STRING_UNITS or the DB CFG parameter
STRING_UNITS changes the default string unit
• CODEUNITS16 reserves 2 bytes and CODEUNITS32 4 bytes per character
• With NLS_STRING_UNITS=CODEUNITS32 a CHAR(10) occupies 40
bytes in memory and can store 10 ASCII or 10 umlaut characters
• Because the internal storage of a CHAR uses a single length byte, the
maximum number of characters in a string depends on (NLS_)STRING_UNITS
→ with CODEUNITS32 the maximum is CHAR(63)
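The arithmetic behind these numbers can be sketched as follows. This is an illustration only: `length_in`, `reserved_bytes` and `max_char_length` are hypothetical helpers, and the ceiling of 255 bytes is an assumption derived from the single length byte mentioned above (it is consistent with the CHAR(63) limit for CODEUNITS32).

```python
# Minimal sketch of the string-unit arithmetic; all helper names are hypothetical.
BYTES_PER_UNIT = {"OCTETS": 1, "CODEUNITS16": 2, "CODEUNITS32": 4}
MAX_CHAR_BYTES = 255  # assumption: largest value a single length byte can hold


def length_in(text: str, unit: str) -> int:
    """Length of text counted in UTF-8 bytes, UTF-16 code units or code points."""
    if unit == "OCTETS":
        return len(text.encode("utf-8"))
    if unit == "CODEUNITS16":
        return len(text.encode("utf-16-le")) // 2
    if unit == "CODEUNITS32":
        return len(text)  # one unit per Unicode code point
    raise ValueError(f"unknown string unit: {unit}")


def reserved_bytes(declared_length: int, unit: str) -> int:
    """Bytes reserved for a CHAR(declared_length) in the given unit."""
    return declared_length * BYTES_PER_UNIT[unit]


def max_char_length(unit: str) -> int:
    """Largest CHAR length that still fits under the assumed byte ceiling."""
    return MAX_CHAR_BYTES // BYTES_PER_UNIT[unit]


for unit in BYTES_PER_UNIT:
    print(unit, length_in("\u00c4pfel", unit),
          reserved_bytes(10, unit), max_char_length(unit))
```

For 'Äpfel' the length is 6 in OCTETS but 5 in both code-unit schemes, which is why a character-based unit avoids the truncation surprise at the cost of reserved space.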
• When creating your new database you also have to choose a collating
sequence which determines the sort order of strings
• By default binary IDENTITY is used. This is the fastest method.
• The collating sequences SYSTEM and SYSTEM_xx (xx=SBCS codepage)
use an 8-bit ordering scheme and binary order for the rest → still fast
• UCA collating sequences can accommodate the most complex
ordering schemes, but can cost performance during sorting/searching
• With UCA collating sequences you can make Db2 case-insensitive ;-)
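The effect of a binary collation can be imitated outside the database. The sketch below sorts by raw UTF-8 bytes to mimic IDENTITY ordering; the second function uses Python's `str.casefold` only as a rough stand-in for a case-insensitive comparison and is not an implementation of UCA. The word list and both function names are assumptions made for this example.

```python
# Minimal sketch: binary (byte-wise) ordering versus a caseless ordering.
# casefold() is only a rough stand-in and does NOT implement UCA.
words = ["apple", "Zebra", "\u00c4pfel", "banana"]


def identity_order(values):
    """Sort by raw UTF-8 bytes, similar in spirit to a binary collation."""
    return sorted(values, key=lambda v: v.encode("utf-8"))


def caseless_order(values):
    """Sort while ignoring case; accents are still compared by code point."""
    return sorted(values, key=str.casefold)


print(identity_order(words))  # upper-case ASCII sorts before lower-case
print(caseless_order(words))  # 'Zebra' now sorts after 'banana'
print("ZEBRA".casefold() == "zebra".casefold())  # caseless equality
```

In the byte-wise order 'Zebra' comes first and 'Äpfel' last, because upper-case ASCII has lower byte values than lower-case ASCII and the umlaut starts with a byte above 0x7F. A language-aware collation would typically place 'Äpfel' next to 'apple', which this stand-in does not attempt.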
• Requirements
• "Mudmap" for migration
• Tools
• Tripwires to avoid
• Functions in UTF-8 databases
Requirements
db2move
• db2move
• IBM Migration Toolkit (generates scripts, unfortunately discontinued)
• homegrown export/loads
• IBM Database Conversion Workbench (DCW)
• HPU*
• InfoSphere CDC*
Summary
Roland Schock
ARS Computer und Consulting GmbH
roland.schock@ars.de
Session code: A1
Appendix
• By default, the Db2 server and clients use the locale settings of the
operating system or user:
• Windows: The server process uses the default region settings of the operating
system.
• Linux/Unix: The code page is derived from the locale setting of the instance user
(i.e. the user running the database processes).
• Client (LUW): The current locale settings of the user determine the code page
used during CONNECT.
• Programming language: Java always uses Unicode when connecting to a
database via JDBC.
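To see which encoding a process would derive from the current user's locale, a quick check along the following lines can help. This is only an analogy in Python: it shows what the Python runtime picks up from the operating system and does not query Db2; the function name `client_encoding` is an assumption made for this sketch.

```python
# Minimal sketch: show the encoding derived from the current locale settings.
# This inspects the Python runtime only; it does not query Db2.
import locale
import sys


def client_encoding() -> str:
    """Return the encoding name derived from the user's locale settings."""
    return locale.getpreferredencoding(False)


print("locale encoding:", client_encoding())
print("stdout encoding:", sys.stdout.encoding)
```

Running this under different locale settings (for example a different LANG value on Linux/Unix) shows how the same program ends up with a different code page.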
At prepare/bind time
Overview
[Diagram: a client that uses code page X connects to a server that uses code page Y]
Other considerations
More considerations
More considerations
Overview
Troubleshooting
Pitfalls
db2set DB2CODEPAGE
• Be sure you know what you are doing before you set the Db2 environment
variable DB2CODEPAGE
• It tells Db2 that you will supply the correct code points yourself, regardless
of the symbols displayed.
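What goes wrong when the declared code page does not match the bytes actually sent can be sketched without a database. The code pages 850 and 1252 are chosen for illustration only, and `misread` is a hypothetical helper; the sketch shows the general mismatch effect, not Db2's internal conversion.

```python
# Minimal sketch: bytes produced in one code page, interpreted as another.
# Code pages 850 and 1252 are illustrative assumptions; misread is hypothetical.


def misread(text: str, actual: str, declared: str) -> str:
    """Encode text in the code page really used, then decode it as the declared one."""
    return text.encode(actual).decode(declared)


original = "\u00c4pfel"  # 'Äpfel'

print(original.encode("cp850").hex().upper())  # bytes as produced under cp850
print(misread(original, "cp850", "cp1252"))    # same bytes read as cp1252
print(misread(original, "cp850", "cp850"))     # matching declaration round-trips
```

With mismatched code pages the first character turns into a different symbol although no byte has changed, which is the situation the warning above refers to.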
db2set DB2CONSOLECP
Performance considerations
Links
• Unicode
http://www.unicode.org