Sebastian Buhlinger SAP Consultant, HP-SAP EMEA CC
© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
1. Introduction to Unicode
2. Unicode & SAP in General
3. Technology in Depth
4. Sizing Information for Unicode-based SAP Systems
Introduction to Unicode
1. Introduction to Unicode
• What is text?
• History of character encoding
• Problem of character encoding
• From ASCII to Unicode
• What is Unicode exactly?
• The Unicode Standard
• Where is Unicode used?
• The Unicode Consortium
• Unicode Encodings
What is text?
Code pages & encodings describe the handling of text and the way it is stored in
• Computers
• Files
• Data structures
Inside a computer program or data file, text is stored as a sequence of numbers – just like “everything else”
A character is a:
• Letter,
• Digit,
• Period,
• Hyphen,
• Punctuation or
• Math symbol
Furthermore there are control characters – typically not visible
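As a small illustration of text being stored as numbers, the following Python sketch lists the integer value behind each character, including a non-visible control character. Python is not part of the original slides, and the function name `code_values` is made up for this example.

```python
def code_values(text):
    """Return (printable representation, integer code value) pairs."""
    return [(repr(ch), ord(ch)) for ch in text]


if __name__ == "__main__":
    # A letter, a digit, a period, a hyphen, a math symbol and a tab
    # (a control character that is normally not visible).
    for shown, value in code_values("A1.-+\t"):
        print(shown, value)
```

The tab character appears only through its escaped representation, which is one way to see that control characters are ordinary numbers too.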
History of Character Encoding
• Computers were pretty slow, had fairly little memory and were very expensive
• Up to the 1960s, I/O meant punching holes into paper tapes
• Most of the character sets date back to the punch-card age and are designed with these cards in mind
• In the early days of computers every hardware manufacturer used proprietary technology (and encodings)
• International data interchange was no issue and so nothing needed to fit together
Problem of character encoding
• Which number is assigned to which character?
• When typing an ‘A’ on the keyboard, the computer uses the character code as a basis for pulling the character shape of ‘A’ from a font file listing with the same binary number, and displays or prints it
• The character ‘A’ may also have different integer values in different programs or data files (in an Arabic font file, the value of ‘A’ might select an Arabic character)
• In some instances no number is available for certain characters (f.i. “ä” → “Ä”)
• All data is encoded in the form of binary numerical codes
3/31/2004
Character repertoire
• English alphabet with some digits and little more: ~ 60 characters
• Western European Standard: ~ 300 characters for several languages
• Korean: ~ 12,000 syllables
• Chinese dictionaries: ~ 50,000 letters
• Hundreds of other characters in common use, such as math and currency symbols
From ASCII to Unicode
• Most character sets and encodings in the 70s/80s were modifications or extensions of ASCII
• Many of them used 8 bits with a subset of the 94 used ASCII characters
• Most common encodings nowadays use a single byte per character (SBCS)
• They are all limited to 256 characters
• Due to that, none of them can even cover the letters for the Western European languages
From ASCII to Unicode
• Consequence: many different 8-bit encodings were created to fulfill the needs of different user communities
• Solution for data interchange in a global networked information society and collaborative business world: a single character set for all languages in use
• Unicode: a 32-bit code space offers 4,294,967,296 different values for characters, symbols and control characters (the Unicode Standard itself restricts the code space to U+0000 to U+10FFFF, i.e. 1,114,112 code points)
What is Unicode exactly?
• Unicode = universally encoded character set to store information from any language
• Unicode
  • defines properties for each character
  • standardizes script behavior
  • provides a standard algorithm for bidirectional text
  • defines cross-mappings for other standards
• Unicode defines a unique code value for every character, regardless of platform, program or programming language used
What is Unicode exactly?
• The Unicode standard primarily encodes scripts rather than languages
• Scripts comprise several languages that historically share the same set of symbols
• In many cases a script may serve to write dozens of languages (e.g. the Latin script)
• In other cases one script corresponds to one language (e.g. Hangul)
What is Unicode exactly?
• Additionally it also includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats etc.
• In all, the Unicode Standard comprises >95,000 characters, ideograph sets, symbols (version 4.0)
The Unicode Standard
• The Unicode Standard is a character coding system designed to support the worldwide
  • interchange,
  • processing,
  • and display
  of written text of the diverse languages and technical disciplines of the modern world
• In addition, it supports classical and historical texts of many written languages
The Unicode Consortium
• The Unicode Consortium is a non-profit organization originally founded to
  • develop,
  • extend,
  • and promote the use of the Unicode Standard
• Members of the Consortium include major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups, and interested individuals
The Unicode Consortium
• The Consortium cooperates with
  • W3C and
  • ISO
• and has liaison status "C" with ISO/IEC JTC 1/SC2/WG2, which is responsible for refining the specification and expanding the character set of ISO/IEC 10646
Unicode Encodings
• UTF = Unicode Transformation Format
• UCS = Universal Character Set
• CESU = Compatibility Encoding Scheme
• Conversion between different encodings is a simple, bit-wise operation (defined in the standard)
• No performance-intensive conversion table necessary!
Unicode Encodings
• UTF-8: Unicode Transformation based on 8-bit representation
• CESU-8: Compatibility Encoding Scheme of UTF-16 on an 8-bit base
• UTF-16: Unicode Transformation based on 16-bit representation
• UCS-2: Universal Character Set, 2 byte variation (16 bit)
• UTF-32: Unicode Transformation based on 32-bit representation
• UCS-4: Universal Character Set, 4 byte variation (32 bit)
Unicode Encodings
• Not all Unicode characters are 2 bytes long → no doubling of hardware requirements in the first place
• The Unicode encoding determines the length of a character
• A character in a Unicode encoding can be longer than 1 byte, therefore Unicode characters can be longer than characters defined in a standard code page
UTF-8
• UTF-8 is the 8-bit encoding of Unicode
• It’s a variable-width encoding and also a strict superset of 7-bit ASCII
• “Strict superset” means that every character in 7-bit ASCII is available in UTF-8 with the same corresponding code point value
• 1 character = 1 byte – 4 bytes in the encoding
• Characters from European scripts: either 1 or 2 bytes
• Asian scripts: 3 or 4 bytes
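The variable width of UTF-8 can be checked with a short Python sketch. This is only an illustration using Python's built-in codecs; the helper names `utf8_length` and `is_ascii_compatible` are hypothetical, and the sample characters were picked for this example.

```python
def utf8_length(ch):
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def is_ascii_compatible(text):
    """True if 7-bit ASCII text yields identical bytes in ASCII and UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")


if __name__ == "__main__":
    # Latin A, a-umlaut, Greek alpha, Hangul syllable U+AC00, musical G clef
    for ch in ["A", "\u00e4", "\u03b1", "\uac00", "\U0001d11e"]:
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")
    print(is_ascii_compatible("SAP R/3"))
```

The second helper demonstrates the "strict superset" property: pure 7-bit ASCII data needs no conversion at all.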
UTF-8
• UTF-8 is used for UNIX platforms, HTML and most Internet browsers
• Main benefits of UTF-8:
  • compact storage requirements for European scripts
  • in general European scripts will occupy less storage on disk and memory
  • ease of migration → since 7-bit ASCII data remains the same in UTF-8, data conversion effort between ASCII based character sets and UTF-8 is reduced significantly
UTF-8 / CESU-8 (8-bit encodings)
• 8-bit encodings are well-suited for data transfer since all 7-bit ASCII and 8-bit ISO characters retain the same code points
• Easier communication with legacy and non-Unicode systems
• Downside: variable character length
UCS-2
• UCS-2 has a fixed width of 16 bit (2 bytes)
• UCS-2 is the Unicode encoding for Java & Win NT 4.0
• Main benefits of UCS-2:
  • More compact storage requirements for Asian scripts (each character represented with 2 bytes only)
  • String processing will be faster because all characters are of the same width
  • Good compatibility with Java and Microsoft clients
• Downside:
  • UCS-2 can support Unicode characters defined up to Unicode 3.0 only (max. 65,536)
UTF-16
• UTF-16 is the 16-bit encoding of Unicode
• Basically an extension of UCS-2
• One Unicode character can be 2 or 4 bytes in the encoding
• Characters from European and most Asian scripts are represented in 2 bytes
• Supplementary characters are represented in 4 bytes
• UTF-16 is the main Unicode encoding from Windows 2K
UTF-16
• Main benefits of UTF-16:
  • More compact storage requirements for Asian scripts (2 bytes for commonly used characters)
  • Ideal if European and Asian scripts are used together → UTF-16 will occupy less storage on disk and memory than UTF-8 (3 bytes for the Asian part)
  • Balance of efficient access to characters and economical use of storage
• The points mentioned above are the reason for the use of UTF-16 in SAP Web Application Server
UCS-2 / UTF-16 (16-bit encodings)
• 16-bit encodings offer a compromise between the pros and cons of the 8-bit and the 32-bit encodings, respectively
• They do not need as much memory as 32-bit encodings, but offer quasi fixed character length
• UCS-2 has a fixed character length, but it cannot define more than 2^16 (65,536) characters
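How UTF-16 gets past the 2^16 limit of UCS-2 can be sketched in a few lines of Python. The function below follows the surrogate-pair arithmetic of the UTF-16 encoding form; the name `utf16_code_units` is hypothetical and the sketch ignores invalid input such as lone surrogate code points.

```python
def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code unit(s) for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # Fits into one 16-bit unit: same value as in UCS-2.
        return [cp]
    # Supplementary character: spread the remaining 20 bits over two units.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate, upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate, lower 10 bits
    return [high, low]


if __name__ == "__main__":
    for ch in ["A", "\uac00", "\U0001d11e"]:
        units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
        print(f"U+{ord(ch):04X} -> {units}")
```

Characters below U+10000 keep a single 2-byte unit, which is why the slides call the character length "quasi fixed".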
UTF-32
• 32-bit encoding
• Popular when memory space is no concern
• Fixed width (4 bytes)
UCS-4 / UTF-32 (32-bit encodings)
• All 32-bit encodings have a fixed length
• This advantage is outweighed by the extensive memory & storage requirements
Example #1
Character      UTF-8         UCS-2   UTF-16
A              41            0041    0041
c              63            0063    0063
Æ              C3 86         00C6    00C6
ö              C3 B6         00F6    00F6
٤ (U+0664)     D9 A4         0664    0664
页 (U+9875)    E9 A1 B5      9875    9875
𝄞 (U+1D11E)    F0 9D 84 9E   N/A     D834 DD1E
Example #2 – character “가” (U+AC00)
UTF-8  HEX:           E    A    B    0    8    0
UTF-8  BIN:           1110 1010 1011 0000 1000 0000
Lead byte indicator:  1110 (first byte); trailing byte indicator: 10 (second and third byte)
Remove indicators:    1010 | 11 0000 | 00 0000
Regroup bits:         1010 1100 0000 0000
UTF-16 BIN:           1010 1100 0000 0000
UTF-16 HEX:           A    C    0    0
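The bit manipulation of Example #2 can be written down as a minimal Python sketch. It handles only the 3-byte UTF-8 form used in that example, and the function name `decode_utf8_3byte` is made up for illustration.

```python
def decode_utf8_3byte(raw):
    """Decode one 3-byte UTF-8 sequence into its code point (an integer)."""
    if len(raw) != 3:
        raise ValueError("expected exactly three bytes")
    b0, b1, b2 = raw
    # Lead byte must start with 1110, trailing bytes with 10.
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Strip the indicator bits and regroup the remaining 4 + 6 + 6 bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


if __name__ == "__main__":
    code_point = decode_utf8_3byte(bytes([0xEA, 0xB0, 0x80]))
    print(f"U+{code_point:04X}")
```

For code points below U+10000 the resulting 16-bit value is also the UTF-16 code unit, so no conversion table is involved.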
Unicode & SAP in General
2. Unicode & SAP in General
• Languages and characters
• Characters on Disk/Memory
• Code Pages
• SAP & Code Pages
• Language Combinations before Unicode
• Recommendations from SAP (w/o Unicode)
• Unicode-compliant SAP products
• When/why do customers need Unicode?
Language and characters
• Languages are written in fonts
• Only a few languages use the same fonts
• A font is a group of characters
Characters on Disk/Memory
• A character is stored as a byte sequence on disk
• A code page defines the mapping between the byte sequence and a character
Code Pages
• The code page determines what characters you can see and enter
Code Pages
• Different code pages map different characters to the same byte sequence
[Figure: single-byte and double-byte code page examples]
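The effect of different code pages on the same bytes can be tried out with Python's built-in codecs. This is an illustrative sketch only: the codec names are Python's, not SAP code page numbers, and `decode_with` is a hypothetical helper.

```python
def decode_with(raw, code_page):
    """Interpret a byte sequence according to the given code page."""
    return raw.decode(code_page)


SINGLE_BYTE = bytes([0xE4])        # one byte, three different characters
DOUBLE_BYTE = bytes([0x82, 0xA0])  # one Japanese character in Shift-JIS

if __name__ == "__main__":
    for code_page in ("latin-1", "iso8859_7", "cp1251"):
        print(code_page, decode_with(SINGLE_BYTE, code_page))
    # The same two bytes are one character in a double-byte code page
    # and two unrelated characters in a single-byte code page.
    print("shift_jis", decode_with(DOUBLE_BYTE, "shift_jis"))
    print("latin-1", repr(decode_with(DOUBLE_BYTE, "latin-1")))
```

Without knowing the code page, the byte 0xE4 alone does not say whether a Western European, Greek or Cyrillic letter was meant.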
SAP & Code Pages
Language Combinations before Unicode
• Single Standard Code Pages
  • support specific sets of languages
  • the number and combination of languages that are supported cannot be altered
• Standard code pages and R/3 languages (w/o EBCDIC)
• Double-Byte Code Pages
Language Combinations before Unicode
• It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports, see Note 0112065
Language Combinations before Unicode
• Blended Code Pages (≥ Rel. 3.1D)
  • SAP proprietary code pages that contain characters from one or more standard code pages
  • increase the combinations of languages that can be used
  • functionally, a Blended Code Page system uses a single code page
  • a Blended Code Page is a single code page system
  • users can see and enter all characters contained in the code page, regardless of their log-in language
Language Combinations before Unicode
[Table: SAP Code Page / Supported Languages]
Language Combinations before Unicode
• The availability of SAP blended code pages is platform dependent, because SAP blended locales need to be created for each platform
[Table: Blended Locale Status (x = available, −− = not available)]
Language Combinations before Unicode
• MDMP (≥ Rel. 3.1I)
  • Multi-Display / Multi-Processing
  • allows dynamic code page switching on the application server
  • therefore permits any combination of standard code pages on one system
  • the log-on language determines the code page that is active for each user
• An MDMP system is recommended if:
  1. one or more additional code pages are required to add languages to your existing installation
  2. a blended code page cannot support the combination of languages you need for a new installation
• For example, an MDMP system with the code pages 1100 and 8000 allows German and Japanese users to log onto the same R/3 system in their respective languages
Language Combinations before Unicode
Example
[Diagram: front ends in Japan (8000 – SJIS) and Germany (1100 – ISO-1) connected to the application server and DB]
• Each user can only access one code page at a time: a user who logs in as a Japanese user cannot enter German characters, and all German characters in the database will not be correctly displayed
Language Combinations before Unicode
Example
[Screenshots: Japanese user, German user]
Language Combinations before Unicode
Please Note:
• It is possible for a user to log on with German and then manipulate the character set and font settings so that he can enter what appear to be Japanese characters. These characters will not be correctly stored in the database and this data will be corrupt
• If a user wants to enter f.i. Japanese, he/she must log on in Japanese
Language Combinations before Unicode
Please Note:
• To ensure that no data corruption occurs, the following restrictions must be followed:
  • Global data must contain only 7-bit ASCII characters, which are in all code pages
  • Users may use only the characters of their log-in language or 7-bit ASCII
  • Batch processes must be assigned the correct user ID and language
  • EBCDIC code pages are not supported
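Why global data has to stay within 7-bit ASCII in such a setup can be illustrated with a small Python sketch. It is not SAP code: Python's `shift_jis` and `latin-1` codecs merely stand in for code pages 8000 and 1100, and both helper names are hypothetical.

```python
def is_7bit_ascii(text):
    """True if every character is in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in text)


def misread(text, stored_as, read_as):
    """Store text under one code page and read it back under another."""
    return text.encode(stored_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    japanese = "\u65e5\u672c"  # two kanji entered by a Japanese user
    print(is_7bit_ascii("PLANT-0001"), is_7bit_ascii(japanese))
    # ASCII survives a code page switch, the Japanese text does not.
    print(misread("PLANT-0001", "shift_jis", "latin-1"))
    print(repr(misread(japanese, "shift_jis", "latin-1")))
```

A check like `is_7bit_ascii` is the kind of test one could apply to global data before it is shared between users of different log-in languages.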
Recommendations from SAP (w/o Unicode)
• In general, using a single standard code page for new installations and upgrades is the optimal decision
• If additional languages or language combinations are needed, SAP recommends Unambiguous Blended Code Pages for new installations and MDMP for existing installations
• Unambiguous Blended Code Pages only support certain language combinations and therefore an MDMP setup may be the only possibility for new installations as well
Unicode-compliant SAP products
Unicode installations are currently planned only with written permission of SAP and are carried out as customer projects together with SAP, except for new installations of R/3 Enterprise Extension Set 2.0
Unicode-compliant SAP products
Web Application Server (≥ 6.20)
Customer Relationship Management (CRM)
• The Unicode version of mySAP CRM 4.0 is available via Ramp-Up
Supply Chain Management (SCM)
• The Unicode version of mySAP SCM 4.0 is available via Ramp-Up
Supplier Relationship Management (SRM)
• The Unicode version of mySAP SRM 4.0 is available via Ramp-Up
• conversions (with or without MDMP) of existing SRM installations
Unicode-compliant SAP products
Business Intelligence (BW)
• The Unicode version of mySAP BW 3.5 is available via Ramp-Up
• the conversion of existing BW installations as customer project
• SAP Note 643813 has a collection of all relevant SAP notes concerning Unicode-based SAP BW installations
Product Lifecycle Management (PLM)
• The Unicode version of mySAP PLM 4.0 is available via Ramp-Up
R/3 Enterprise (Ext. 1.10 & higher)
Exchange Infrastructure
When/why do customers need Unicode?
• Global businesses that require IT systems to support multilingual data without any restrictions → f.i. customers with one WW central SAP system
• Web interfaces open the door to a global customer base, and IT systems must consequently be able to support multiple local languages simultaneously
When/why do customers need Unicode?
• With J2EE integration, mySAP components fully support web standards, and with Unicode, they can now take full advantage of XML and Java
• Only Unicode makes it possible to seamlessly integrate inhomogeneous SAP and non-SAP system landscapes → NetWeaver
Technology in Depth
3. Technology in Depth
• Unicode & Operating Systems
• Unicode & Databases
• SAP Unicode-based Code Pages
• How to Unicode-enable a program
• Unicode-enabled ABAP
• Migrating to Unicode enabled ABAP
• Unicode Conversion, IMIG Lab Test
• SAP System-to-System communication
• Printing & Output Management
Unicode & Operating Systems – HP-UX
• HP-UX is Unicode-enabled since version 10.x
• All Unicode locales in the HP-UX operating environment are based on the UTF-8 format
• Each locale includes a base language in the UTF-8 code set and the regional data related to this base language
• This includes local formatting rules, text messages, help messages, and other related files
• Each locale also supports several other scripts for input, display, code conversion, and printing
Unicode & Operating Systems – Windows
• Some Unicode support has been included in Microsoft Windows since Windows 95 and Windows NT 4
• Windows 2000 and Windows XP/2003 are based on Unicode instead of the ANSI or WGL4 character sets
• Before Win2K, your version of Windows may have used a different character set if you live in a country such as Egypt, Greece, Israel, Russia or Thailand that uses a non-Latin alphabet
Unicode & Operating Systems – Windows
• The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Cyrillic, Greek, Hebrew or Thai alphabets
• This caused and still causes problems when moving documents between operating systems such as DOS, Windows, Mac OS and UNIX or exchanging documents electronically that were created on computers using different character sets
Unicode & Operating Systems – Linux
• Before UTF-8 emerged, Linux users all over the world had to use various different language-specific extensions of ASCII
• Most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc.
• This made the exchange of files difficult and application software had to worry about various small differences between these encodings
Unicode & Operating Systems – Linux
• Because of these difficulties, major Linux distributors and application developers have now started to phase out these older legacy encodings in favor of UTF-8
• UTF-8 support has improved dramatically over the last few years and ever more people now use UTF-8 on a daily basis in
  • text files (source code, HTML files, email messages, etc.)
  • file names
  • standard input and standard output, pipes
  • …
Unicode & Operating Systems – Linux
• In UTF-8 mode, terminal emulators (such as xterm) transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process
• Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font
Unicode & Operating Systems – Linux
• Before you start experimenting with UTF-8 under Linux, update your installation to a recent distribution with up-to-date UTF-8 support
• This is particularly the case if you use an installation older than SuSE 8.1 or Red Hat 8.0
• Before these, UTF-8 support was far too limited and experimental to be recommendable for daily use
Little vs. Big Endian
• UCS and Unicode are first of all just code tables that assign integer numbers to characters
• There exist several alternatives for how a sequence of such characters or their respective integer values can be represented as a sequence of bytes
• The two most obvious encodings store Unicode text as sequences of either 2-byte or 4-byte units
Little vs. Big Endian
• The official terms for these encodings are UCS-2 and UCS-4, respectively
• Unless otherwise specified, the most significant byte comes first in these (Big Endian convention)
• An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every ASCII byte
• If we want to have a UCS-4 file, we have to insert three 0x00 bytes instead before every ASCII byte
Little vs. Big Endian
Character   Unicode Scalar Value   UTF-8 / CESU-8   UTF-16 [Little Endian]   UTF-16 [Big Endian]
A           U+0041                 41               41 00                    00 41
Ä           U+00C4                 C3 84            C4 00                    00 C4
α           U+03B1                 CE B1            B1 03                    03 B1
א           U+05D0                 D7 90            D0 05                    05 D0
晓          U+6653                 E6 99 93         53 66                    66 53
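Both the byte order and the "insert 0x00 bytes" transformation can be reproduced with a short Python sketch. The helper names are hypothetical; Python's `utf-16-be`, `utf-16-le` and `utf-32-be` codecs are used as stand-ins for big-endian and little-endian UCS-2/UCS-4, which is only valid for characters below U+10000.

```python
def ascii_to_ucs2_be(data):
    """Turn ASCII/Latin-1 bytes into big-endian UCS-2 by prefixing 0x00."""
    return b"".join(b"\x00" + bytes([byte]) for byte in data)


def ascii_to_ucs4_be(data):
    """Turn ASCII/Latin-1 bytes into big-endian UCS-4 by prefixing 3 x 0x00."""
    return b"".join(b"\x00\x00\x00" + bytes([byte]) for byte in data)


def hex_bytes(raw):
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{byte:02X}" for byte in raw)


if __name__ == "__main__":
    print(hex_bytes(ascii_to_ucs2_be(b"SAP")))
    for ch in ["A", "\u00c4", "\u6653"]:
        print(f"U+{ord(ch):04X}",
              "LE:", hex_bytes(ch.encode("utf-16-le")),
              "BE:", hex_bytes(ch.encode("utf-16-be")))
```

The two byte orders contain the same code units; only the order of the bytes within each unit differs.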
Unicode & Databases
Supported Databases by SAP (WAS 6.20)
[Table: support matrix of databases (SQL Server, Oracle, DB2, SAP DB) against operating systems (Win2K, HP-UX, Solaris, AIX, OS/400, OS/390, Linux). Legend: P = Available, ? = Currently not available, -- = Unsupported in general]
Unicode & Databases
Manufacturer   Version   Encodings
SQL Server     2000      UTF-16
               7.0       UTF-16
Oracle         8.2       UTF-8
               8         UTF-8
               9i        UTF-8 / UTF-16
               10g       UTF-8 / UTF-16
DB2            AIX       CESU-8
               AS400     UTF-16
SAP DB         7.0       UTF-8
SAP Unicode-based Code Pages
• With the Unicode enablement of mySAP.com components (check chapter #1), the old code page management had to be changed
• Instead of using SAP character numbers, all code pages are now based on Unicode character IDs
• 5-digit SAP character numbers are no longer adequate
→ This change is valid for both Unicode and Non-Unicode Systems!
SAP Unicode-based Code Pages
SAP Unicode-based Code Pages
• The connection between SAP character number & Unicode character ID is found in table TCP01
• You can see the connection in the SPAD character section
• NOTE: not every character has a corresponding Unicode character ID! f.i.
SAP Unicode-based Code Pages
• The migration of all SAP code pages from the old to the new format was done using report RSCP0126
• The definition of code pages is still in TCP00
• Customers must migrate their own code pages (9xxx) using RSCP0126 themselves!
How to Unicode-enable a program
• Separate Unicode and Non-Unicode versions of R/3
  • Non-Unicode R/3
    • 1 character = 1 byte (types C, N, D, T, STRING)
    • Non-Unicode kernel
    • Non-Unicode database
  • Unicode R/3
    • 1 character = 2 bytes → UTF-16 (types C, N, D, T, STRING)
    • Unicode kernel
    • Unicode database
• No explicit Unicode data type in ABAP
• Single ABAP source for Unicode and non-Unicode systems
How to Unicode-enable a program
• Major part of ABAP coding is ready for Unicode without any changes
• Minor part of ABAP coding has to be adapted to comply with Unicode restrictions (f.i. syntactical restrictions)
How to Unicode-enable a program
• Program attribute "Unicode checks active"
Unicode Enabled ABAP
Design Goals
• Platform independence
• Highest level of compatibility to the pre-Unicode world
  → Identical behavior on Unicode and non-Unicode systems
  → Minimize costs for Unicode enabling of ABAP programs
Main Features
• Clear distinction between character and byte processing: 1 Character <> 1 Byte
Unicode Enabled ABAP
• ABAP lists: difference between memory and display length
Migrating to Unicode enabled ABAP
Step 1
• In non-Unicode system
  • Adapt all ABAP programs to Unicode syntax and runtime restrictions
  • Set attribute "Unicode enabled" for all programs
Migrating to Unicode enabled ABAP
Step 2
• Set up a Unicode system
  • Unicode kernel + Unicode database
  • Only ABAP programs with the Unicode attribute are executable
• Do runtime tests in Unicode system
  • Check for runtime errors
  • Look for semantic errors
  • Check ABAP list layout with former double byte characters
Migrating to Unicode enabled ABAP
Use UCCHECK to analyze your applications:
• Remove errors
• Inspect statically not analyzable places (optional)
  • Untyped field symbols
  • Offset with variable length
  • Generic access to database tables
• Set Unicode program attribute using UCCHECK or SE38 / SE24 / ...
• Do additional checks with SLIN (e.g. matching of actual and formal parameters in function modules)
Upgrade to Unicode
Upgrade to Unicode
• With Unicode, there are no limitations on users, and all languages in the ISO 639 standard can be used
• Unicode is technically supported as of Basis Release 6.20, see Note 0379940 for more information
• A single code page system (standard or Unambiguous Blended Code Page) can be upgraded to Unicode using the normal upgrade method
Unicode Conversion Roadmap
Preparation
• During preparation, topics such as
  • additional hardware requirements,
  • downtime issues,
  • Unicode-enabling of customer developments,
  • and the special treatment of MDMP systems
  have to be taken into consideration
Unicode Conversion Roadmap
Conversion
• The Unicode conversion process is based on a system copy, and during this process, the database conversion and system shutdown/restart are as automated as possible
• For small to mid-size databases (< 1 TB), this is based on an SAP Unload/Reload of the complete database; minimum downtime tools will be used for larger databases
Unicode Conversion Roadmap
Post-Conversion
• Once the Unicode system is up and running, you need to
  • verify data consistency on a scenario basis,
  • as well as carry out general integration testing
• For systems that support multiple languages, special emphasis needs to be placed on cross-language handling during the test phase. Correction tools are provided by SAP, which can be used in case the conversion did not run properly.
Unicode Conversion Roadmap
Post-Conversion
• Additional Tool: SAP Data Management – reducing the database size and growth
• To keep your database costs in check, the SAP Data Management service frees up valuable database resources by showing you how to reduce the size and growth of your database by typically 25 % (see details).
Unicode Conversion at a Glance
Preparation
• Set up the Unicode Conversion Project
• Check Prerequisites
• Data Analysis for downtime minimization – special MDMP treatment
• Enabling of Customer Developments
Conversion
• Highly automated
• System will be down during database conversion
• Unload/reload process for small databases
• Minimum downtime tool for large databases
Post-Conversion
• Unicode system is up and running
• Verification of Data Consistency
• Integration Testing focused on language handling
Upgrade Paths to Unicode (R/3 Enterprise)
Source system: R/3 3.1I, R/3 4.0B, R/3 4.5B, R/3 4.6B, R/3 4.6C
→ Direct upgrade → Target system: R/3 Enterprise non-Unicode
→ Conversion → R/3 Enterprise Unicode
• First upgrade, then conversion to Unicode
• R/3 Enterprise Ramp-Up started 2002-07
• Unicode availability follows a phase of restricted shipment with pilot customers
Upgrade Paths to Unicode (BW 3.1)
Source system: BW 2.0B, BW 2.1C, BW 3.0
→ Target system: BW 3.1 non-Unicode
→ Conversion → BW 3.1 Unicode
• Interfacing R/3 MDMP on a project base only
• Unicode BEXGUI restrictions apply
• First upgrade, then conversion to Unicode
• BW 3.1 Ramp-Up starting 2002-12
• Unicode availability follows a phase of restricted shipment with pilot customers
Upgrade Paths to Unicode (CRM 3.1)
Source system: CRM 2.0B, CRM 2.0C, CRM 3.0
→ Target system: CRM 3.1 non-Unicode
→ Conversion → CRM 3.1 Unicode
• Selected scenarios only ⇔ cooperation with SAP GBU CRM required
• First upgrade, then conversion to Unicode
• CRM 3.1 Ramp-Up starting 2002-12
• Unicode availability follows a phase of restricted shipment with pilot customers
Prerequisites, special MDMP treatment
• OSS Note 548016
  Conversion from Unicode to non-Unicode is not possible
  The Unicode Conversion of MDMP and also Ambiguous Code Page systems (code page numbers 6100, 6200 and 6500) is only supported on project basis with SAP involvement
• OSS Note 543715
  The Unicode Conversion of a BW 3.1 system requires additional steps regarding the system copy
• OSS Note 573044
  If you are using HR functionality within R/3 Enterprise, additional steps are also mandatory
6.30 Unicode & MCOD
• With SAP WebAS 6.30 a database abstraction layer for the Java stack was introduced – OpenSQL for Java
• Tables of the Java stack are stored in the same database instance as the tables of the ABAP stack, in two different schemas (except Informix)
• The concept of MCOD installations is fully supported by the combined stack of ABAP and Java
[Diagram: systems QA1 and TC2, each with an ABAP stack (non-Unicode/Unicode) and a Java stack (Unicode); database schemas SAPQA1, SAPQA1DB, SAPTC2, SAPTC2DB]
Unicode Conversion – IMIG
Whitepaper: "SAP R/3 incremental migration test"
http://saphpcc.bbn.hp.com/Global/Compet/migration/migration.HTM
SAP System-to-System Communication
SAP System-to-System communication
• SAP Web Application Server (≥ 6.20)
• Only one source code exists for Unicode-based and non-Unicode-based systems → new developments can be smoothly exchanged
• The interfaces (e.g. RFC) have been extended, so that communication between other Unicode-based systems or non-Unicode-based systems is possible. Furthermore, SAP provides standard tools for the installation of (and conversion to) Unicode-based systems that can also be used for checking and Unicode-enabling of customer developments
SAP System-to-System communication
• solid lines: receiver can receive all characters
• dotted lines: receiver cannot receive characters which are not in its own code page. But as long as you restrict the character set, data can be sent from everywhere to everywhere.
[Diagram: Unicode R/3, MDMP R/3 (Latin-1, SJIS), Non-Unicode R/3 (SJIS) and WWW (Latin-1, SJIS), connected via http/RFC]
SAP System-to-System communication
RFC
• Unicode <-> Unicode
  • no problem
• non Unicode <-> non Unicode
  • old stuff, receiver converts code page if possible
• Unicode <-> non Unicode
  • the Unicode side converts from/to the code page of the non Unicode side
  • MDMP is converted with a language key
• System settings allow the configuration of error handling
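The idea of converting on the Unicode side with configurable error handling can be sketched with Python's codecs. This is not the SAP RFC implementation: `to_code_page` is a hypothetical helper, the codec names are Python's, and the `on_error` values are Python's standard error handlers rather than SM59 settings.

```python
def to_code_page(text, code_page, on_error="strict"):
    """Convert Unicode text to the receiver's code page.

    on_error="strict" raises an exception for characters that the code
    page does not contain, "replace" substitutes a question mark.
    """
    return text.encode(code_page, errors=on_error)


if __name__ == "__main__":
    message = "\u65e5\u672c A"  # two kanji, a space and an ASCII letter
    # A Latin-1 receiver cannot represent the kanji.
    print(to_code_page(message, "latin-1", on_error="replace"))
    try:
        to_code_page(message, "latin-1")
    except UnicodeEncodeError as exc:
        print("conversion error:", exc.reason)
```

Restricting the character set to what the receiver's code page contains avoids both outcomes, which matches the advice given for the dotted-line connections.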
SAP System-to-System communication
RFC (SM59) – Unicode <–> non Unicode
Printing & Output Management
What is a SAP device type?
• configuration file for the SAP printer driver that ensures proper functionality between the SAP data stream and the printer or output device where the data is sent
Printer drivers & device types
• In R/3, a distinction is made between "printer driver" and "device type"
• A device type consists of a variety of attributes defined for an output device
• One of these attributes is the printer driver to be used by SAPscript (R/3 forms processor) for this particular printer
Printing & Output Management
• device types cover aspects such as control commands for font selection, page size, character set selection, character set used and so on
• a device type must be specified to enable direct printing from the SAP applications for every new printer defined in the SAP environment
• device types are created by SAP for the entire HP LaserJet printer family on the basis of PCL5, PCL6 and PostScript
• SAP develops, tests and supports device types for HP products that can be found here: http://h40045.www4.hp.com/printing_solutions/Device_Types.html
Printing & Output Management
• at present, there are five SAPscript printer drivers. They include:
  • HP-PCL5 (for example, HP LaserJet 3, 4, 5, 6 series)
  • PostScript printers (PS level 2)
  • PRESCRIBE (for example, Kyocera FS-1500)
  • device types SWIN/SAPWIN/xxSWIN/xxSAPWIN
Printing & Output Management
Unicode Device Types
Background:
• LEXMARK is going into HP accounts, claiming that only LEXMARK could support SAP UNICODE printing
• in order to support UNICODE character sets on an HP printer, customers need to have a UNICODE compliant printer and a SAP UNICODE device type
• UNICODE compliant printers are defined by firmware support for UTF8 and/or UTF16 and UNICODE fonts loaded on the printer
• today LEXMARK is the preferred vendor for SAP UNICODE printing
Printing & Output Management
Solution for HP
• all OZ based printers (LJ2300 and higher) support UNICODE UTF16 fonts in PCL6 by default
• the LJ2300, LJ9000, CLJ9500 and future products will support UTF8 fonts in PCL5
• a firmware roll is planned so that all current OZ based printers (LJ4200/4300, CLJ4600, CLJ5500) also support UTF-8 in PCL5
• furthermore the UNICODE fonts need to be loaded on the printer (e.g. stored on the internal hard disk)
• today we have a UNICODE prototype solution available to print from an SAP environment
• for more information, contact Alan Cooke (U.S.) or Stephen Westberg (EMEA)
Sizing Information for Unicode-based SAP Systems
Sizing Info - General
• The space requirements for encoding a text, compared to encodings currently in use (8 bit per character for European languages, more for Chinese/Japanese/Korean), are shown on the next slide
• This has an influence on disk storage space and network download speed (when no form of compression is used)
Sizing Info - General
• UTF-8: no change for US ASCII, just a few percent more for ISO-8859-1, 50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic
• UCS-2 and UTF-16: no change for Chinese/Japanese/Korean, 100% more for US ASCII and ISO-8859-1, Greek and Cyrillic
• UCS-4: 100% more for Chinese/Japanese/Korean, 300% more for US ASCII and ISO-8859-1, Greek and Cyrillic
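These growth figures can be checked with a short Python sketch that encodes a sample text in a legacy code page and in a Unicode encoding and compares the byte counts. The sample strings, the choice of legacy code pages (`iso-8859-7` for Greek, `shift_jis` for Japanese) and the helper name `growth_percent` are assumptions made for this illustration. The `-le` codec variants are used so that no byte order mark is counted, and `utf-32-le` stands in for UCS-4.

```python
def growth_percent(text, legacy_encoding, unicode_encoding):
    """Return the size increase in percent when moving text
    from a legacy code page to a Unicode encoding."""
    old_size = len(text.encode(legacy_encoding))
    new_size = len(text.encode(unicode_encoding))
    return (new_size - old_size) * 100 / old_size


# Sample texts paired with a legacy code page that can represent them.
SAMPLES = {
    "US ASCII": ("Hello, world", "ascii"),
    "ISO-8859-1": ("Grüße aus Köln", "iso-8859-1"),
    "Greek": ("Καλημέρα", "iso-8859-7"),
    "Japanese": ("日本語のテキスト", "shift_jis"),
}

if __name__ == "__main__":
    for name, (text, legacy) in SAMPLES.items():
        for target in ("utf-8", "utf-16-le", "utf-32-le"):
            pct = growth_percent(text, legacy, target)
            print(f"{name:11} {legacy:11} -> {target:9}: {pct:+.0f}%")
```

For ISO-8859-1 text the UTF-8 growth depends on how many accented characters the text contains, so a short sample with several umlauts will show more than "a few percent".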
Expected Hardware Requirements
• Increase of CPU requirements, depending on the existing solution:
  • ISO-LATIN1 (ASCII) -> Unicode: +30%
  • Double-Byte/MDMP -> Unicode: + <5%
• Increase of memory requirements:
  • depending on the underlying DB (+ ~50%)
  • Application Server internally based on UTF-16; DB either UTF-8, CESU-8 or UTF-16
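CESU-8 is less widely known than UTF-8 and UTF-16. It is defined in Unicode Technical Report #26 and is identical to UTF-8 for characters in the Basic Multilingual Plane; a supplementary character is first split into its UTF-16 surrogate pair, and each surrogate is then written as a three-byte sequence (six bytes in total, versus four in UTF-8). Python has no built-in CESU-8 codec, so the following is a minimal sketch with a hypothetical helper `cesu8_encode`, not the conversion code used by any database.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (sketch based on the definition in UTR #26)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the supplementary character into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # Write each surrogate as its own three-byte sequence.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            # BMP characters are encoded exactly as in UTF-8.
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    for ch in ("A", "Ä", "\U00010400"):
        print(f"U+{ord(ch):04X}",
              "UTF-8:", ch.encode("utf-8").hex(),
              "CESU-8:", cesu8_encode(ch).hex(),
              "UTF-16:", ch.encode("utf-16-be").hex())
```

For typical business text, which rarely leaves the Basic Multilingual Plane, CESU-8 and UTF-8 therefore produce the same bytes.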
Unicode Conversion Demo
JAVA Applet Demo
Expected Hardware Requirements
• Database growth depending on
  • DB Unicode encoding schema (e.g. CESU-8, UTF-16)
  • Languages in use
[Figure: storage needed per character (e.g. "A", "Ä") in CESU-8 and UTF-16]
• Encodings: UTF-8, CESU-8, UTF-16
• Manufacturers: Oracle, DB/2 (AIX), DB/2 (AS400), SQL Server, SAP DB (7.0), SAP DB (8.0)
• Additional storage requirements: 35% (UTF-8), 60-70% (UTF-16)
• Network load (draft results): <7% for Latin-1, about 15% for Japanese, 25% for other Asian languages
7 (6. App:+10% +10% +5% +5% Disk 1 +10% +10% NON-Unicode 3/31/2004 116 .Expected Hardware Requirements R/3 Release 4.0 4.6c 4.5 4.20) non-Unicode CPU Memory 1 1 +20% +20% +15% DB: +20%.
7 (6.7 with Unicode CPU Memory Disk 1 1 1 +30% to 35% +50% +~35% (UTF-8) +60-70% (UTF-16) Unicode 3/31/2004 117 .Expected Hardware Requirements R/3 Release 4.20) non-Unicode 4.