You are on page 1of 4

Code page is another term for character encoding.

It consists of a table of values that describes the character set for a particular language. The term code page originated from IBM's EBCDIC-based mainframe systems, Corporation.
[3] [1] [2]

but many vendors use this term including Microsoft, SAP, and Oracle

Vendors often allocate their own code page number to a character encoding, even if it is

better known by another name (for example UTF-8 character encoding has code page numbers 1208 at IBM, 65001 at Microsoft, 4110 at SAP).
Contents
[hide]

1 The code page numbering system

o o

1.1 Relationship to ASCII 1.2 Relationship to Unicode

2 Noteworthy code pages

o o o o o

2.1 IBM PC (OEM) code pages 2.2 Code pages for DBCS character sets 2.3 Microsoft code page numbers for various other character encodings 2.4 Miscellaneous 2.5 Windows (ANSI) code pages

3 Criticism 4 Private code pages 5 See also 6 References 7 External links

[edit]The

code page numbering system

IBM introduced the concept of systematically assigning a small, but globally unique, 16 bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest (first) numbers are assigned to variations of IBM's EBCDIC encoding and slightly larger numbers refer to variations of IBM's extended ASCII encoding as used in its PC hardware. With the release of PC-DOS version 3.3 (and the near identical MS-DOS 3.3) IBM introduced the code page numbering system to regular PC users, as the code page numbers (and the phrase "code page") were used in new commands to allow the character encoding used by all parts of the OS to be set in a systematic way.
[4]

After IBM and Microsoft ceased to cooperate in the 1990s, the two companies have maintained the list of assigned code page numbers independently from each other, resulting in some conflicting assignments. At least one 3rd party vendor (Oracle) also has its own different list of numeric assignments.
[5]

IBM's

current assignments are listed in their CCSID repository. Microsoft's assignments seem not to be documented anywhere, but a list of the names and approximate IANA abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine (this information is used by Microsoft programs such as Internet Explorer). Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their codepoints into 8 bits and do not involve anything more than mapping each code-point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved. The text mode of standard (VGA-compatible) PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to 8 may be stored in the display adaptor for easy switching
[6]

. There were a selection of 3rd party code page fonts that could

be loaded into such hardware. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. However the system of referring to character encodings by a code page number remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in various protocols such as e-mail and web pages. [edit]Relationship

to ASCII

The vast majority of code pages in current use are supersets of ASCII, a 7-bit code representing 128 control codes and printable characters. In the distant past, 8-bit implementations of the ASCII code set the top bit to zero or used it as a parity bit in network data transmissions. When the top bit was made available for representing character data, a total of 256 characters and control codes could be represented. Most vendors (including IBM) used this extended range to encode characters used by various languages and graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these extended character sets and vendors referred to the variants as code pages, as IBM had always done for variants of EBCDIC encodings. [edit]Relationship

to Unicode

Unicode is an effort to include all characters from previous code pages into a single character enumeration that can be used with a number of encoding schemes. In the process, duplicate characters are eliminated and new variants are introduced, like Fullwidth ASCII. While consistent use of any single Unicode encoding would theoretically eliminate the need to keep track of different code pages or character encodings, the existence of multiple encodings of Unicode as well as the need to remain compatible with existing documents and systems that use the older encodings remains. In practice the

various Unicode character set encodings have simply been assigned their own code page numbers, and all the other code pages have been technically redefined as encodings for various subsets of Unicode. [edit]Noteworthy [edit]IBM

code pages

PC (OEM) code pages

These code pages were originally embedded directly in the text mode hardware of the graphic adapters used with the IBM PC and its clones, including the original MDA and CGA adapters whose character sets could only be changed by physically replacing a ROM chip that contained the font. The interface of those adapters (emulated by all later adapters such as VGA) was typically limited to single byte character sets with only 256 characters in each font/encoding (although VGA added partial support for slightly larger character sets). Since the original IBM PC code page (number 437) was not really designed for international use, several partially compatible country or region specific variants emerged. Microsoft refers to these as the OEM code pages because they were defined by the OEM's who licensed MS-DOS for distribution with their hardware, not by Microsoft or a standard body. Examples include: 437 The original IBM PC code page 720 Arabic 737 Greek 775 Estonian, Lithuanian and Latvian 850 "Multilingual (Latin-1)" (Western European languages) 852 "Slavic (Latin-2)" (Central and Eastern European languages) 855 Cyrillic 857 Turkish 858 "Multilingual" with euro symbol 860 Portuguese 861 Icelandic 862 Hebrew 863 French (Quebec French) 865 Danish/Norwegian Differs from 437 only in the letter () in place of and 866 Cyrillic 869 Greek 874 Thai
[7]

When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but use of newer code pages, in particular Unicode, is encouraged for new designs.

You might also like