Code page 936 (IBM)
Transforms / Encodes: GB 2312
Other related encoding(s): Shift JIS
IBM code page 936 was a character encoding for Simplified Chinese that included 1,880 user-defined characters (UDC). It combined the single-byte Code page 903 with the double-byte Code page 928. Code page 946 used the same double-byte component, but with an extended single-byte component (Code page 1042).
IBM code page 936 should not be confused with the identically numbered Windows code page, which is a variant of the GBK encoding; GBK is called Code page 1386 by IBM. While GBK is a superset of the EUC-CN encoding of GB 2312, IBM-936 uses a different coded form of GB 2312, more closely resembling the relationship of Shift JIS to JIS X 0208.
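The naming collision can be seen in software that follows the Windows convention. The following is a minimal sketch using Python's standard codecs, chosen here purely for illustration (the article itself only discusses ICU): in Python, the `cp936` label resolves to GBK, and GBK reproduces the EUC-CN bytes for a GB 2312 character while also covering characters outside GB 2312. The sample characters are arbitrary choices.

```python
import codecs

# In Python's codec registry, "cp936" is an alias for GBK (the Windows
# code page), not for the older IBM code page described in this article.
cp936_codec_name = codecs.lookup("cp936").name

# A GB 2312 character encodes to the same bytes in EUC-CN and in GBK,
# illustrating that GBK is a superset of the EUC-CN form of GB 2312.
simplified = "\u554a"  # 啊, GB 2312 row 16, cell 1
euc_cn_bytes = simplified.encode("gb2312")
gbk_bytes = simplified.encode("cp936")

# A traditional-only character is absent from GB 2312 but present in GBK.
traditional = "\u9f8d"  # 龍
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False
gbk_only_bytes = traditional.encode("cp936")
```

None of these codecs implements IBM-936 itself; data in the older IBM encoding would be misread if decoded with the `cp936` label.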
The encoding was in use mainly during the 1980s and early 1990s. While the original IBM PC (IBM 5150) lacked functionality for processing data in CJK languages, the IBM 5550 possessed such functionality, and was available in models supporting Japanese, Korean, Traditional Chinese or Simplified Chinese. Code page 936 for Simplified Chinese accompanied code page 932 (Shift JIS) for Japanese, code page 934 for Korean and code page 938 for Traditional Chinese.
The last revision of IBM-928/936/946 was documented in 1992, and it was superseded in 1993 by the EUC-CN-based code pages 1380 through 1383; code page 1380 encodes the same characters as code page 928, but in a different layout. As of 1998, "some older Chinese packages" still included an algorithm for converting between IBM-936 and other encodings of GB 2312.
Although IBM provides chart definitions for Code page 1380 online (the document C-H 3-3220-130 1993-11), it does not provide the chart definition for the older Code page 928 (the document C-H 3-3220-130 1992-11, i.e. an earlier revision of the same specification). International Components for Unicode (ICU) does not include an IBM-936 or IBM-946 codec, and uses the Windows code page for the "cp936" label. The ICU project does hold mapping data for IBM-946, which it makes publicly available but does not ship with ICU.
Code page 928, the double-byte component, included 9,355 characters as double-byte sequences starting with 0x81 through 0xAC and 0xF0 through 0xFA.
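To illustrate how a mixed single-byte/double-byte stream of this kind might be scanned, here is a minimal sketch. It assumes that any byte in the lead ranges given above starts a two-byte sequence and that every other byte stands alone as a single-byte character; the names `LEAD_RANGES`, `is_lead_byte` and `split_units` are hypothetical, and the trail byte is not validated.

```python
# Lead-byte ranges of the double-byte component, as stated in the text.
LEAD_RANGES = ((0x81, 0xAC), (0xF0, 0xFA))


def is_lead_byte(b: int) -> bool:
    """Return True if b falls in one of the double-byte lead ranges."""
    return any(lo <= b <= hi for lo, hi in LEAD_RANGES)


def split_units(data: bytes) -> list:
    """Split a byte string into one- and two-byte units.

    Assumption: a lead byte always consumes the following byte, and all
    other bytes are single-byte characters.
    """
    units = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte sequence")
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


# The second byte of a pair may fall below 0x80, so the stream has to be
# scanned from the start rather than classified byte by byte.
units = split_units(b"A\x81\x40B")
```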
The 0x81–AC lead byte range was used for GB 2312 characters: lead bytes 0x81–87 were used for non-hanzi, 0x88–9C were used for level 1 hanzi and 0x9C–AC were used for level 2 hanzi. Like Shift JIS, trail (second) bytes were in the range 0x40–FC excluding 0x7F, allowing two GB 2312 rows to be encoded per lead byte; unlike Shift JIS, the bytes 0xA0–AC were not excluded from the lead byte range, since JIS X 0201 compatibility was not required. The 0xF0–FA lead byte range was used for IBM extensions: 0xF0 through 0xF9 were used for user-defined characters, and 0xFA was used for additional non-hanzi.
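The layout described above can be sketched in code. The following is an illustration only: it assumes the GB 2312 row/cell to byte mapping follows the same arithmetic as Shift JIS, with row 1 starting at lead byte 0x81. The article states the byte ranges but not the exact formula, so the function names and the formula itself are assumptions; a published table such as Leisher's SHIFTGB.TXT should be treated as authoritative for real conversions.

```python
from typing import Tuple

# Trail bytes 0x40-0xFC excluding 0x7F: 188 values, i.e. two 94-cell rows.
TRAIL_BYTES = [b for b in range(0x40, 0xFD) if b != 0x7F]


def gb2312_to_shifted(row: int, cell: int) -> bytes:
    """Map a GB 2312 row/cell (both 1-based) to a two-byte sequence.

    Assumed formula, borrowed from Shift JIS: odd rows use trail bytes
    0x40-0x9E (skipping 0x7F), even rows use 0x9F-0xFC.
    """
    if not (1 <= row <= 88 and 1 <= cell <= 94):
        raise ValueError("row/cell outside the 0x81-0xAC lead byte range")
    lead = 0x81 + (row - 1) // 2
    if row % 2 == 1:
        trail = 0x3F + cell
        if trail >= 0x7F:  # skip the excluded byte 0x7F
            trail += 1
    else:
        trail = 0x9E + cell
    return bytes([lead, trail])


def shifted_to_gb2312(data: bytes) -> Tuple[int, int]:
    """Inverse of gb2312_to_shifted under the same assumed formula."""
    lead, trail = data
    if not 0x81 <= lead <= 0xAC:
        raise ValueError("lead byte outside 0x81-0xAC")
    if not 0x40 <= trail <= 0xFC or trail == 0x7F:
        raise ValueError("invalid trail byte")
    row = (lead - 0x81) * 2 + 1
    if trail >= 0x9F:  # second row of the pair
        return row + 1, trail - 0x9E
    if trail > 0x7F:  # undo the skip over 0x7F
        trail -= 1
    return row, trail - 0x3F


# Capacity of the user-defined area: lead bytes 0xF0-0xF9, 188 trails each.
udc_capacity = len(range(0xF0, 0xFA)) * len(TRAIL_BYTES)
```

Under this assumed formula, the first level 1 hanzi (row 16, cell 1) lands on lead byte 0x88, rows 55 and 56 share lead byte 0x9C, and row 87 ends on 0xAC, which agrees with the ranges given above. The user-defined area works out to 10 × 188 = 1,880 code points, matching the stated number of user-defined characters.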
- Leisher, Mark (2008) [1998-03-06]. "SHIFTGB.TXT: Shifted GB2312.1980. Generated from an algorithm provided with some older Chinese packages". Department of Mathematical Sciences, New Mexico State University.
- Lunde, Ken (2009). "Chapter 4: Encoding Methods (§ Code Pages)". CJKV Information Processing (2nd ed.). Sebastopol, California: O'Reilly Media. pp. 278–282. ISBN 978-0-596-51447-1.
- "CCSID 936". IBM. Archived from the original on 2016-03-27.
- "CCSID 946". IBM. Archived from the original on 2016-03-26.
- "Table 1: Registration of GCSGID and CPGID for the IBM CH-S Graphic Character Set". C-H 3-3220-130 1993-11: IBM Simplified Chinese Graphic Character Set (PDF). 1993. p. 6.
- "Code page 928 information document". Archived from the original on 2016-03-17.
- "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
- "ibm-946_P100-1995". International Components for Unicode Data Repository. Unicode Consortium, IBM.
- "CCSID 928 information document". Archived from the original on 2016-03-26.