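Character Set Standards

Warning: This page has not been updated since December 1999. A more up-to-date version can be found on The Diffuse Project website.

This section of the OII Standards and Specifications List provides information on character sets that can be used for data interchange. It contains details of: ASCII, EBCDIC, ISO 646, ISO 2022, ISO 4873, ISO 6429, ISO 6937, ISO 8859, ISO 9036, ISO 9541, ISO 10367, ISO 10538, ISO 10646, JIS X 0201, JIS X 0202, JIS X 0208/0212, OCR character sets, and other character sets.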
Further information on these character sets, and on other, proprietary, character sets, can be obtained from http://www.dkuug.dk/i18n/charmaps/.

Part 5 of the Netherlands Ministry of the Interior's series on Standards for the electronic exchange of personal data (ISBN 90-5414-019-4) provides a tutorial on character set standards and their historical development.

Information on character set conversion software is available from the TERENA project at http://www.nada.kth.se/i18n/c3/.

The standards in this section have been prepared by both private and public organizations. The following public bodies have been involved in their preparation:

ASCII
Expanded name: American Standard Code for Information Interchange
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from: A list of ASCII codes can be obtained from http://www.dkuug.dk/i18n/charmaps/ANSI_X3.4-1968.

EBCDIC
Expanded name: Extended Binary Coded Decimal Interchange Code
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from: Details of the most commonly used sets of EBCDIC codes can be obtained from http://www.dkuug.dk/i18n/charmaps.

ISO 646
Expanded name: ISO 7-bit coded character set for information interchange
Area covered
Sponsoring body and standard details
Characteristics/description Character positions 0-31 (ISO positions 0/0 to 1/15) and 127 (ISO position 7/15) are reserved for control codes. Code 32 (2/0) is the space. The remaining codes, 33-126 (2/1 to 7/14), appear in the International Reference Version (IRV) in this sequence:
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
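The layout described above can be checked mechanically. The following Python sketch classifies a 7-bit code and converts it to the ISO column/row notation used in the text (column = code // 16, row = code % 16, so 127 becomes 7/15). It assumes the IRV graphic characters coincide with ASCII, as in the sequence listed above, which is why `chr()` can be used to list them; national variants of ISO 646 replace some of these positions. The function names are illustrative only.

```python
# Assumption: the IRV graphic characters coincide with ASCII, so chr() can list them.

def iso_position(code: int) -> str:
    """Return the ISO column/row notation for a 7-bit code, e.g. 127 -> '7/15'."""
    if not 0 <= code <= 127:
        raise ValueError("ISO 646 codes are 7-bit (0-127)")
    column, row = divmod(code, 16)
    return f"{column}/{row}"

def classify(code: int) -> str:
    """Classify a 7-bit code as 'control', 'space' or 'graphic'."""
    if not 0 <= code <= 127:
        raise ValueError("ISO 646 codes are 7-bit (0-127)")
    if code <= 31 or code == 127:
        return "control"  # positions 0/0-1/15 and 7/15
    if code == 32:
        return "space"    # position 2/0
    return "graphic"      # positions 2/1-7/14

# The 94 graphic characters of the IRV, in code order (33-126).
IRV_GRAPHICS = "".join(chr(code) for code in range(33, 127))

print(iso_position(31), iso_position(32), iso_position(127))  # 1/15 2/0 7/15
print(len(IRV_GRAPHICS), IRV_GRAPHICS[:5])                     # 94 !"#$%
```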
Usage (Market segment and penetration)
Further details available from:

ISO 2022
Expanded name: Character code structure and extension techniques
Area covered
Sponsoring body and standard details
Characteristics/description Up to four code sets (G0-G3) can be mapped into the left-hand side of an 8-bit ISO code set; three of these (G1-G3) can also be used on the right-hand side. Escape sequences are used to identify which code sets are to be used. Users can also select variant sets of control codes using escape sequences. Escape sequences are also used to change the character set for a single character only.
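ISO-2022-JP, a widely used Japanese profile of ISO 2022 (not itself described in this section), shows how escape sequences designate a code set into G0. The Python sketch below uses the standard library's `iso2022_jp` codec to produce such a byte stream and then splits it at the designation escape sequences. The table covers only the four G0 designations that ISO-2022-JP uses; it is an illustration, not a general ISO 2022 parser (it ignores G1-G3, locking shifts and single shifts), and the function name is hypothetical.

```python
import re

# Designation escape sequences used by ISO-2022-JP; each designates a set into G0.
G0_DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208",
}
ESCAPE = re.compile(b"|".join(re.escape(seq) for seq in G0_DESIGNATIONS))

def split_by_designation(data: bytes, initial: str = "ASCII") -> list[tuple[str, bytes]]:
    """Split an ISO-2022-JP byte stream into (designated G0 set, bytes) segments."""
    segments = []
    current, pos = initial, 0
    for match in ESCAPE.finditer(data):
        if match.start() > pos:
            segments.append((current, data[pos:match.start()]))
        current = G0_DESIGNATIONS[match.group()]  # switch the G0 set
        pos = match.end()
    if pos < len(data):
        segments.append((current, data[pos:]))
    return segments

encoded = "A日本".encode("iso2022_jp")
print(split_by_designation(encoded))  # [('ASCII', b'A'), ('JIS X 0208', b'F|K\\')]
```

The encoder ends the stream with ESC ( B so that the receiver is left in ASCII; that trailing sequence produces no segment of its own.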
Usage (Market segment and penetration)
Further details available from: Details of CEN work to define a minimum set of control functions for Europe can be found in the OII Multimedia and Hypermedia Standards Activity Report, March 1996.

ISO 4873
Expanded name: 8-bit code for information interchange
Area covered
Sponsoring body and standard details
Characteristics/description Note: ISO 4873 has not been updated to conform to changes made to ISO 2022.
Usage (Market segment and penetration) Note: The British Standard Specification for United Kingdom 8-bit data code (BS 6006) is based on ISO 4873.
Further details available from:

ISO 6429
Expanded name: Control functions for coded character sets
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from:

ISO 6937
Expanded name: Coded graphic character set for text communication -- Latin alphabet
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration) ISO 6937 also provides the character repertoire used for X.400 message handling systems and X.500 directory services.
Further details available from: Details of the ISO 6937 code sets can be obtained from http://www.dkuug.dk/i18n/charmaps/ISO_6937-2-ADD.

ISO 8859
Expanded name: 8-bit single-byte coded graphic character sets
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from: Details of Parts 1-10 of the ISO 8859 code sets can be obtained from http://www.dkuug.dk/i18n/charmaps.

ISO 9036
Expanded name: Arabic 7-bit coded character set for information interchange
Area covered
Sponsoring body and standard details
Characteristics/description ISO 11822 covers the use of the Arabic alphabet in bibliographic entries.
Usage (Market segment and penetration)
Further details available from: Details of the ISO 9036 code set can be obtained from http://www.dkuug.dk/i18n/charmaps/ASMO_449.

ISO 9541
Expanded name: Font Information Interchange
Area covered
Sponsoring body and standard details
Characteristics/description Part 1 of the standard explains the general architecture of the font information interchange standard. Part 2 defines the metrics used to describe the weight, width, height, etc., of a glyph. Part 3 defines how the information needed to generate a glyph should be interchanged, and defines ASN.1 and SGML interchange formats for this information. ISO 9541 has been designed so that metric information can be interchanged separately from the more commercially sensitive glyph generation information. The metrics defined in Part 2 are relevant to composition and other software that needs to calculate the relative positions of glyphs; the much bulkier glyph drawing information only needs to be accessed when the characters are actually displayed or printed. ISO 9541 Type 1 fonts are compatible with Version 23.0 of the PostScript interpreter. A register of glyph identifiers, based on ISO/IEC 10036, is maintained on behalf of ISO by the Association for Font Information Interchange (AFII).
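To illustrate why metric information alone is enough for composition, the following Python sketch positions a run of glyphs using only their horizontal advance widths, without any glyph drawing data. The `GlyphMetrics` class, the 1000-unit design grid and the glyph names are hypothetical simplifications; they are not the ISO 9541 Part 2 data structures, which define many more metrics than a single advance width.

```python
from dataclasses import dataclass

UNITS_PER_EM = 1000  # hypothetical design grid: advances are in 1/1000 em

@dataclass(frozen=True)
class GlyphMetrics:
    glyph_id: str    # placeholder name, not a registered AFII glyph identifier
    advance_x: int   # horizontal advance width in design units

def glyph_positions(glyphs: list[GlyphMetrics], point_size: float) -> tuple[list[float], float]:
    """Return the x offset (in points) of each glyph and the total run width."""
    x = 0.0
    positions = []
    for glyph in glyphs:
        positions.append(x)
        # Scale the design-unit advance to the requested point size.
        x += glyph.advance_x * point_size / UNITS_PER_EM
    return positions, x

run = [GlyphMetrics("n", 500), GlyphMetrics("o", 500), GlyphMetrics("period", 250)]
positions, width = glyph_positions(run, point_size=10)
print(positions, width)  # [0.0, 5.0, 10.0] 12.5
```

A composition program could break lines or justify text with these figures alone and fetch the glyph outlines only at rendering time.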
Usage (Market segment and penetration)
Further details available from:

ISO 10367
Expanded name: Standardized coded graphic character sets for use in 8-bit codes
Area covered
Sponsoring body and standard details
Characteristics/description Registration of character repertoires is carried out using the procedures laid down in ISO/IEC 7350:1991.
Usage (Market segment and penetration) Details of the ISO 10367 character set that allows box drawing characters to be used in conjunction with ISO 8859 can be obtained from http://www.dkuug.dk/i18n/charmaps/ISO_10367-BOX.
Further details available from:

ISO 10538
Expanded name: Control functions for text communication
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from:

ISO 10646
Expanded name: Universal Multiple-Octet Coded Character Set (UCS)
Area covered
Sponsoring body and standard details
Characteristics/description The code space is divided into 128 "groups", each containing 256 "planes" of 256 "rows", with 256 "cells" (character positions) in each row. Each character is addressed using four octets that identify its group, plane, row and cell; in the two-octet form used for the Basic Multilingual Plane (BMP), the first octet identifies the row and the second the cell. The first 128 characters of the BMP, which is used for 16-bit code interchange, are those of the ISO 646 International Reference Version of ASCII. The characters forming the second half of the first row are those used in ISO/IEC 8859-1, the Latin-1 character set. Other rows provide access to:
Usage (Market segment and penetration) A plan to extend the code set to cover all languages was agreed in September 1998. A revised version of the standard is expected to appear shortly.
Further details available from: Details of the ISO 10646 code set can be obtained from http://www.dkuug.dk/i18n/ISO_10646 (warning: this is a large, 81 KB, file). To join the ISO 10646 mailing list, send a message consisting of the word subscribe followed by your e-mail address to listproc@listproc.hcf.jhu.edu.
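The four-octet group/plane/row/cell addressing of ISO 10646 can be illustrated in Python, whose `ord()` returns a character's code position. This sketch assumes that Unicode code points coincide with ISO 10646 positions; Unicode only uses planes 0-16 of group 0, so the group octet is always 0 here. The helper names are hypothetical.

```python
def ucs4_address(ch: str) -> tuple[int, int, int, int]:
    """Split a character's code position into (group, plane, row, cell) octets."""
    group, plane, row, cell = ord(ch).to_bytes(4, "big")
    return group, plane, row, cell

def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane (group 0, plane 0)."""
    group, plane, _, _ = ucs4_address(ch)
    return group == 0 and plane == 0

print(ucs4_address("A"))           # (0, 0, 0, 65)   row 0, same as the IRV
print(ucs4_address("\u20ac"))      # (0, 0, 32, 172) euro sign: row 0x20, cell 0xAC
print(ucs4_address("\U0001D11E"))  # (0, 1, 209, 30) musical G clef, outside the BMP

# Row 0 matches ISO 8859-1: each Latin-1 byte decodes to the cell with the same value.
assert all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))
```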

JIS X 0201
Expanded name: Japanese Industrial Standard Code for Information Interchange
Area covered
Sponsoring body and standard details
Characteristics/description In 7-bit environments the SO (0/14) and SI (0/15) codes are used to switch between the Latin and Katakana code sets: SO selects Katakana and SI returns to Latin. In 8-bit environments the Katakana characters occupy the right-hand half of the code table (10/1 to 13/15).
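The relationship between the 7-bit and 8-bit forms can be sketched in Python: a Katakana code received after SO at position 2/1-5/15 corresponds to the same position plus eight columns (10/1-13/15) in the 8-bit form. The function name is hypothetical. As a simplification, bytes outside 2/1-5/15 are copied unchanged even in Katakana mode. The result is decoded with Python's `shift_jis` codec, on the assumption that its single-byte range 0xA1-0xDF matches JIS X 0201 Katakana.

```python
SO, SI = 0x0E, 0x0F  # Shift Out (0/14) and Shift In (0/15)

def seven_to_eight_bit(data: bytes) -> bytes:
    """Convert a 7-bit JIS X 0201 stream that uses SO/SI into the 8-bit form."""
    out = bytearray()
    katakana = False
    for b in data:
        if b == SO:
            katakana = True       # subsequent graphic codes are Katakana
        elif b == SI:
            katakana = False      # back to the Latin set
        elif katakana and 0x21 <= b <= 0x5F:
            out.append(b | 0x80)  # move to the right-hand half, e.g. 3/1 -> 11/1
        else:
            out.append(b)         # Latin, controls and other bytes copied as-is
    return bytes(out)

eight_bit = seven_to_eight_bit(b"AB\x0e\x31\x32\x0fC")
print(eight_bit)                      # b'AB\xb1\xb2C'
print(eight_bit.decode("shift_jis"))  # ABｱｲC (half-width Katakana A and I)
```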
Usage (Market segment and penetration)
Further details available from: Details of the JIS X 0201 code set can be obtained from http://www.dkuug.dk/i18n/charmaps/JIS_X0201.

JIS X 0202
Expanded name: Extension techniques for use with the Code for Information Interchange
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from:

JIS X 0208/0212
Expanded name: Code of the Japanese Graphic Character Set for Information Interchange
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from:

OCR
Expanded name: Optical Character Recognition
Area covered
Sponsoring body and standard details
Characteristics/description In June 1995 ISO/IEC JTC1/SC2 voted to accept a Turkish proposal to extend the OCR-B character set with a range of accented and related characters, including characters suitable for use in Iceland, Greece and Lithuania. The extension will add 33 new characters and 6 accents.
Usage (Market segment and penetration)
Further details available from:

Other Character Sets
Expanded name: Character sets not listed elsewhere
Area covered
Sponsoring body and standard details
Characteristics/description
Usage (Market segment and penetration)
Further details available from:

This information set on OII standards is maintained by Martin Bryan of The SGML Centre and Man-Sze Li of IC Focus on behalf of the European Commission Information Society DG.
File last updated: December 1999