Character Set Standards
Purpose of Section
This section of the Diffuse
Standards and Specifications List provides information on character
sets that can be used for data interchange.
Subjects Covered
- ASCII - American Standard Code for Information Interchange
- EBCDIC - Extended Binary Coded Decimal Interchange Code
- ISO 646 - ISO 7-bit coded character set for information interchange
- ISO/IEC 2022 - Character code structure and extension techniques
- ISO/IEC 4873 - 8-bit coded character set for information interchange
- ISO/IEC 6429 - Control functions for coded character sets
- ISO/IEC 6937 - Coded graphic character set for text communication - Latin alphabet
- ISO/IEC 8859 - 8-bit single-byte coded graphic character sets
- ISO 9036 - Arabic 7-bit coded character set for information interchange
- ISO 9541 - Font information interchange
- ISO/IEC 10367 - Standardized coded graphic character sets for use in 8-bit codes
- ISO/IEC 10538 - Control functions for text communication
- ISO/IEC 10646 - Universal multiple-octet coded character set (UCS)
- JIS X 0201 - Japanese Industrial Standard code for information interchange
- JIS X 0202 - Extension techniques for use with the code for information interchange
- JIS X 0208/0212 - Code of the Japanese graphic character set for information interchange
- JIS X 0213 - 7-bit and 8-bit double byte coded extended Kanji sets for information interchange
- OCR
- Other character set standards
Active Fora
The standards in this section have been prepared by both private and
public organizations. The following public bodies have been involved in
their preparation:
- ISO/IEC JTC1/SC2 -- JTC1 is the first (and only) Joint Technical
Committee of ISO and IEC, and deals with Information Technology. SC2 is
the subcommittee of JTC1 responsible for the description of Coded
character sets and Code extension techniques
- ISO/IEC JTC1/SC34 -- SC34 is the subcommittee of JTC1 responsible for
Document description and processing languages
- ITU -- International Telecommunication Union
- CEN -- European Committee for Standardization TC 304, ICT - European
Localization Requirements (formerly called Character Set Technology)
- TERENA -- Trans-European Research Networks Association Working Group on
Character Sets and Internationalization of Networks Services
- ANSI -- American National Standards Institute
- JISC -- Japanese Industrial Standards Committee.
Related Initiatives
Unicode, an industry consortium, produces the Unicode standard, which has
the same character repertoire and coding as ISO/IEC 10646 (UCS). The
consortium makes Version 3 of its Standard and related Technical Reports
available on its web pages. The standard also defines character properties
and provides implementation guidelines that are not part of the UCS.
Statskontoret, the Swedish Agency for Administrative Development, has
published Comparisons of Standardized Character Sets for Europe (2000:2),
a revised report (ISBN 91-7220-374-9) on a representative number of 7- and
8-bit coded standardized and proprietary character sets and registrations.
Code tables for selected pairs indicate which characters exist in both
sets with the same encoding, which exist with a different encoding and
which do not exist in the comparison set.
Further information on certain standardized character sets, and other,
proprietary, character sets, can be obtained from
http://www.dkuug.dk/i18n/charmaps/.
Part 5 of the Netherlands Ministry of the Interior's series on Standards
for the electronic exchange of personal data (ISBN 90-5414-019-4) provides
a tutorial on the character set standards and their historical development.
Information on character set conversion software can be obtained from
the TERENA project at http://www.nada.kth.se/i18n/c3/.
The Diffuse Guide to Character Sets
provides an overview of the role of character sets.
ASCII
Expanded name
American Standard Code for Information Interchange
Area covered
7-bit coded character set for information interchange
Sponsoring body
American National Standards Institute (ANSI)
Source documents
Information Systems – Coded Character Sets – 7-Bit American National
Standard Code for Information Interchange (7-Bit ASCII)
Characteristics/description
Specifies coding of space and a set of 94 characters (letters, digits
and punctuation or mathematical symbols) suitable for the interchange of
basic English language documents. Forms the basis for most computer code
sets and is the American National Version of ISO/IEC
646.
Usage
Used as the basic US code set for personal and workstation computers.
The following IST RTD projects use this standard: M-PIRO.
Further details available from
ANSI, 25 West 43rd Street, New York,
NY 10036, USA
Other references
A list of ASCII codes can be obtained from
http://www.dkuug.dk/i18n/charmaps/ANSI_X3.4-1968.
EBCDIC
Expanded name
Extended Binary Coded Decimal Interchange Code
Area covered
8-bit coded character set for information interchange between IBM computers
Sponsoring body
Proprietary specification developed by IBM
Characteristics/description
A set of national character sets for interchange of documents between IBM
mainframes. Most EBCDIC character sets do not contain all of the
characters defined in the ASCII code set but
there is a special International Reference Version (IRV) code set that
contains all of the characters in ISO/IEC 646 (and,
therefore, ASCII). Several national versions have been updated to support
the encoding of the euro sign (in lieu of the currency sign).
Usage
Not much used outside of IBM
and similar mainframe environments. When transmitting EBCDIC files between
systems care needs to be taken to ensure that the systems are set up for
the relevant national code set.
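The need to agree on the code set before exchanging EBCDIC data can be
illustrated with a minimal sketch. It assumes Python's built-in `cp037`,
`cp500` and `cp1140` codecs as examples of EBCDIC code pages; the choice of
these three code pages and the helper names are illustrative assumptions,
not part of any EBCDIC specification.

```python
# Sketch: the same text yields different bytes under different EBCDIC code
# pages. The code pages used here (cp037, cp500, cp1140) are assumptions
# chosen because Python ships codecs for them.

def encode_all(text, codecs=("cp037", "cp500", "cp1140")):
    """Return a mapping of codec name -> encoded bytes for the given text."""
    return {name: text.encode(name) for name in codecs}

def differing_positions(data, codec_a, codec_b):
    """List the byte values in `data` that decode differently under two codecs."""
    return sorted({b for b in data
                   if bytes([b]).decode(codec_a) != bytes([b]).decode(codec_b)})

if __name__ == "__main__":
    for name, raw in encode_all("A[1]").items():
        print(name, raw.hex())
    # Byte 0x9F is the currency sign in cp037 and the euro sign in cp1140.
    print(repr(bytes([0x9F]).decode("cp037")),
          repr(bytes([0x9F]).decode("cp1140")))
```

A receiver that decodes with the wrong code page gets no error, only wrong
characters, which is why the code set has to be agreed out of band.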
Further details available from
Your local IBM office.
Other references
Details of the most commonly used sets of EBCDIC codes can be obtained
from http://www.dkuug.dk/i18n/charmaps, which, however, has not
necessarily been updated to cover the new code pages that also support the
euro sign.
- Unicode Consortium report on EBCDIC-Friendly UCS
Transformation Format
- OII Standards
and Specifications Activity Report, December 1998
ISO 646
Expanded name
ISO 646: 7-bit coded character set for information interchange
Area covered
Unaccented Latin letters, digits and punctuation characters
Sponsoring body
ISO/IEC JTC1/SC2 and ITU
Source documents
ISO 646:1991/ITU-T Recommendation
T.50 (09/92) Information technology -- 7-bit coded character set
for information interchange
Characteristics/description
Specifies 7-bit coding for space and 94 characters. There is an
International Reference Version (IRV), which is identical to ASCII, and national
variants that provide accented and other special characters required in
different countries.
Character positions 00-31 (ISO positions 0/0 to 1/15) and 127 (ISO
position 7/15) are reserved for control codes. Code 32 (2/0) identifies a
space. The sequence in which other codes appear in the IRV is:
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
Usage
Base character set used by most systems.
Contains all characters provided by the standard shift positions of basic
US QWERTY keyboards.
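The layout of the 7-bit code table described in this entry (control codes,
space and 94 graphic characters, with positions written as column/row) can
be sketched as follows. The function names are hypothetical and the sketch
covers only the IRV, not the national variants.

```python
# Sketch: classify 7-bit code positions and express them in the ISO
# column/row notation (for example 2/0 for code 32).

def position(code):
    """Return the column/row notation for a 7-bit code."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit code")
    return f"{code >> 4}/{code & 0x0F}"

def classify(code):
    """Classify a 7-bit code as 'control', 'space' or 'graphic'."""
    if code <= 31 or code == 127:
        return "control"
    return "space" if code == 32 else "graphic"

# The graphic characters of the IRV, in code order.
GRAPHIC = [chr(c) for c in range(128) if classify(c) == "graphic"]

if __name__ == "__main__":
    print(len(GRAPHIC))
    print(position(32), position(127))
    print(" ".join(GRAPHIC))
```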
Further details available from
ISO and national standards bodies.
ISO/IEC 2022
Expanded name
ISO/IEC 2022: Character code structure and extension techniques
Area covered
Structure of 7-bit or 8-bit code tables, rules for code extension
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
ISO/IEC 2022:1994
Information technology -- Character code structure and extension
techniques. A technical corrigendum was published in 1999.
Characteristics/description
ISO standard for switching between code sets in 7-bit and 8-bit environments.
Describes the role of the Escape, Shift-Out (SO) and Shift-In (SI) codes
in the base control code set for controlling which character sets are used
in a 7-bit environment, and how the role of these characters changes in
an 8-bit environment to provide a locking shift code swapping function.
Up to 4 code sets (G0-G3) can be mapped into the left-hand side of an
8-bit ISO code set. Three of these (G1-G3) can also be used on the
right-hand side. Escape code sequences are used to identify which code
sets are to be used. Users can also select variant control code sequences
using Escape code sequences. Escape code sequences are also used to
provide a single character change of character sets.
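The use of Escape code sequences to switch code sets can be seen in a
minimal sketch that uses Python's built-in `iso2022_jp` codec as one
concrete encoding built on ISO/IEC 2022. The choice of codec, the sample
text and the helper name `escape_sequences` are assumptions made for
illustration only.

```python
# Sketch: locate the escape sequences that an ISO/IEC 2022-based encoding
# inserts when it switches between character sets.

ESC = 0x1B

def escape_sequences(data):
    """Return (offset, sequence) pairs for the escape sequences in `data`.

    Assumes every escape sequence is ESC plus two further bytes. That holds
    for the ASCII and JIS X 0208 designations produced in this example, but
    not for ISO/IEC 2022 escape sequences in general.
    """
    found = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            found.append((i, data[i:i + 3]))
            i += 3
        else:
            i += 1
    return found

if __name__ == "__main__":
    # Latin text, three Kanji, then Latin text again.
    encoded = "abc \u65e5\u672c\u8a9e xyz".encode("iso2022_jp")
    print(encoded)
    for offset, seq in escape_sequences(encoded):
        print(offset, seq)
```

The first sequence found designates the Japanese double-byte set and the
second returns to the Latin set, so the byte stream stays 7-bit throughout.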
Usage
Forms the basis for code
switching in other standards, including SGML, but Shift functions
are not used on standard hardware platforms, making use of this standard
problematical.
Further details available from
ISO and national standards bodies.
ISO/IEC 4873
Expanded name
ISO/IEC 4873: ISO 8-bit code for information interchange
Area covered
Rules for developing 8-bit code sets
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
ISO/IEC 4873:1991
Information technology -- ISO 8-bit code for information interchange --
Structure and rules for implementation
Characteristics/description
Standard explaining the structure of 8-bit coded character sets based on the
concepts of ISO/IEC 2022. Three levels of implementation are specified, 1
for No Shifts, 2 for Single Shifts, 3 for Locking Shifts.
Note: ISO/IEC 4873 has not been updated to conform to changes made to
ISO/IEC 2022.
Usage
Provides basic rules for later ISO standards.
Further details available from
ISO and national standards bodies.
ISO/IEC 6429
Expanded name
ISO/IEC 6429: Control functions for coded character sets
Area covered
Control codes for 7-bit and 8-bit coded character sets
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
ISO/IEC 6429:1992
Information technology -- Control functions for coded character sets
Characteristics/description
Defines 163 control functions, including the control characters that
can be used in the C0 (0/0 - 1/15) positions in 7-bit and 8-bit environments
and C1 (8/0 - 9/15) positions in 8-bit environments.
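The C0 and C1 positions named above can be told apart from the graphic
areas by looking at the column number of a code. The following is a
minimal sketch; the function name `area` is hypothetical, and the labels
GL and GR for the left-hand and right-hand graphic areas are borrowed from
ISO/IEC 2022 terminology.

```python
# Sketch: classify an 8-bit code position into the C0 and C1 control areas
# or the graphic areas that lie between and after them.

def area(byte):
    """Return the name of the code-table area that `byte` falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit code")
    column = byte >> 4
    if column <= 1:
        return "C0"    # 0/0 - 1/15
    if column <= 7:
        return "GL"    # 2/0 - 7/15, left-hand graphic area
    if column <= 9:
        return "C1"    # 8/0 - 9/15
    return "GR"        # 10/0 - 15/15, right-hand graphic area

if __name__ == "__main__":
    # ESC (1/11) is a C0 control; CSI (9/11) is a C1 control.
    for code in (0x1B, 0x41, 0x9B, 0xE9):
        print(f"{code >> 4}/{code & 0x0F}", area(code))
```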
Usage
Forms the basis for control code definitions in many systems.
Further details available from
ISO and national standards
bodies.
ISO/IEC 6937
Expanded name
ISO/IEC 6937: Coded
graphic character set for text communication -- Latin alphabet
Area covered
Defines a character set
supporting most western European languages in a limited fashion
Sponsoring body
ISO/IEC JTC1/SC2 and ITU
Source documents
Characteristics/description
The left-hand side
of this 8-bit code set is based on ISO 646. The code set
for the right-hand set of 94 characters contains a set of diacritical
marks that can form predefined combinations with letters on the left-hand
side to produce accented characters, together with other characters used
in European languages based on the Latin script that are not suitable for
splitting into letter plus diacritic, such as the thorn (þ) used in
Icelandic. The set also includes a set of single, double and French style
angle open and closing quotation marks, Copyright and Registered symbols,
the Spanish inverted question mark, some maths signs, fractions, superior
numbers (2 and 3 only) and a set of arrows.
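The letter-plus-diacritic idea can be sketched with Python's `unicodedata`
module. This is only an analogy under stated assumptions: Python ships no
ISO/IEC 6937 codec, so the sketch works on Unicode code points rather than
ISO/IEC 6937 byte values, and the function name `split_accented` is
hypothetical. ISO/IEC 6937 places the diacritical mark before the base
letter, so the pair is returned in that order.

```python
import unicodedata

def split_accented(char):
    """Split one precomposed character into (diacritic, base letter).

    Returns (None, char) when the character does not decompose into exactly
    one base letter followed by one combining mark.
    """
    decomposed = unicodedata.normalize("NFD", char)
    if len(decomposed) == 2 and unicodedata.combining(decomposed[1]):
        base, mark = decomposed
        return mark, base          # diacritic first, as in ISO/IEC 6937
    return None, char

if __name__ == "__main__":
    # e with acute, u with diaeresis, and thorn (which cannot be split).
    for ch in "\u00e9\u00fc\u00fe":
        mark, base = split_accented(ch)
        label = unicodedata.name(mark) if mark else "(not split)"
        print(ch, base, label)
```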
Usage
Basic character set used on
teletext, videotext and related systems. For computers this code set has
mostly been superseded by ISO/IEC 8859 and ISO/IEC 10646.
ISO 6937 also provides the character set repertoire used for X.400 message
handling systems and X.500 directory services, and its repertoire was the
basis for the first version of the ISO/IEC 9995-2 international keyboard
layout standard.
The ITU version of the standard is based on the 1994 version of the ISO
standard.
Further details available from
ISO and national standards bodies.
Other references
Details of the ISO/IEC 6937 code set can be obtained from
http://www.dkuug.dk/i18n/charmaps/ISO_6937-2-ADD.
ISO/IEC 8859
Expanded name
ISO/IEC 8859: 8-bit single-byte coded graphic character sets
Area covered
Defines accented and
non-Latin characters used in European languages
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
ISO/IEC 8859
Information processing -- 8-bit single-byte coded graphic character
sets
Characteristics/description
Specifies coding for sets of accented characters that cover the needs
of most European languages, including limited sets of Greek, Hebrew and
Arabic characters and some Cyrillic characters. Part 1 covers Western
European languages, some of which have been more fully covered by Part 15,
which also supports the euro sign. Part 2 covers Eastern European (Slavic,
Albanian, Hungarian and a variation of Romanian) languages, Part 3 covers
Southern European languages (Maltese) and Esperanto, and Part 4 covers
Northern European languages. Part 9 covers characters used for Turkish,
replacing those in Part 1 for Icelandic, while Part 10 deals with the
Icelandic, Nordic and Baltic character sets. Part 11 combines Latin and Thai
characters, while Part 16 (Latin No. 10) replaces Part 2 for Romania
and supports some characters with comma below as opposed to with cedilla
and also the euro sign. Part 7 is under revision (to
include, among others, the euro sign).
Usage
Used by a few systems as the
underlying code set. ISO/IEC 8859-1 has been commonly used as the basis of
extended 8-bit code sets within the European Community. Code sets cannot
be mixed within one data stream, so problems arise when moving between
environments using different parts of the standard (e.g. Greece, where
Part 7 is used, and the Netherlands, where Part 9 is officially
preferred).
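The problem of moving data between environments that use different parts
can be shown with a minimal sketch: one byte value decoded under several
parts of ISO/IEC 8859 gives different characters. The sketch assumes
Python's built-in `iso8859_*` codecs; the selection of parts and byte
values, and the helper name, are illustrative assumptions.

```python
# Sketch: a single byte value stands for different characters depending on
# which part of ISO/IEC 8859 the receiver assumes.

PARTS = {1: "iso8859_1", 7: "iso8859_7", 9: "iso8859_9", 15: "iso8859_15"}

def decode_with_each_part(byte_value, parts=PARTS):
    """Decode one byte under several ISO/IEC 8859 parts.

    Positions that a part leaves undefined are shown as U+FFFD.
    """
    raw = bytes([byte_value])
    return {part: raw.decode(codec, errors="replace")
            for part, codec in parts.items()}

if __name__ == "__main__":
    for value in (0xA4, 0xE4, 0xFD):
        print(hex(value), decode_with_each_part(value))
```

Because nothing in the byte stream itself says which part was used, the
part has to be declared or agreed separately from the data.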
The following IST RTD projects use this standard: M-PIRO.
Further details available from
ISO and national standards bodies.
Other references
Details of the ISO/IEC 8859 code sets can be obtained from
http://www.dkuug.dk/i18n/charmaps.
ISO 9036
Expanded name
ISO 9036: Arabic 7-bit coded character set for information interchange
Area covered
Defines set of Arabic characters
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
- ISO 9036:1987 Information processing -- Arabic 7-bit coded character
set for information interchange
- ISO 11822:1996 Information
and documentation -- Extension of the Arabic alphabet coded character set
for bibliographic information interchange
Characteristics/description
ISO 9036 defines the stand-alone versions of Arabic characters in a form
that can be used for interchange between computer systems using a 7-bit
code set.
ISO 11822 covers the use of the Arabic alphabet in bibliographic entries.
Usage
Unknown.
Further details available from
ISO and national standards
bodies.
Other references
Details of the ISO 9036 code set can be obtained from
http://www.dkuug.dk/i18n/charmaps/ASMO_449.
ISO 9541
Expanded name
ISO 9541: Font information interchange
Area covered
Provides mechanism for the interchange of information related to the
metrics and drawing of glyphs used to display characters
Sponsoring body
ISO/IEC JTC1/SC34
Source documents
- ISO/IEC 9541:1991 Information technology
-- Font information interchange
- ISO/IEC 10036:1996 Information
technology -- Font information interchange -- Procedure for registration
of font-related object identifiers
- ISO/IEC
TR 15413:2001 Information technology -- Font services -- Abstract
service definition
Characteristics/description
While other standards describe the numeric codes to be assigned to
"characters" within a computer, this standard defines how information about
the representation of these characters on a screen or a printed sheet should
be interchanged. A single coded character can have many different physical
representations, depending on the type face (font) being used. Each such
representation forms a unique "glyph".
Part 1 of the standard explains the general architecture of the font
information interchange standard. Part 2 defines the metrics used to describe
the weight, width, height, etc, of a glyph. Part 3 defines how the information
needed to generate a glyph should be interchanged, and defines ASN.1 and
SGML
interchange formats for this information.
ISO 9541 has been defined so that metric information can be interchanged
separately from the more commercially sensitive glyph generation information.
The metrics defined in Part 2 are of relevance to composition and other
software that needs to calculate the relative position of glyphs. Only
when the characters are actually being displayed/printed does access need
to be provided to the much bulkier glyph drawing information. ISO 9541
Type 1 fonts are compatible with Version 23.0 of the
Postscript
interpreter.
A register of glyph identifiers is maintained on behalf of the
ISO by the Graphics Communication Association of America (GCA). This register
is based on ISO/IEC 10036.
Usage
Not widely adopted.
Further details available from
ISO and national standards
bodies.
Other references
- An operational model for characters and glyphs (ISO/IEC TR 15285)
- OII Multimedia
and Hypermedia Standards Activity Report, September 1996
ISO/IEC 10367
Expanded name
ISO/IEC 10367: Standardized coded graphic character sets for use in 8-bit codes
Area covered
Defines graphic
characters used for general purpose applications in typical office
environments
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
ISO/IEC 10367:1991
Information technology -- Standardized coded graphic character
sets for use in 8-bit codes
Characteristics/description
Specifies a unique coded graphic character set for use as the G0 set
and a series of coded graphic character sets of up to 96 characters for
use as the G1, G2 and G3 sets defined in ISO 4873 when
shifting levels 2 or 3 are implemented. It provides a comprehensive
repertoire, including all characters from ISO/IEC 6937, 8859 Parts 1-9 and
a box character set.
Registration of character repertoires is carried out using the
procedures laid down in ISO/IEC 7350:1991.
Usage
Adopted as national standard
in Austria. Uptake elsewhere restricted by limited support for ISO/IEC 2022.
Further details available from
ISO and national standards bodies.
Other references
Details of the ISO/IEC
10367 character set that allows box drawing characters to be used in
conjunction with ISO/IEC 8859 can be obtained from
http://www.dkuug.dk/i18n/charmaps/ISO_10367-BOX.
ISO/IEC 10538
Expanded name
ISO/IEC 10538: Control functions for text communication
Area covered
Control functions
required for text in page-image format, and for mixed formatted and
formattable text
Sponsoring body
ISO/IEC JTC1/SC2
Source documents
ISO/IEC 10538:1991
Information technology -- Control functions for text
communication
Characteristics/description
Describes the role of ISO 6429 control
characters when used in page images or in text that has been, or is
capable of being, formatted prior to presentation. Applies to text
characters only, not graphics. The codes are defined for interchange
purposes only: they are not intended for the actual processing of text.
Usage
Unknown.
Further details available from
ISO and national standards bodies.
ISO/IEC 10646
Expanded name
ISO/IEC 10646: Universal Multiple-Octet Coded Character Set (UCS)
Area covered
Multilingual, multi-octet character
set covering all major trading languages. The intent is to provide coding
for all the characters of all the scripts of the world.
Sponsoring body
ISO/IEC JTC1/SC2 and ISO/IEC JTC1/SC22 WG20
Source documents
- ISO/IEC 10646-1
Information technology -- Universal Multiple-Octet Coded Character
Set (UCS)
- Part 1: Architecture and Basic Multilingual Plane
- Part 2: Supplementary Planes
- ISO/IEC DIS 14651 International string ordering and comparison --
Method for comparing character strings and description of the common
template tailorable ordering
- ISO/IEC PRF TR 14652 Information technology -- Specification method for
cultural conventions
- ISO/IEC 14755:1997
Information technology -- Input methods to enter characters from
the repertoire of ISO/IEC 10646 with a keyboard or other input
devices
- Unicode 3.2
- RFC 2279 UTF-8, a transformation format of ISO 10646
Characteristics/description
Integrates previous internationally/nationally agreed character sets into
a single code set, together with additional characters for previously
encoded scripts and new scripts, both current and ancient. ISO/IEC 10646
is based on a 4-octet (32-bit) coding scheme known as the "canonical form"
(UCS-4), but a 2-octet (16-bit) form (UCS-2) is used for the Basic
Multilingual Plane (BMP), where the missing two high-order octets are
assumed to be 00 00. The code set is split into 128 "groups" of 256
"planes", each containing 256 "rows" with 256 "cells" for characters. Each
character is given a code position using multiple octets. In UCS-4 the
first octet identifies the group and the second the plane; the third octet
(the first in UCS-2) identifies the row containing the character and the
fourth (the second in UCS-2) its cell number. The first 128 code positions
of the BMP (0 to 127) are those of the ISO 646 International Reference
Version (ASCII). The characters forming the second half of the
first row are those used in ISO/IEC 8859-1, the
Latin-1 character set. Other rows provide encoding for:
- extended Latin characters
- the International Phonetic Alphabet (IPA)
- Greek (including accented characters, "monotoniko" and "polytoniko")
- Cyrillic, Georgian and Armenian
- Hebrew, Ethiopic
- all four forms of Arabic characters (initial, medial, final and
stand-alone)
- Indic languages, mostly used on the Indian subcontinent (including
Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Myanmar,
Oriya, Sinhala, Tamil and Telugu)
- Khmer, Lao, Mongolian, Thai, Tibetan
- Chinese/Japanese/Korean (CJK) unified ideographs, radicals, letters
and months; Bopomofo, Hangul syllables, Hiragana, Kangxi radicals,
Katakana, Yi and Yi radicals
- Cherokee, unified Canadian aboriginal syllabics
- Ogham, Runic, Syriac, Thaana
- currency symbols
- mathematical symbols and operators and special character forms
- box and line drawing characters, blocks and arrows
- geometric shapes and Dingbats
- special OCR characters used on cheques, Braille patterns
- encircled characters and numbers
- etc.
The planes specified in Part 2 are:
- Plane 1: SMP, Secondary Multilingual Plane
- Plane 2: SIP, Supplementary Plane for CJK Ideographs
- Plane 14: GPP, General Purpose Plane
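The group, plane, row and cell structure of a code position, and the
relationship between the UCS-4 and UCS-2 forms, can be sketched as
follows. The function names are hypothetical; the UTF-8 column simply uses
Python's built-in codec for comparison.

```python
# Sketch: split a UCS code position into its group, plane, row and cell
# octets, and derive the 2-octet form used for the BMP.

def decompose(code_point):
    """Return (group, plane, row, cell) for a UCS-4 code position."""
    if not 0 <= code_point <= 0x7FFFFFFF:   # 128 groups, so 31 bits
        raise ValueError("outside the UCS-4 code space")
    return tuple(code_point.to_bytes(4, "big"))

def ucs2(code_point):
    """Return the 2-octet form, defined only for group 0, plane 0 (the BMP)."""
    group, plane, row, cell = decompose(code_point)
    if (group, plane) != (0, 0):
        raise ValueError("not in the Basic Multilingual Plane")
    return bytes([row, cell])

if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac"):
        cp = ord(ch)
        print(f"{cp:08X}", decompose(cp), ucs2(cp).hex(),
              ch.encode("utf-8").hex())
    # A position in Plane 2 has no 2-octet form.
    print(decompose(0x2000B))
```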
ISO/IEC 14651 defines a "reference comparison method" that allows
programs to determine the relative order of two UCS strings. It also defines a
Common Template Table that describes an order for
all characters encoded in the first edition of ISO/IEC 10646-1 up to Amendment
7.
ISO/IEC Technical Report 14652 defines a general
mechanism to specify cultural conventions, and formats for a number of specific
cultural conventions in the areas of character classification and conversion,
sorting, number formatting, monetary formatting, date formatting, message
display, addressing of persons, postal address formatting, and telephone number
handling.
Usage
This standard has become the basic coding form
for all 16 and 32-bit computer systems.
Users of Internet Explorer 5, and XLink-aware XML browsers, can
obtain more details about applications of ISO
10646 from our Diffuse Topic Map
service.
Further details available from
ISO and national standards bodies.
Other references
Details of the
Unicode standard, the repertoire and coding of which are identical to
those of the ISO/IEC
10646 code set, can be obtained from http://www.unicode.org.
- European Ordering Rules (for the Multilingual European Subsets
of ISO/IEC 10646-1, CWA 13873:2000) have been published by CEN TC304 as ENV
13710:2000. More information can be found at: http://www.stadlar.is/TC304/EOR/eorhome.html.
- Requirements for String Identity Matching and String
Indexing for ISO 10646 coded documents
- OII
Standards and Specifications Activity Report, July 1998
- New languages to be covered in next edition
- OII
Standards and Specifications Activity Report, October 1998
- Unicode Consortium report on EBCDIC-Friendly UCS
Transformation Format
- OII Standards
and Specifications Activity Report, December 1998
- Unicode 3.0 to be based on 2nd Edition of ISO 10646
- OII Standards
and Specifications Activity Report, August 1999
- Unicode in XML and other Markup Languages
- OII Standards
and Specifications Activity Report, September 1999
- Unicode 3.0 published
- Information
Management Standardization Activity, March 2000
- Character Normalization in IETF Protocols
- Information
Management Standardization Activity, September 2000
- ISO 10646-1:2000 published
- Information
Management Standardization Activity, October 2000
- Unicode in XML and other Markup Languages
- Information
Management Standardization Activity, December 2000
- Character Model for the World Wide Web 1.0
- Unicode 3.1
- Information Management
Standardization Activity, January 2001
- Unicode 3.1
- Information
Management Standardization Activity, March 2001
- Use for e-business standardization
- Electronic Commerce Interoperability
Report,
December 2001
- Use within Character
Model for the World Wide Web
- Information
Management Standardization Activity, December 2001
- Unicode 3.2
- Information
Management Standardization Activity, June 2002
- UTF-8 specification updated to conform to Unicode 3.2
- Information
Management Standardization Activity, October 2002
JIS X 0201
Expanded name
Japanese Industrial Standard Code for Information Interchange
Area covered
Interchange of Latin and Katakana characters
Sponsoring body
JISC - Japanese Industrial
Standards Committee
Source documents
JIS X 0201:1976 (reaffirmed 1984)
Code for Information Interchange
(published in Japanese and English)
Characteristics/description
Provides 7-bit and 8-bit code sets for Latin characters (based on
ISO
646) and the simple Katakana letters used to aid phonetic interpretation
of Kanji ideograms. (Katakana is used for teaching Japanese children to
read.)
In 7-bit environments the SO (0/14) and SI (0/15) codes are used to
switch between the Latin and the Katakana code sets. In 8-bit environments
the Katakana characters form the right-hand sector (10/1 to 13/15).
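The 7-bit and 8-bit arrangements described above can be sketched as two
small decoders. This is a sketch under stated assumptions, not a
conforming implementation: it assumes the Katakana set occupies 0xA1 to
0xDF (10/1 to 13/15) in 8-bit form and 0x21 to 0x5F when shifted in with
SO, maps it linearly onto the Unicode halfwidth Katakana block starting at
U+FF61, and treats the Latin set as ASCII except for the yen sign (5/12)
and overline (7/14). All function names are hypothetical.

```python
# Sketch: decode JIS X 0201 text in its 7-bit (SO/SI) and 8-bit forms.

SO, SI = 0x0E, 0x0F
LATIN_DIFFERENCES = {0x5C: "\u00a5", 0x7E: "\u203e"}  # yen sign, overline

def _latin(byte):
    """Map a Latin-set code to a character; unknown codes become U+FFFD."""
    if byte > 0x7F:
        return "\ufffd"
    return LATIN_DIFFERENCES.get(byte, chr(byte))

def _katakana(offset):
    """Map an offset within the Katakana set to a halfwidth character."""
    return chr(0xFF61 + offset)

def decode_7bit(data):
    """Decode a 7-bit stream: SO selects Katakana, SI returns to Latin."""
    out, kana = [], False
    for byte in data:
        if byte == SO:
            kana = True
        elif byte == SI:
            kana = False
        elif kana and 0x21 <= byte <= 0x5F:
            out.append(_katakana(byte - 0x21))
        else:
            out.append(_latin(byte))
    return "".join(out)

def decode_8bit(data):
    """Decode an 8-bit stream with Katakana in the right-hand half."""
    return "".join(_katakana(b - 0xA1) if 0xA1 <= b <= 0xDF else _latin(b)
                   for b in data)

if __name__ == "__main__":
    print(decode_7bit(b"AB\x0e\x31\x32\x0fCD"))
    print(decode_8bit(b"AB\xb1\xb2CD"))
```

Both calls decode the same text: the 7-bit form needs the SO and SI shifts
around the Katakana codes, while the 8-bit form carries them directly in
the right-hand half of the table.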
Usage
Used to transfer Japanese information between early Japanese computer
systems.
Further details available from
Japanese Industrial Standards Committee, c/o Standards Department,
Ministry of International Trade and Industry, 1-3-1 Kasumigaseki, Chiyoda-ku,
Tokyo 100, Japan.
Other references
Details of the JIS X 0201 code set can be obtained from
http://www.dkuug.dk/i18n/charmaps/JIS_X0201.
JIS X 0202
Expanded name
Extension techniques for use with the Code for Information Interchange
Area covered
Switching of Japanese character code sets
Sponsoring body
JISC - Japanese Industrial
Standards Committee
Source documents
JIS X 0202
Extension techniques for use with the Code for Information
Interchange (published in Japanese and English)
Characteristics/description
Japanese equivalent of ISO 2022.
Usage
Used on 8-bit Japanese word processors to call in multiple character
sets.
Further details available from
Japanese Industrial Standards Committee, c/o Standards Department,
Ministry of International Trade and Industry, 1-3-1 Kasumigaseki, Chiyoda-ku,
Tokyo 100, Japan.
JIS X 0208/0212
Expanded name
Code of the Japanese Graphic Character Set for Information Interchange
Area covered
Interchange of Latin, Kanji, Hiragana and Katakana characters
Sponsoring body
JISC - Japanese Industrial
Standards Committee
Source documents
- JIS X 0208:1990 Code for the Japanese
graphic character set for information interchange (published in Japanese
and English)
- JIS X 0212:1990 Code of the supplementary Japanese graphic character
set for information interchange
Characteristics/description
Multiplane standard providing access to 6353 Kanji ideographs, 86 Katakana
character and sound identifiers, 83 Hiragana character and sound identifiers,
52 Roman, 48 Greek and 66 Cyrillic letters, together with associated numeric,
punctuation and line drawing codes.
Usage
Whilst only providing access to a portion of the extensive Japanese
ideograph set this standard is used by many Japanese word processing and
general computing systems.
Further details available from
Japanese Industrial Standards Committee, c/o Standards Department,
Ministry of International Trade and Industry, 1-3-1 Kasumigaseki, Chiyoda-ku,
Tokyo 100, Japan.
JIS X 0213
Expanded name
7-bit and 8-bit double byte coded extended Kanji sets for information
interchange
Area covered
Interchange of Kanji characters
Sponsoring body
JISC - Japanese Industrial
Standards Committee
Source documents
JIS X 0213-2000: 7-bit and 8-bit double byte coded extended Kanji sets
for information interchange
Characteristics/description
JIS X 0213 specifies 11,223 characters and their bit combinations. The
characters consist of an extension to the coded character set of JIS
X 0208
with an additional 4344 characters. JIS X 0213 also defines two
implementation levels, level 3 and level 4, in addition to the two defined in
JIS X 0208.
Usage
Used in Japanese word processors.
Further details available from
Japanese Industrial Standards Committee, c/o Standards Department,
Ministry of International Trade and Industry, 1-3-1 Kasumigaseki, Chiyoda-ku,
Tokyo 100, Japan.
OCR
Expanded name
Optical Character Recognition
Area covered
Coding of machine readable characters by defining the repertoire and the
related glyphs
Sponsoring body
ISO/IEC JTC1/SC31
Source documents
- ISO 1073 Alphanumeric character sets for
optical recognition
- Part 1: Character set
OCR-A -- Shapes and dimensions of the printed image
- Part 2: Character set
OCR-B -- Shapes and dimensions of the printed image
- JIS X9003:1980 Katakana character set for optical recognition
- JIS X9005:1979 Handprinted Katakana characters for optical character
recognition
- JIS X9006:1979 Handprinted numerals for optical character recognition
- JIS X9007:1981 Handprinted alphabets for optical character recognition
- JIS X9008:1981 Handprinted symbols for optical character recognition
- JIS X9009:1991 Handprinted Hiragana characters for optical character
recognition
- JIS X9010:1984 Coding of machine readable characters (OCR and MICR)
Characteristics/description
Limited character sets that are designed to be machine readable. OCR-A provides
numbers and other characters needed for automated cheque handling. OCR-B allows
alphabetic characters to be used in machine-readable data. The Japanese
Industrial Standards (JIS) committee has a number of extensions for the
recognition of Japanese characters and for the recognition of handwritten
numbers, characters and symbols.
The revision decided upon in June 1995 by ISO/IEC
JTC1/SC2 to extend the OCR-B glyph set to include a range of accented and
related characters has been cancelled, and the transfer of the maintenance
of ISO 1073 (together with ISO 1831 - Printing Specifications) from SC2 to
SC31 was completed in May 2001.
CEN TC304 is working
on an ENV to extend the normative OCR-B repertoire of ISO 1073-2 by the
Euro sign and also add some of the European characters from the previous SC2
attempt to an informative annex of the ENV. Once completed, this ENV is
envisaged to be fast-tracked for a revision of the ISO standard.
Usage
Widely used to enable accurate machine scanning of information. With the
current progress in character recognition technologies, however,
considerable flexibility is available, particularly for the scanning of
less critical information.
Further details available from
ISO and national
standards bodies.
Other references
- Proposed addition of Euro to OCR-B
- OII
Standards and Specifications Activity Report, July 1998
Other Character Sets
Area covered
Machine readable characters for non-European languages not covered
elsewhere
Sponsoring body
Various national standards bodies
Source documents
- CNA GB-2312-89 Code of Chinese ideograms
for information interchange -- Basic set
- CNA GB-7590-87 Code of Chinese ideograms for information interchange
-- 4th supplementary set
- CNA GB-8565-88 Coded character set for text communication
- CNA GB-12345-90 Code of Chinese ideograms for information interchange
-- Supplementary set
- IS 13194:1991 Indic Script Code for Information Interchange (ISCII)
- KS C5601-1992 Code for information interchange (Korean)
- KS C5636-1993 Code for information interchange (Latin characters)
- KS C5627-1991 Extension code sets for information interchange
- MS 1362:1983 Jawi character set (Malaysian)
- TIS 620:1990 Thai character codes for computers
Characteristics/description
Character sets whose use is normally specific to one or two countries.
Usage
Used within local markets. Often form the basis of an ISO
10646 code plane.
Further details available from
National standards body
File last updated: October 2002
The Diffuse Project is funded under the European Commission's Information
Society Technologies programme. Diffuse publications are maintained
by TIEKE (the Finnish Information Society Development Centre), IC Focus and The
SGML Centre.