WGL Assistant v1.1
The Multilingual Font Manager


WGL Assistant allows convenient use of the multilingual (Unicode/WGL4) TrueType and OpenType/TTF fonts in all MS Windows applications.

  1. Character encoding
    1.1. Codepage soup
    1.2. Unicode
  2. Fonts
    2.1. TrueType and OpenType fonts
    2.2. Single-codepage fonts
    2.3. Multilingual fonts
      2.3.1. Unicode fonts
      2.3.2. WGL4 fonts
  3. Applications
    3.1. Using Unicode
    3.2. WGL4 and codepage selection
    3.3. Font substitutes


I. Introduction

1. Character encoding

1.1. Codepage soup

Most computer applications and operating systems can easily access only 256 characters at a time, because they originally stored each character in 8 bits (one byte), which allows 256 different combinations (2^8 = 256). Text is stored in computer memory as strings of codes, i.e. text is encoded. Every character (letter, digit, special character) is represented by a number, e.g. the letter "A" is usually stored as 65. The earliest computers could work with only 128 different combinations (one bit was reserved for other purposes). For these, the American National Standards Institute (ANSI) developed the American Standard Code for Information Interchange (ASCII), a standardized encoding table (a codepage) covering all uppercase and lowercase English letters, digits, punctuation characters, as well as some special and control characters.
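
The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example and the sample string is arbitrary.

```python
# One byte has 8 bits, so it can take 2**8 distinct values.
BITS_PER_BYTE = 8
codes_per_byte = 2 ** BITS_PER_BYTE      # 256 combinations
ascii_codes = 2 ** (BITS_PER_BYTE - 1)   # 128 combinations when one bit is reserved

# Encoding turns characters into numbers; decoding reverses the step.
code_for_a = ord("A")
encoded = "Abc 123".encode("ascii")      # one byte per character
decoded = encoded.decode("ascii")

print(codes_per_byte, ascii_codes, code_for_a)
print(list(encoded), decoded)
```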

Engineers soon noticed that 128 codes were not sufficient for all characters. Even the diacritical marks of the Western European languages could not be covered. The ASCII codepage covered 128 characters, but newer computer systems were able to work with 256 codes. So the standards committees (ANSI, ISO) and computer companies (IBM, Apple, Microsoft) started extending the ASCII codepage with various character sets. The additional 128 codes were filled with graphical symbols, mathematical signs, Western European diacritical marks, etc. Each organization devised its own "standards".

The characters for most Western European languages were arranged into several codepages. ANSI and Microsoft invented codepage 1252 (ANSI Latin-1), the International Organization for Standardization (ISO) established ISO 8859-1 (ISO Latin-1), IBM developed codepage 850 (IBM Latin-1), Apple created the Macintosh Roman character set, etc.

Because of the "iron curtain", the countries of Central and Eastern Europe (CEE) were initially separated from the Western world. Nevertheless, each CEE country started to modify the Western codepages by adding its own diacritical marks, and local "quasi-standards" emerged. Russians invented KOI-8, in Poland Mazovia became the most popular, and there were others as well.

Political changes in the early 1990s forced Western companies and committees to quickly develop codepages for the "new" markets. As a result, the major regions of Central Europe (Poland, former Czechoslovakia, Hungary, former Yugoslavia), the Baltic area (Lithuania, Latvia, Estonia) and the Cyrillic area (former Soviet Union) were defined. Each of these regions received its own vendor-specific codepages.

For example, the Polish, Czech, Slovak, Hungarian, Albanian, Romanian, Croatian and Slovenian languages were artificially grouped into Central European codepages (also called Eastern European or Latin-2). ANSI developed CP 1250 (ANSI Latin-2), ISO had ISO 8859-2 (ISO Latin-2), IBM invented CP 852 (IBM Latin-2), Apple created Macintosh CE, and so on.
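
To make the "codepage soup" concrete, the following Python sketch encodes one Polish letter in the four Latin-2 codepages named above. It assumes that Python's built-in codecs cp1250, iso8859_2, cp852 and mac_latin2 correspond to those codepages; the dictionary and variable names are made up for this illustration.

```python
# Python codec names assumed to match the codepages named in the text.
LATIN2_CODECS = {
    "CP 1250 (ANSI Latin-2)": "cp1250",
    "ISO 8859-2 (ISO Latin-2)": "iso8859_2",
    "CP 852 (IBM Latin-2)": "cp852",
    "Macintosh CE": "mac_latin2",
}

letter = "ą"  # U+0105, LATIN SMALL LETTER A WITH OGONEK (Polish)

# The same letter is stored as a different byte value in each codepage.
byte_for = {name: letter.encode(codec) for name, codec in LATIN2_CODECS.items()}
for name, raw in byte_for.items():
    print(f"{name:26} -> 0x{raw[0]:02X}")

# Reading a CP 1250 byte with the ISO 8859-2 table yields a different letter.
misread = letter.encode("cp1250").decode("iso8859_2")
print(repr(letter), "misread as", repr(misread))
```

Text written under one codepage and read under another therefore shows the wrong letters, even though both codepages cover the same language.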

1.2. Unicode

The Unicode standard is a character coding system designed to support written texts of diverse modern, classical and historical languages. It is based on a double-byte (16-bit) character encoding, so it can enumerate 65,536 characters. It is hoped to become the "one and only" future standard, and it may solve the "codepage soup" problem. Unicode is compatible with the ISO 10646 standard. The current version 2.1 includes 38,887 coded characters used in written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.
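
As a small illustration, the Python sketch below puts Polish, Cyrillic and Greek letters into one string and stores them with two bytes each. UTF-16 is used here as an assumed example of a double-byte encoding; the text above does not name a specific one, and the sample letters are arbitrary.

```python
# Polish, Cyrillic and Greek letters that belong to three different codepages.
text = "ąЖω"

# In Unicode every character has one number, regardless of language.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# UTF-16 (little-endian, no byte-order mark) stores each of these in two bytes.
utf16 = text.encode("utf-16-le")

print(code_points)
print(len(utf16), "bytes for", len(text), "characters")
print(2 ** 16, "values fit into 16 bits")
```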

2. Fonts

2.1. TrueType and OpenType fonts

TrueType fonts used in Microsoft Windows 3.1 initially contained only the characters from the Latin-1 (CP 1252) codepage. The TrueType font technology, however, allows many more characters in a single font, and Microsoft decided to encode the characters according to the Unicode standard. Most applications were not yet programmed to work with such fonts: even if a font contained many characters, an application could use only 256 characters of each font.

In Microsoft Windows 2000, OpenType fonts will be used. The OpenType font format is an extension of the TrueType font format, adding support for PostScript font data. The format was developed by Microsoft and Adobe.

OpenType fonts may contain TrueType or PostScript outlines (so Type 1 won't be necessary anymore), but the general structure of an OpenType font file is that of a TrueType font.

2.2. Single-codepage fonts

In Microsoft Windows 3.1, characters for each codepage had to be stored in a separate font. A user desiring to switch from English to Cyrillic to Greek while typing would have to choose three different fonts: Times New Roman, Times New Roman Cyr, and Times New Roman Greek.

The non-Latin-1 fonts were encoded in Unicode, but each font contained only the character set used in the specific codepage. Microsoft used a trick, so the applications could see the non-Latin-1 fonts just as if they were normal Western fonts. Unfortunately, this trick was not published and many font developers used different ways to get international characters into TrueType fonts.

ANSI codepages used:
1252  Latin-1 (Western)
1250  Latin-2 (Central Europe)
1251  Cyrillic
1253  Greek
1254  Turkish
1257  Baltic
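
The effect of choosing a different single-codepage font can be sketched in Python by decoding one and the same code under each of the six codepages listed above. The codec names cp1250 to cp1257 are Python's built-in codecs for these Windows codepages; the byte value 0xE0 is an arbitrary choice for the example.

```python
ANSI_CODEPAGES = {
    1252: "Latin-1 (Western)",
    1250: "Latin-2 (Central Europe)",
    1251: "Cyrillic",
    1253: "Greek",
    1254: "Turkish",
    1257: "Baltic",
}

raw = bytes([0xE0])  # a single code above the 128-character ASCII range

# The stored code stays the same; only the codepage decides which character it means.
glyph_for = {cp: raw.decode(f"cp{cp}") for cp in ANSI_CODEPAGES}
for cp, label in ANSI_CODEPAGES.items():
    ch = glyph_for[cp]
    print(f"{cp} {label:25} U+{ord(ch):04X} {ch}")
```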

2.3. Multilingual fonts

2.3.1. Unicode fonts

In Microsoft Windows NT 3.1, some Unicode functionality was included in the system. Thus a particular application could address all characters contained in a TrueType font, not only the first 256 of them. Unfortunately, few developers wrote compliant programs, so Unicode did not become popular at that time. Nevertheless, Windows NT was the first Microsoft system to include multilingual TrueType fonts, containing characters for more than one codepage.

2.3.2. WGL4 fonts

In 1995, Microsoft Windows 95 was introduced. It supported a subset of Unicode which included the characters required by Western, Central, and Eastern European writing systems, plus the characters required by Greek and Turkish. This so-called Pan-European character set contained 652 characters and was called WGL4: Windows Glyph List 4.

Fonts shipped with Windows 95 included the full WGL4 character set (all Western and Central European languages, Cyrillic, Greek, Baltic and Turkish alphabets). Later, Microsoft and other vendors released fonts which included a smaller or larger character set. For instance, Trebuchet MS contains characters from the Western, Central European and Turkish codepages, thus not covering the full WGL4 set. On the other hand, there are fonts like Bitstream Cyberbit (over 29,000 glyphs) or Arial Unicode MS (over 51,000 glyphs), covering most of the Unicode standard.
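
Whether a font covers a given codepage can be thought of as a set comparison. The Python sketch below is only an approximation under stated assumptions: it uses the six Windows codepages as a stand-in for the WGL4 list (the real WGL4 set also contains characters outside these codepages), and `toy_font` is a hypothetical character set built for the example, not data read from a real font file.

```python
CODEPAGES = ["cp1252", "cp1250", "cp1251", "cp1253", "cp1254", "cp1257"]

def codepage_chars(codec):
    """Return the characters a single-byte codec assigns to codes 0x20-0xFF."""
    chars = set()
    for code in range(0x20, 0x100):
        # Codes the codepage leaves undefined decode to "" and are skipped.
        ch = bytes([code]).decode(codec, errors="ignore")
        if ch:
            chars.add(ch)
    return chars

def covered_codepages(font_chars):
    """Map each codepage to True if the font contains all of its characters."""
    return {cp: codepage_chars(cp) <= font_chars for cp in CODEPAGES}

# Hypothetical font covering only the Western, Central European and Turkish sets.
toy_font = codepage_chars("cp1252") | codepage_chars("cp1250") | codepage_chars("cp1254")
coverage = covered_codepages(toy_font)

for cp, ok in coverage.items():
    print(cp, "covered" if ok else "incomplete")
```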

The character set included in a font can be examined using the free Microsoft Font Properties Extension.

[Figure: MS Font Properties Extension]

3. Applications

There are various ways in which applications may access the extended characters (those beyond the first 256) included in multilingual TrueType fonts.

3.1. Using Unicode

Unicode-compliant applications use double-byte text encoding to store text and can address the extended characters directly. Currently there are only a few such applications (e.g. Microsoft Word 97 and Word 2000, Adobe InDesign, MGI Calamus Publisher 2.0), yet the number is slowly growing.

3.2. WGL4 and codepage selection

To make the use of multilingual fonts in non-Unicode applications easier, Microsoft introduced a mechanism which allowed the user to choose a single WGL4 font, and change codepages as needed. In order to use this codepage selection mechanism, applications were supposed to use the system font selector.

[Figure: Windows Font Selector]

Alternatively, applications could use a special form of font enumeration, listing all TrueType fonts with the script name in brackets. This method is used by Windows WordPad.

[Figure: WordPad Font Enumeration]

Unfortunately, not all applications use the system font selector or the WordPad-like font enumeration. Instead, they use custom font selection dialogs and the old Windows 3.1-like font enumeration (e.g. Microsoft Word 7.0, Adobe PageMaker 6.5, Adobe Photoshop 5.0, QuarkXPress 4.0, Corel DRAW! 9.0).

3.3. Font substitutes

Most applications still use the old Windows 3.1-style font enumeration, listing the installed system fonts including so-called font substitutes (or: font aliases). These are the entries in the [FontSubstitutes] section of the Windows 95/98 win.ini file or the Windows NT 4.0 registry.

Although the font aliasing mechanism is quite primitive and hacky (every change requires a Windows restart), there is one interesting feature about this mechanism. By entering a line:

Arial CE,238=Arial,238
a "virtual font" named Arial CE is installed, which contains all characters of the Central European script of the Arial font. Non-WGL4 applications can now access the Polish, Czech, Slovak, etc. characters in the Arial CE "virtual" font using the codes of the Windows 1250 codepage.

Analogously, the following entries may be made:

Arial Cyr,204=Arial,204
Arial Greek,161=Arial,161
Arial Baltic,186=Arial,186
Arial Tur,162=Arial,162
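
The structure of these entries can be sketched in Python. This is an illustration only, not the way Windows itself reads the section: `Substitute`, `parse_substitute` and `CHARSET_CODEC` are hypothetical names, and the mapping from charset numbers to Python codecs is an assumption based on the codepages named in this document and on the Windows charset constants for Greek (161) and Turkish (162).

```python
from typing import NamedTuple

class Substitute(NamedTuple):
    alias: str          # name shown to the application, e.g. "Arial CE"
    alias_charset: int  # charset number attached to the alias
    base: str           # the real multilingual font
    base_charset: int   # charset number requested from the real font

def parse_substitute(line):
    """Split an 'Alias,N=Base,N' entry into its four parts."""
    left, right = line.split("=", 1)
    alias, alias_cs = left.rsplit(",", 1)
    base, base_cs = right.rsplit(",", 1)
    return Substitute(alias.strip(), int(alias_cs), base.strip(), int(base_cs))

# Charset number -> Python codec of the matching Windows codepage (assumed mapping).
CHARSET_CODEC = {238: "cp1250", 204: "cp1251", 161: "cp1253", 162: "cp1254", 186: "cp1257"}

entries = [
    "Arial CE,238=Arial,238",
    "Arial Cyr,204=Arial,204",
    "Arial Greek,161=Arial,161",
    "Arial Baltic,186=Arial,186",
    "Arial Tur,162=Arial,162",
]
subs = [parse_substitute(entry) for entry in entries]

# Codes a non-Unicode application would use for Polish text set in "Arial CE".
polish_bytes = "Łódź".encode(CHARSET_CODEC[subs[0].alias_charset])

for sub in subs:
    print(sub.alias, "->", sub.base, "via", CHARSET_CODEC[sub.alias_charset])
print(polish_bytes)
```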

The mysterious numbers were initially intended to be used only by Microsoft and were never supposed to be published. Thus, they were chosen based on the visual associations of the hexadecimal notation:
DEC  HEX  Meaning         Codepage
238  EE   Eastern Europe  1250
204  CC   CyrilliC        1251
186  BA   BAltic          1257

Later, more values were added without such "linguistic" associations.
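
The decimal-to-hexadecimal correspondence in the table can be verified with a short Python sketch; the dictionary name is made up for the example.

```python
# Decimal charset values and the mnemonic the text associates with each.
CHARSET_MEANING = {238: "Eastern Europe", 204: "CyrilliC", 186: "BAltic"}

# format(n, "X") gives the uppercase hexadecimal notation of n.
hex_form = {dec: format(dec, "X") for dec in CHARSET_MEANING}

for dec, meaning in CHARSET_MEANING.items():
    print(dec, hex_form[dec], meaning)
```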

Last modified on July 24, 1999