Multilingual/programmer keyboard layout

From WikiGagneLague


Choosing a sensible keyboard layout is no easy feat if your everyday tasks include both programming and writing texts in a language other than English. A first difficulty comes from the fact that the standard US English QWERTY layout, whilst very efficient for programming thanks to the location of the non-alphanumeric characters, cannot be used to type accented letters and many other characters found in Western alphabets.

An alternative exists in the form of the US International QWERTY layout, where some characters (for example backtick and apostrophe) are replaced by dead keys for composing accented letters. The problem with this layout is that it is no longer suitable for programming, as the dead keys make it harder to type some characters (again, backtick and apostrophe for instance) that are very important to programmers.

A number of keyboard layouts change the position of some special characters (notably /, ?, [, ], {, }, \ and |) to make it easy to type additional characters, accented letters, diacritics and such. Whilst these layouts are usually efficient for typing text in a specific language, their usefulness is limited. First, a given layout is unlikely to adapt well to more than one language (not counting English), as the additional characters and accented letters are likely to differ from one language to the next. Second, the relocation of the special characters (/, \, [ and ] for instance) makes it hard to write in the many programming languages that make ample use of them. Third, as most of the computing world uses the US English QWERTY layout, you might be forced to switch between layouts when working on different computers, and that is annoying.

Another solution is to configure two or more keyboard layouts on your system, and to switch between them on the fly, using the US English QWERTY layout when programming, and another layout when typing text in a language other than English. Switching back and forth between layouts all the time may be confusing, however, and will quickly get tiresome.

On this page, I describe a fourth alternative to solve the keyboard layout problem. The proposed solution has the following features:

  • The positions of the letters, digits and special characters are identical to those of the US English QWERTY layout.
    • It is easy to adapt to the new layout.
    • Programming is not made any harder.
    • There should be no problem switching between this layout and US English QWERTY.
  • Additional characters are all produced using one dead key located on the home row of the right hand.
    • Language-specific characters are easy to type.
    • A great number (theoretically unlimited) of language-specific characters may be added to the layout.
    • The layout is suitable for any language that uses the Latin alphabet.
  • It uses the Xmodmap and XCompose features of X.Org-X11.
    • Works only under UNIX systems using X.Org-X11.
    • Does not require root access to the system.

I will not discuss alternatives to the QWERTY basic layout in this document, for three reasons. The first is that QWERTY is so widespread that adapting to another layout is too difficult to be worth it, in my opinion. The second is that such variants are usually optimised for a given language, and therefore useless when you have to type in many. The third is that the technique proposed here is independent of the basic keyboard layout. If you use an alternative basic layout, you should be able to adapt the following instructions easily.

Building the framework

We will now put in place the basic tools to develop new efficient keyboard layouts that behave like the US English QWERTY layout. We will do this using tools from the X.Org X11 implementation. Whilst the following instructions may also work with other X11 implementations, such as XFree86, I have only tested this setup on GNU/Linux/X.Org systems. Of course, the keyboard layout will only function inside X sessions (it will not work in the Linux console, for instance).

Available tools

We could develop a new XKB layout from scratch or by modifying the existing US layout. However, modifying the server's XKB files is bad practice, requires root access, and needs you to be careful not to overwrite your modifications when updating configuration files. Making new layouts from scratch in your home directory is not documented at all, and might not even be possible, for all I know. The good news is that we do not need to do it. Since our layouts will not vary much from the original US layout, the Xmodmap utility (which can be used to remap keys) will be enough for us. Finally, the XCompose mechanism will allow us to fake additional keys and make very powerful layouts.

Before starting, make sure you use the US keyboard layout in your X sessions (by editing either your personal settings or the system's defaults). If setting the system defaults, your xorg.conf file would contain:

Section "InputDevice"

    Identifier  "Keyboards"
    Driver      "kbd"

    Option      "XkbModel"   "microsoftpro"
    Option      "XkbLayout"  "us"

EndSection

Xmodmap and the Mode_Switch key

The first step in designing our framework is to activate the AltChar key (the right Alt key, labelled AltGr on many keyboards) so we can use Xmodmap to bind additional characters to the existing keys. To do so, you need to use the grp:switch XkbOption. If setting the system defaults, your xorg.conf file would contain:

Section "InputDevice"

    Identifier  "Keyboards"
    Driver      "kbd"

    Option      "XkbModel"    "microsoftpro"
    Option      "XkbLayout"   "us"
    Option      "XkbOptions"  "grp:switch"

EndSection

The AltChar key is now the group switch modifier. While you hold it down, the other keys on the keyboard produce alternate characters. To configure which characters are available, create and edit a file named .Xmodmap in your home directory. This file contains a list of key mapping commands; the xmodmap man page will tell you all you need to know about their syntax. As an example, the following command remaps the d key so that, in addition to d and D, it produces the small and capital variants of the Greek letter delta when AltChar is held down.

keysym d = d D Greek_delta Greek_DELTA
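
The column order of such a line can be made concrete with a small Python sketch. It is not part of xmodmap; the function name describe_keysym_line is made up for illustration. It assumes the convention described in the xmodmap man page, where the first four keysyms of a line are produced with no modifier, with Shift, with Mode_switch, and with Mode_switch plus Shift.

```python
# Modifier levels in the order xmodmap assigns keysyms to a key.
LEVELS = ("plain", "Shift", "Mode_switch", "Mode_switch+Shift")


def describe_keysym_line(line):
    """Split an xmodmap 'keysym' line into (target, {level: keysym})."""
    left, right = line.split("=", 1)
    command, target = left.split()
    if command != "keysym":
        raise ValueError("not a keysym line: " + line)
    names = right.split()
    # zip() stops at the shorter input, so a line with only three
    # keysyms simply has no Mode_switch+Shift entry.
    return target, dict(zip(LEVELS, names))


target, levels = describe_keysym_line("keysym d = d D Greek_delta Greek_DELTA")
for level, name in levels.items():
    print(target, "with", level, "gives", name)
```

Reading a line this way makes it easy to see which level is left empty when fewer than four keysyms are given.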

There is no list of the available keysyms anywhere as far as I can tell, but you can use grep on the files in /usr/lib/X11/xkb to find the exact name of most interesting keysyms.

Using XCompose and a dead key to fake a keyboard layout

Using AltChar in combination with another key is useful for those symbols you do not use often, but not so much for those you frequently have to type (accented letters in the French language, for instance). The reason is that AltChar is at an awkward place on the keyboard; reaching for it every time you need a non-English letter will slow your typing down considerably. Also, having to hold the key down while reaching for a second one means stretching your hand in a suboptimal manner.

What alternative do we have? Using modifiers is out of the question because of their awkward locations on the keyboard and because we have to keep them pressed while typing. We do not want to move characters around to make room for new ones either, as our layout must be consistent with the US English layout. What we need is a conveniently placed key that acts as a dead key: by pressing this key, then another key, we would obtain the desired character. The obvious problem is that all keys are already assigned to characters. We need to remap a key so it acts as a dead key, while at the same time keeping the behavior of the US layout unchanged. It sounds impossible...

The solution is to use the semicolon key as that new dead key. We can do it without changing the behavior of the US English keyboard layout by taking advantage of some interesting properties of the semicolon usage in texts and source code:

  • Whenever a semicolon is used in a text, it is followed by a space. (This is valid for all Western languages, as far as I know.)
  • Whenever a semicolon is used in a programming language, it either is followed by a space or is the last character on the line. (This is valid for shell scripting, C-style languages, etc.)

This means that when a semicolon is typed, it is almost certain that it will be followed by a space or a new line character. We can therefore use the semicolon as a dead key, as long as we make sure that typing a semicolon followed by a space yields a semicolon followed by a space, and that typing a semicolon followed by the return key yields a semicolon followed by a new line character. In the unlikely event that we want to type just a semicolon, not followed by a space or a new line, the semicolon key has to be pressed twice in a row. This is the only incompatibility between our layout and the US English QWERTY layout.
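
The reasoning above can be checked with a toy model. The following Python sketch is only a simulation, not how X implements dead keys; the names DEAD, COMPOSE and type_keys are made up for illustration, and the rule producing "é" is just an example of an additional character. Silently dropping an unknown sequence is an assumption of this sketch.

```python
DEAD = "<dead>"  # stands for the remapped semicolon key

# Hypothetical compose table: key pressed after the dead key -> text produced.
COMPOSE = {
    " ": "; ",    # semicolon, space  -> semicolon and a space
    "\n": ";\n",  # semicolon, Return -> semicolon and a new line
    DEAD: ";",    # semicolon twice   -> a single semicolon
    "f": "é",     # example of a language-specific character
}


def type_keys(keys):
    """Return the text produced by a sequence of key presses."""
    out = []
    pending = False  # True right after the dead key was pressed
    for key in keys:
        if pending:
            # In this sketch, unknown sequences produce nothing.
            out.append(COMPOSE.get(key, ""))
            pending = False
        elif key == DEAD:
            pending = True
        else:
            out.append(key)
    return "".join(out)


print(repr(type_keys(["i", "+", "+", DEAD, "\n"])))
```

Typing habits are unchanged for the two common cases; only a bare semicolon needs the key to be pressed twice.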

We can do all this using the Xmodmap and XCompose facilities of X.Org.

First, we need to remap the semicolon to a dead key using Xmodmap. Add the following line to your .Xmodmap file:

keysym semicolon = dead_hook colon

You can choose the dead key to use freely. I used dead_hook since I never use that character. Now that the dead key is set up, we need to have the semicolon + space and semicolon + return sequences yield the proper characters.

We can do this using a custom XCompose file. Strictly speaking, XCompose has nothing to do with the keyboard layout. The only thing it does is intercept sequences of characters typed on the keyboard and replace them with other sequences. It is the mechanism by which typing a dead key followed by another key produces only one character, usually an accented letter. For example, the following rules are used in some French layouts to produce the letters à and À:

<dead_grave> <a> : "à"
<dead_grave> <A> : "À"

To make your own compositions, first copy a compose file to your home directory. Since I use the UTF-8 encoding, I copied /usr/lib/X11/locale/en_US.UTF-8/Compose. Choose a compose file appropriate for your encoding. You also need to remove all rules that mention the chosen dead key. The following command does both at once:

grep -v '<dead_hook>' /usr/lib/X11/locale/en_US.UTF-8/Compose > ~/.XCompose

Next, add the rules for the semicolon + space, semicolon + return and semicolon + semicolon sequences to that file:

<dead_hook> <space> : "; "
<dead_hook> <Return> : ";\n"
<dead_hook> <dead_hook> : ";"

The \n escape sequence produces a new line character.
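
For those who prefer a script to the grep command, the two steps above (dropping every rule that mentions the dead key, then appending the three semicolon rules) can be sketched in Python. This is a minimal illustration; the function name build_xcompose and the sample input are made up, and the sketch works on a string, so reading the system Compose file and writing ~/.XCompose is left to you.

```python
def build_xcompose(base_text, dead_key="dead_hook"):
    """Drop every rule mentioning the dead key, then append the semicolon rules."""
    token = "<" + dead_key + ">"
    # Same effect as grep -v: keep only the lines without the dead key.
    kept = [line for line in base_text.splitlines() if token not in line]
    kept += [
        token + ' <space> : "; "',
        token + ' <Return> : ";\\n"',  # a literal backslash followed by n
        token + " " + token + ' : ";"',
    ]
    return "\n".join(kept) + "\n"


# Tiny stand-in for the contents of a system Compose file.
sample = '<dead_hook> <a> : "ả"\n<dead_grave> <a> : "à"\n'
print(build_xcompose(sample), end="")
```

Passing another keysym name as dead_key adapts the sketch to a different choice of dead key.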


At this point, you should have configured your X session to use the US English layout, and set AltChar as the switch modifier. In xorg.conf, this translates to:

Section "InputDevice"

    Identifier  "Keyboards"
    Driver      "kbd"

    Option      "XkbModel"    "microsoftpro"
    Option      "XkbLayout"   "us"
    Option      "XkbOptions"  "grp:switch"

EndSection

You should also have remapped the semicolon to a dead key in your .Xmodmap:

keysym semicolon = dead_hook colon

Finally, you should have copied a compose file to your home directory as .XCompose, removed all references to the dead key chosen above from that file, and added the following lines to take care of the semicolon + space, semicolon + return and semicolon + semicolon sequences:

<dead_hook> <space> : "; "
<dead_hook> <Return> : ";\n"
<dead_hook> <dead_hook> : ";"

Your keyboard has not changed much. Typing a semicolon followed by a space or by the return key has the expected behavior, and typing a semicolon twice generates a single semicolon. You now have a new dead key, conveniently located on the home row of your right hand. By adding new rules to your .XCompose file, you can use this dead key to generate any character (or sequence) you want. You have also activated the AltChar key. Adding new mappings to your .Xmodmap file is another way to extend your personal layout with new characters.

Designing a layout


Now that the framework is in place, it is up to you to design a keyboard layout that meets your needs. As an example, here is the configuration I use, which allows me to type in French and to use a few mathematical/scientific symbols. I have the following rules in my .XCompose:

# Diacritics

<dead_hook> <a> : "à"
<dead_hook> <A> : "À"
<dead_hook> <s> : "â"
<dead_hook> <S> : "Â"
<dead_hook> <d> : "ä"
<dead_hook> <D> : "Ä"

<dead_hook> <f> : "é"
<dead_hook> <F> : "É"
<dead_hook> <e> : "è"
<dead_hook> <E> : "È"
<dead_hook> <r> : "ê"
<dead_hook> <R> : "Ê"
<dead_hook> <t> : "ë"
<dead_hook> <T> : "Ë"

<dead_hook> <i> : "ì"
<dead_hook> <I> : "Ì"
<dead_hook> <j> : "î"
<dead_hook> <J> : "Î"
<dead_hook> <n> : "ï"
<dead_hook> <N> : "Ï"

<dead_hook> <o> : "ò"
<dead_hook> <O> : "Ò"
<dead_hook> <k> : "ô"
<dead_hook> <K> : "Ô"
<dead_hook> <m> : "ö"
<dead_hook> <M> : "Ö"

<dead_hook> <u> : "ù"
<dead_hook> <U> : "Ù"
<dead_hook> <h> : "û"
<dead_hook> <H> : "Û"
<dead_hook> <b> : "ü"
<dead_hook> <B> : "Ü"

<dead_hook> <c> : "ç"
<dead_hook> <C> : "Ç"

# Ligatures

<dead_hook> <y> : "œ"
<dead_hook> <Y> : "Œ"
<dead_hook> <g> : "æ"
<dead_hook> <G> : "Æ"

# Punctuation

<dead_hook> <q> : "«"
<dead_hook> <w> : "»"
<dead_hook> <period> : "…"
<dead_hook> <Tab> : " " # Non-breaking space

# Mathematical, scientific and misc characters

<dead_hook> <minus> : "÷"
<dead_hook> <equal> : "×"
<dead_hook> <plus> : "?"
<dead_hook> <underscore> : "±"
<dead_hook> <8> : "•"
<dead_hook> <0> : "°"
<dead_hook> <5> : "‰"
<dead_hook> <asterisk> : "?"
<dead_hook> <4> : "¢"
<dead_hook> <dollar> : "€"
<dead_hook> <p> : "¶"

To be able to type Greek letters easily, I added the following to my .Xmodmap:

keysym a = a A Greek_alpha
keysym b = b B Greek_beta
keysym g = g G Greek_gamma Greek_GAMMA
keysym d = d D Greek_delta Greek_DELTA
keysym e = e E Greek_epsilon
keysym z = z Z Greek_zeta Greek_ZETA
keysym h = h H Greek_eta Greek_ETA
keysym o = o O Greek_theta Greek_THETA
keysym k = k K Greek_kappa
keysym l = l L Greek_lamda Greek_LAMDA
keysym m = m M mu
keysym n = n N Greek_nu Greek_NU
keysym f = f F Greek_xi Greek_XI
keysym p = p P Greek_pi Greek_PI
keysym r = r R Greek_rho
keysym s = s S Greek_sigma Greek_SIGMA
keysym t = t T Greek_tau
keysym u = u U Greek_upsilon Greek_UPSILON
keysym v = v V Greek_phi Greek_PHI
keysym x = x X Greek_chi Greek_CHI
keysym y = y Y Greek_psi Greek_PSI
keysym w = w W Greek_omega Greek_OMEGA
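
To preview which characters these keysyms stand for without restarting X, a small Python sketch can help. It relies on an assumption that holds for the names used above but not for every keysym: the part after "Greek_" matches the letter name used by Unicode (including the spelling LAMDA). The function name greek_keysym_to_char is made up for illustration, and the mu keysym is treated as the micro sign.

```python
import unicodedata


def greek_keysym_to_char(keysym):
    """Map a keysym such as 'Greek_delta' or 'Greek_DELTA' to its character."""
    if keysym == "mu":
        # The plain 'mu' keysym is the Latin-1 micro sign, not a Greek letter.
        return unicodedata.lookup("MICRO SIGN")
    prefix, _, letter = keysym.partition("_")
    if prefix != "Greek" or not letter:
        raise ValueError("not a Greek keysym: " + keysym)
    case = "CAPITAL" if letter.isupper() else "SMALL"
    return unicodedata.lookup("GREEK " + case + " LETTER " + letter.upper())


for name in ("Greek_delta", "Greek_DELTA", "Greek_lamda", "mu"):
    print(name, greek_keysym_to_char(name))
```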

Character "groups"

If you feel limited by the number of easily accessible keys you can combine with your new dead key, you can extend that number by using compose sequences of more than two characters. For instance, the diacritics in the above example could be grouped to free many keys, which could then be reused for other purposes. Note that a key used as a group prefix (g, v and y below) cannot keep a two-key rule of its own, so any such rule from the previous example has to be removed or moved to another key. Here is an example (XCompose rules) for grouping French diacritics:

<dead_hook> <g> <a> : "à"
<dead_hook> <g> <A> : "À"
<dead_hook> <g> <e> : "è"
<dead_hook> <g> <E> : "È"
<dead_hook> <g> <i> : "ì"
<dead_hook> <g> <I> : "Ì"
<dead_hook> <g> <o> : "ò"
<dead_hook> <g> <O> : "Ò"
<dead_hook> <g> <u> : "ù"
<dead_hook> <g> <U> : "Ù"

<dead_hook> <v> <s> : "â"
<dead_hook> <v> <S> : "Â"
<dead_hook> <v> <r> : "ê"
<dead_hook> <v> <R> : "Ê"
<dead_hook> <v> <j> : "î"
<dead_hook> <v> <J> : "Î"
<dead_hook> <v> <k> : "ô"
<dead_hook> <v> <K> : "Ô"
<dead_hook> <v> <h> : "û"
<dead_hook> <v> <H> : "Û"

<dead_hook> <y> <d> : "ä"
<dead_hook> <y> <D> : "Ä"
<dead_hook> <y> <t> : "ë"
<dead_hook> <y> <T> : "Ë"
<dead_hook> <y> <n> : "ï"
<dead_hook> <y> <N> : "Ï"
<dead_hook> <y> <m> : "ö"
<dead_hook> <y> <M> : "Ö"
<dead_hook> <y> <b> : "ü"
<dead_hook> <y> <B> : "Ü"
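
When mixing two-key and three-key rules, it is easy to leave behind a short sequence that is the beginning of a longer one, in which case the two rules cannot both work as intended. The following Python sketch looks for such pairs in a list of rules. It is a simplified illustration: the function names are made up, and the parser only understands the plain rule syntax used on this page.

```python
def parse_sequence(rule):
    """Return the key sequence of a compose rule as a tuple of key names."""
    events = rule.split(":", 1)[0]
    return tuple(part.strip("<>") for part in events.split())


def find_prefix_conflicts(rules):
    """List (shorter, longer) pairs where one sequence starts another."""
    seqs = [
        parse_sequence(rule)
        for rule in rules
        if rule.strip() and not rule.lstrip().startswith("#")
    ]
    conflicts = []
    for short in seqs:
        for long in seqs:
            if len(short) < len(long) and long[:len(short)] == short:
                conflicts.append((short, long))
    return conflicts


rules = [
    '<dead_hook> <g> : "æ"',      # two-key rule from the earlier example
    '<dead_hook> <g> <a> : "à"',  # grouped rule that starts the same way
    '<dead_hook> <y> <d> : "ä"',
]
for short, long in find_prefix_conflicts(rules):
    print("conflict:", " ".join(short), "starts", " ".join(long))
```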

Extending the framework

The above suggestions should allow you to develop programs and write texts in any Western language with minimal effort. Still, Xmodmap and XCompose allow for a lot more.

Working with the Vim editor

That Escape key is out of reach

On most keyboards, the Escape key, along with all function keys, is set apart from the rest of the keyboard and not at all easy to reach. In Vim, the Escape key is of utmost importance. Here is a trick to make it more accessible. First, add the following to your Xmodmap mappings:

keycode 49 = Escape asciitilde

Then, add the following to your XCompose rules:

<dead_hook> <Escape> : "`"

You can now use the backtick key as an escape key, which I find very convenient. To type a backtick, just use the magical dead key (semicolon), followed by the backtick key. One might argue that since this is Vim specific, it would be more sensible to define it as a Vim mapping. However, I use the escape key in lots of situations (closing dialog boxes and some programs), so having it mapped globally in all X programs makes sense for me.

What key can I map this command to?

There are so many keyboard commands in Vim that finding a simple mapping for your own custom commands is no small feat. A workaround is to use XCompose to define "dummy" characters that can be mapped to Vim commands. As an example, here is how I copy and paste between Vim sessions and other programs using the X11 clipboard. (To have access to the X11 clipboard from the editor, Vim must be compiled with X11 support, which is the vim-with-x USE flag in Gentoo Linux.) The Vim commands are bound to the following key sequences:

Cut: "+x
Copy: "+y
Paste: "+gP

These mappings are horrible, as you need to hold down Shift, press two keys, release Shift, and press another key merely to copy or cut some text. (It is even worse when you want to paste something.) I get around this problem by defining the following dummy characters, three distinct characters that I never type otherwise, in my .XCompose:

<dead_hook> <z> : "?"
<dead_hook> <x> : "?"
<dead_hook> <v> : "?"

I then map them to the cut/copy/paste commands in my .vimrc:

map ? "+y
map ? "+x
map ? "+gP

I can now use the following to easily transfer text to and from Vim:

Cut: MagicDeadKey + x
Copy: MagicDeadKey + z
Paste: MagicDeadKey + v
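
Since each dummy character has to appear both in .XCompose and in .vimrc, the two sets of lines can be generated from a single table to keep them in sync. The Python sketch below illustrates the idea. The three placeholder characters are arbitrary assumptions chosen for this example (any characters you never type otherwise should do), and the names BINDINGS, compose_rules and vim_maps are made up.

```python
# Key pressed after the dead key -> (placeholder character, Vim command).
# The placeholder characters are arbitrary; pick ones you never type.
BINDINGS = {
    "z": ("\u2460", '"+y'),   # copy
    "x": ("\u2461", '"+x'),   # cut
    "v": ("\u2462", '"+gP'),  # paste
}


def compose_rules(bindings, dead_key="dead_hook"):
    """Build the .XCompose rules that emit the placeholder characters."""
    return [
        '<{}> <{}> : "{}"'.format(dead_key, key, char)
        for key, (char, _command) in bindings.items()
    ]


def vim_maps(bindings):
    """Build the .vimrc lines mapping each placeholder to its command."""
    return ["map {} {}".format(char, command) for char, command in bindings.values()]


print("\n".join(compose_rules(BINDINGS)))
print("\n".join(vim_maps(BINDINGS)))
```

Adding a new shortcut then only requires one more entry in the table.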