Jim Breen's WWWJDIC Server
Last updated: 25 January 2003
(You can jump straight to WWWJDIC).
Welcome to WWWJDIC, the dictionary server I have developed
to give direct
WWW access to the Japanese-English dictionary files I have compiled or
collected, and to provide on the WWW
some of the functionality of the various dictionary
search engines with which I have been associated. The server is called
WWWJDIC because it is a member of the JDIC/xjdic/MacJDic family of
dictionary programs.
It was based on code from the Unix/Linux xjdic program, but has a
rewritten front-end to drive HTML forms.
WWWJDIC operates at several mirror sites around the globe.
All sites carry identical information. Check here
for the location of the nearest mirror site.
As WWWJDIC provides no support for the display of Japanese words in a
romanized form (Romaji),
you will require some capability for displaying Japanese kana and
kanji. The best way to do this is to install the appropriate Japanese
fonts and set your
browser to use them. Most modern browsers support that facility.
If you do not wish to do that, you may access
WWWJDIC via a special server that will send out bit-mapped versions
of Japanese characters (see below.)
If you are a Unix/Linux person using Mozilla, Netscape, Galeon, etc.,
all you have to do is
make sure that a Japanese font file has been installed in the
correct directory (e.g. /usr/X11R6/lib/X11/fonts/misc).
Recent releases of Linux come with this included.
You may have to make sure mkfontdir has been run too.
You will then have to make sure that the browser knows to use this font
when it encounters Japanese text. This is done (e.g. in Netscape) via the
Edit/Preferences/Appearance/Fonts menu. If the WWW page is correctly
marked as using Japanese, any Japanese text should appear immediately.
Many WWW pages are not marked correctly, so you may have to turn on
Japanese viewing via the View/Character Set/Japanese (autodetect) menu.
(Note that some Unix/Linux browsers do not allow
input of Japanese via input methods such as kinput2. I use
Mozilla, which does support kinput2.)
For Windows users, probably the best method is to make sure a
Japanese True-Type Font (ttf) has been installed on your system,
and set your browser to use it. The Monash ftp archive has two
Microsoft Japanese fonts available: Gothic and
Mincho. These are both self-installing executable files. Once a
font has been installed, you need to tell your browser to use that
font for Japanese text. In Netscape this would be done via the
Edit/Preferences/Appearance/Fonts menu. As ever, you will probably
need to restart Windows to make it work.
Windows users also have a more complete solution which is to
install the language support Windows Update from
Microsoft. It has become hard to find from that page, but
fortunately it appears also to be available
This brings in the Japanese Language Support and Japanese
Input Method Editor which allow users to view and input Japanese
with reasonable ease. The IME works with MS-IE and from V4.72 also
works with Netscape. (Note that even if you have no intention of
using IE, you may need to have it installed in order to be able to
install the IME.) Later versions of Windows based on NT (2000, XP)
come with fonts and an IME already.
Another alternative for Windows users is to install a package
which traps and converts Japanese characters. A good one is Hongbo
Ni's NJWIN program. His NJCOMM
program extends this to providing Japanese input as well.
Macintosh users have various ways of browsing in Japanese. For
example, there is Apple's Multilingual Internet Access, which is
available as a Custom installation from the Macintosh OS 8.5 CD-ROM.
OS 9 comes with the Apple
Language Kit built-in. There is some useful summary information
on Asian Languages and Macs at the
University of Sydney.
If you do not want to, or cannot, operate a full Japanese
environment for your browser, you can access WWWJDIC via another
server which will insert bit-mapped graphic characters as required.
One such server is available on the Monash site here.
The operating instructions will be minimal, as I have tried to make the operation of WWWJDIC
as intuitive as possible.
There is an FAQ section at the back of this page as people
keep asking me things.
Care is needed with the form of Romaji used for input.
WWWJDIC expects "wapuro
romaji", i.e. it should be typed as though it was going into the Input
Method (IM or IME) of a Japanese-capable word-processor.
Thus it is "toukyou" and "oosaka". Also, I expect an apostrophe
(') to disambiguate things like hon'yaku and Shin'ichi. (Some WPs
use repeated n's for this, but I don't.) Note that I can accept
both Hepburn and kunrei/nihon shiki; both sin'iti and shin'ichi map
to the same kana. The only couple of romaji forms that may give trouble
are the voiced forms of "tsu" and "chi". For these use "dzu" and "dji".
Thus you need to look up "tsudzuku", not "tsuzuku". Note that, as in many
IMEs, xa, xi, etc. can be used for the small kana vowels.
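The romaji conventions above can be illustrated with a small converter. This is only a sketch written for this page, not the code the server runs: the table is deliberately incomplete, and the names TABLE and romaji_to_hiragana are made up for the example. It shows how a longest-match lookup handles long vowels typed literally, the apostrophe after "n", both Hepburn and kunrei spellings, "dzu"/"dji", and the "x" prefix for small vowels.

```python
VOWELS = "aiueo"

# Basic kana rows keyed by consonant prefix (kunrei-style spellings);
# the "x" row gives the small vowels, as in many IMEs.
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "g": "がぎぐげご",
    "s": "さしすせそ", "z": "ざじずぜぞ", "t": "たちつてと",
    "d": "だぢづでど", "n": "なにぬねの", "h": "はひふへほ",
    "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ", "m": "まみむめも",
    "r": "らりるれろ", "x": "ぁぃぅぇぉ",
}
TABLE = {c + v: kana for c, row in ROWS.items() for v, kana in zip(VOWELS, row)}

# Hepburn spellings, plus the "dzu"/"dji" forms mentioned in the text.
TABLE.update({
    "ya": "や", "yu": "ゆ", "yo": "よ", "wa": "わ", "wo": "を",
    "shi": "し", "chi": "ち", "tsu": "つ", "fu": "ふ", "ji": "じ",
    "dzu": "づ", "dji": "ぢ",
})

# Contracted sounds (kya, sho, ja, ...): i-row kana plus a small ya/yu/yo.
SMALL_Y = {"a": "ゃ", "u": "ゅ", "o": "ょ"}
for c in "kgsztdnhbpmr":
    for v, small in SMALL_Y.items():
        TABLE[c + "y" + v] = TABLE[c + "i"] + small
for prefix, kana in (("sh", "し"), ("ch", "ち"), ("j", "じ")):
    for v, small in SMALL_Y.items():
        TABLE[prefix + v] = kana + small


def romaji_to_hiragana(romaji: str) -> str:
    """Convert wapuro romaji to hiragana using longest-match lookup."""
    s = romaji.lower()
    out = []
    i = 0
    while i < len(s):
        for n in (3, 2, 1):  # try the longest spelling first
            chunk = s[i:i + n]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            ch, nxt = s[i], s[i + 1:i + 2]
            if ch == "n":
                # Syllabic n; an apostrophe after it is consumed as well.
                out.append("ん")
                i += 2 if nxt == "'" else 1
            elif ch.isalpha() and ch == nxt and ch not in VOWELS:
                # Doubled consonant becomes a small tsu.
                out.append("っ")
                i += 1
            else:
                raise ValueError(f"cannot convert {s[i:]!r}")
    return "".join(out)
```

With this table, "hon'yaku" and "honyaku" give different kana, which is why the apostrophe matters, and "tsuzuku" and "tsudzuku" also differ.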
For people who don't like having to click the "Japanese Keyword in
Romaji" radio button on the dictionary search page, you can enter romaji even
when it is set to "English or ..." by prefixing the romaji with an "@"
character, e.g. "@koujou".
An option on the Word Search page is "Require exact word-match". If you
select this option, only a restricted number of entries will be displayed, as follows:
- for a Japanese key, the headword (kanji or kana) must match
exactly, without other characters before or following;
- for non-Japanese keys, one of the senses in the dictionary entry must match
the key exactly; however, two exceptions are made:
- any characters in parentheses before the keyword are ignored;
- the characters "to " preceding the keyword are ignored (thus allowing
matches on English verbs).
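The exact-match rules above can be sketched as follows. This is an illustration only, not the server's code; the function names are invented for the example, and the case-insensitive comparison is an assumption that the text does not state.

```python
import re

# One or more parenthesised groups at the start of a sense, e.g. "(n) ".
LEADING_PARENS = re.compile(r"^(\([^)]*\)\s*)+")


def exact_english_match(key: str, sense: str) -> bool:
    """True if one dictionary sense matches the key exactly.

    Parenthesised text before the keyword and a leading "to " are ignored.
    Lower-casing both sides is an assumption made for this sketch.
    """
    s = LEADING_PARENS.sub("", sense.strip().lower())
    if s.startswith("to "):
        s = s[3:]
    return s == key.strip().lower()


def exact_japanese_match(key: str, kanji: str, kana: str) -> bool:
    """True if the key equals the kanji or kana headword, with nothing extra."""
    return key in (kanji, kana)


def entry_matches(key: str, senses: list[str]) -> bool:
    """An entry is displayed if any one of its senses matches exactly."""
    return any(exact_english_match(key, s) for s in senses)
```

For example, the key "read" matches the sense "(v5m) to read" but not "reading matter", and the key "book" does not match "bookshop".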
Searching for English Words
You need to know that the dictionary files are based on Japanese
head-words, and selecting entries using English keys can result in
misleading results. For example, looking for "book" in the full EDICT
file will return
potentially 350 entries. For searching the EDICT file, you may
be able to get better results by setting the common word restriction via
the checkbox on the initial menu. Using the "Exact Match" option may also
improve the results. Checking the example sentences (if available) will help
verify if the word is suitable. At all times the user should exercise
Multi-Radical Kanji Selection
I should also mention that the Multi-Radical Kanji Selection feature
does not use the 214 classical radicals. Instead it uses a
slightly different set which includes more basic shapes. Note that the
identification of the kanji is based on the visual appearance of the
elements, not on their classical radical.
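In outline, selecting kanji by several visual elements amounts to intersecting the sets of kanji that contain each chosen element. The sketch below assumes a tiny hand-made table (ELEMENT_TO_KANJI, hypothetical and far from complete) in place of the real element breakdown file; it is an illustration, not the server's implementation.

```python
# Hypothetical, hand-made sample: element -> kanji that visibly contain it.
ELEMENT_TO_KANJI = {
    "日": {"明", "時", "早", "朝"},
    "月": {"明", "朝", "期"},
    "木": {"林", "村", "校"},
    "寸": {"時", "村", "寺"},
}


def select_kanji(elements: list[str]) -> set[str]:
    """Return the kanji that contain every one of the selected elements."""
    sets = [ELEMENT_TO_KANJI.get(e, set()) for e in elements]
    if not sets:
        return set()
    return set.intersection(*sets)
```

Each extra element the user selects can only narrow the candidate set, which is what makes the feature usable without knowing the classical radical.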
You have the opportunity to change some of the visual aspects of
WWWJDIC's input and display. There is a "customization" page which lets you change
the basic colours, lines/display, etc. It also lets you change from the
default EUC input and output coding to either Shift-JIS or Unicode (UTF-8).
Note that for modern browsers like IE and Netscape, this option need not usually be
exercised, as the browser will detect the code and display almost all
characters correctly. The option is really
only for browsers that cannot handle EUC at all (e.g. Japanese mobile phones
which only support Shift_JIS), or for regular use of dictionary files such as
the Buddhism and French/German files, which contain characters outside the
basic Japanese set. For users with modern browsers, Unicode (UTF-8) may be
worth using as it avoids the use of bitmapped images.
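To make the three codings concrete, the following sketch uses Python's standard codecs to show the same text in EUC-JP, Shift-JIS and UTF-8, and to convert between them. The helper name recode is made up for the example; it says nothing about how the server itself does the conversion.

```python
def recode(data: bytes, source: str, target: str) -> bytes:
    """Convert encoded text from one coding to another via Unicode."""
    return data.decode(source).encode(target)


text = "日本語"
for codec in ("euc_jp", "shift_jis", "utf_8"):
    encoded = text.encode(codec)
    # Same three characters, different byte sequences in each coding.
    print(f"{codec:10} {len(encoded)} bytes  {encoded.hex(' ')}")
```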
The customization can take place either by setting a cookie in
your browser, or by setting some URL parameters. Note that the cookies only
work for the server which set them.
A word of warning about
changing the colours. Since the in-line images of the JIS212 characters were
converted from GIF to PNG format, they are now black_on_ivory, not
black_on_transparent. This is because many browsers cannot handle
transparent PNG files.
A Word about IMEs.
WWWJDIC can be used successfully with the IMEs now available for
Win95/98, Windows 2000 and XP:
- Hongbo Ni's NJCOM works well with Netscape. A word of warning -
leave the code setting at the (default) Shift-JIS. Netscape will
convert it into EUC for sending in to the server. (I have not yet
been able to get NJCOM to work with IE.)
- Microsoft's IME works fine with WWWJDIC running via MS-IE, and with
versions of Netscape from 4.72.
- Mozilla 1.0 and later under Linux will work with kinput2
using the XIM protocol, i.e. kinput2 needs to be started with the
-xim option. Also Mozilla needs to be started with the following
variables set: LANG=ja_JP XMODIFIERS=@im=kinput2, otherwise it will not
Stroke Order Diagrams
For the Jouyou and Jinmeiyou kanji
WWWJDIC has the option of displaying a semi-animated stroke order
diagram. These diagrams use the art-work from Jack Halpern's New
Japanese-English Character Dictionary, which was scanned and cleaned up
by Jeffrey Friedl to go into Jack's Kanji Learner's Dictionary. Jack
agreed to me including them as an option in WWWJDIC, and I was able to
create animated GIF files using the panels of the artwork. Some twitch
a bit due to the occasional alignment inaccuracies. (See
Technical Bits for more details.)
One of the options of WWWJDIC is to translate the words in Japanese
text. Please note, the function does NOT attempt to
translate Japanese text into English; it simply attempts to
identify the words in the text and to display the translations of
those words. The user is expected to know enough Japanese grammar
to make sense of the results. The input text is displayed in
sections, with the words detected/translated in red, or in blue
where an inflected verb or adjective is assumed. If a user requests that
a word/phrase only be translated once (see below), the text is displayed
in brown for subsequent occurrences.
You can use this option in two ways:
- cut-and-paste text from
another application into the text box on the browser screen. (It
usually seems to go automatically into the EUC I require, but if
you are having problems, try the option of forcing the server to
convert it to EUC.) In some cases the cut-and-paste may break
characters up, resulting in a load of mojibake. Sorry if this
happens, but it's a browser problem and can't be fixed in the
- specify the URL of a WWW page, and the server will
fetch that page and translate the words in it. Note that in doing
so, it deletes everything between < and >, i.e. all HTML
labels, etc. and as a default deletes all non-Japanese characters,
so all you get is the raw Japanese. (You can override this and get
it to leave the non-Japanese in if you wish.) Where non-Japanese
has been deleted, a "|" is inserted. (In this option, you may wish
to set a new timeout value if the fetch of the WWW page takes
longer than the default 60 seconds allowed.) Please note that I make no
attempt to handle cookies. If you can't use this facility because the
site you are viewing requires cookies enabled, you will have to use the
text to be marked and then dropped straight into this function by
clicking on a Taskbar button. See my buttons
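The clean-up applied to a fetched page can be sketched roughly as below. This is an assumption-laden illustration rather than the server's code: the character ranges chosen to mean "Japanese" and the function name raw_japanese are mine, and the sketch works on already-decoded text.

```python
import re

# Everything between < and >, i.e. HTML tags.
TAG = re.compile(r"<[^>]*>")

# Runs of characters outside Japanese punctuation, hiragana, katakana
# and the main CJK ideograph block (ranges assumed for this sketch).
NON_JAPANESE = re.compile(r"[^\u3000-\u303f\u3040-\u30ff\u4e00-\u9fff]+")


def raw_japanese(html: str, keep_non_japanese: bool = False) -> str:
    """Strip tags and, by default, replace non-Japanese runs with '|'."""
    text = TAG.sub("", html)
    if keep_non_japanese:
        return text
    return NON_JAPANESE.sub("|", text)
```

Note that deleting only what lies between < and > leaves the contents of script or style elements in place, so a real page may still need further filtering.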
The server detects words in the text as follows:
Matches against complete dictionary entries are favoured over
partial matches of longer entries, and if two equivalent matches
are found, the longer is returned. Matched jukugo which are
followed by what appears to be a particle (i.e. "wa", "no", "ni",
"na", etc.) are trimmed back to just the jukugo to avoid
misreporting matches from phrases and similar long dictionary entries.
- gairaigo in katakana are detected and looked up;
- jukugo beginning with kanji are detected;
- where a kanji is followed by two or more hiragana, an attempt
is made to match the kana against known verb/adjective inflections.
If this succeeds, the equivalent dictionary form of the word is
sought. If this is successful, the match is displayed, and the
matched text displayed in blue;
- single kanji which have not been detected in the above will be
matched against dictionary entries (if any). (This may be turned
off by the user.)
- sequences of four or more hiragana are matched against a small
file of words and phrases typically written in kana alone. Only exact
matches are reported. (This
function may be expanded, but the possibility of false matches is high.)
- a special case is made of an o or go hiragana, or the equivalent
honorific-prefix kanji (御), preceding a kanji. In this case a check is made to see if the word
is present in the dictionary files with and without the prefix.
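The first stage of the detection rules above, deciding what kind of candidate each stretch of text is, can be sketched with a single regular expression. This is a simplified illustration, not the server's algorithm: it only classifies candidates by script, and leaves out the dictionary lookup, the de-inflection, the particle trimming and the honorific-prefix check. The group names are invented for the example.

```python
import re

TOKEN = re.compile(
    r"(?P<katakana>[\u30a1-\u30fa\u30fc]+)"                # gairaigo candidate
    r"|(?P<inflected>[\u4e00-\u9fff]+[\u3041-\u3096]{2,})"  # kanji + 2 or more hiragana
    r"|(?P<kanji>[\u4e00-\u9fff]+)"                         # jukugo or single kanji
    r"|(?P<hiragana>[\u3041-\u3096]{4,})"                   # 4 or more hiragana
)


def candidates(text: str) -> list[tuple[str, str]]:
    """Return (kind, surface) pairs for each candidate found in the text."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]
```

Short runs of hiragana between candidates (single particles, for instance) match none of the alternatives and are skipped.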
Users may request that translations only appear once for
each Japanese word or phrase.
The user can invoke any dictionary file for the matching, but
the combination THE_LOT file is the default. One advantage of using
this combined file is that it increases the chance of getting a
correct match for a word, particularly if the text contains names.
Also, the component sub-files in THE_LOT are tagged, and the match
function gives preference to entries in the following order (tags shown "EP", etc.):
- a small file of special words and phrases (SP);
- a subset of the most common 20,000 entries in the EDICT file (EP);
- from the rest of the EDICT file (ED);
- the other glossary files (PP, AV, CO, LW, LS, FM, BU);
- the ENAMDICT entries. (A special version of the file is used in
which kanji names with multiple readings are combined into a single
entry, with the most frequently used readings first);
- the J_PLACES file of Japanese place-names (PL);
The reason the EDICT subset is used is so that the appropriate
match is made when there are several readings of a jukugo, for example
the "adult" compound will
be matched against the word "otona" instead of the less common
The full details of all the dictionary files are provided below.
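The preference order among the tagged sub-files can be pictured as a simple ranking. In this sketch the tags SP, EP, ED, PL and the glossary tags come from the list above, while "ENAMDICT" is only a placeholder for the names entries, whose tag is not given there; the function name best_match is also made up.

```python
# Lower rank = preferred. The glossary files share one level.
RANK = {"SP": 0, "EP": 1, "ED": 2}
RANK.update(dict.fromkeys(("PP", "AV", "CO", "LW", "LS", "FM", "BU"), 3))
RANK["ENAMDICT"] = 4  # placeholder tag, assumed for this sketch
RANK["PL"] = 5


def best_match(matches: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (tag, entry) pair from the most preferred sub-file.

    When several matches share the best rank, the first one is kept.
    """
    return min(matches, key=lambda match: RANK[match[0]])
```

So when the same compound is found both in the priority subset and in the rest of EDICT, the priority-subset reading is the one reported.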
Further Comments on WWW Page Translation
Please note that if you want to examine Japanese text
within a frame, you may have to examine the source file (e.g.
View/Source) to get the address of the actual file containing the
text. An alternative is to open the frame in a window of its own.
Please appreciate that the function is somewhat crude and
simplistic. Also, a large amount of text will result in hundreds of
searches, so the server may take a while to respond.
I have created a front page
for this function which uses frames so you can have the viewed page
and WWWJDIC side-by-side.
In this section I provide a
few words about each dictionary file, and a link to that file's documentation.
(The many people who contributed to the files are acknowledged in
The dictionary files used by the server are:
WWWJDIC also uses the "radkfile" file from the xjdic
distribution, which contains the radical-element breakdown for the
JIS208 kanji. This file was originally prepared by Michael Raine
and revised and extended by me. The file is used to drive the
multi-radical kanji-selection feature.
- The KANJIDIC and KANJD212 files, which are used by the
server for all the kanji search functions.
- The KANJIDIC file contains comprehensive information about Japanese
kanji. It is a text file currently 6,355 lines long, with one line
for each kanji in the two levels of the characters specified in the
JIS X 0208-1990 set.
(There is a summary page
about the file, as well as the full documentation.)
- The KANJD212 file contains information about the 5,801 kanji in the JIS X
0212-1990 standard. It is in the same format as the KANJIDIC file.
- The EDICT file.
The EDICT file is the outcome of a voluntary project to produce a
freely-available Japanese/English Dictionary in machine-readable
form. This project has been under way since early 1991, and has
involved hundreds of people. It now has over 70,000 entries, and is
the major free-ware Japanese-English lexicon. (There is a summary page
about the file, as well as the full documentation.)
- The COMPDIC file.
The COMPDIC file is a glossary of terms used in the computing and
(tele)communications industries. It is in the "EDICT" format, and
is intended for use as a generally available file for dictionary
and WP software systems. (Full documentation.)
- The ENAMDICT file.
The ENAMDICT file contains Japanese proper names; place-names,
surnames and given names. These were originally included in the
EDICT file, along with other non-name entries. By late 1995, the
number of name entries had exceeded the others, and the file was
becoming unmanageably large, so the decision was made to split it.
(The split was done automatically, and may have been imperfectly
performed. Please notify any errors.) The format of the ENAMDICT
file is exactly the same as the EDICT file, and the EDICT documentation
should be consulted for more information. (Full documentation.)
- The LIFSCIDIC file.
The LIFSCIDIC is currently the "EDICLSD4" Japanese-English Life
Science dictionary in the EDICT format. This dictionary contains
34,274 Japanese bio-medical words frequently used in Life Science
publications. EDICLSD4 was compiled by the
Life Science Dictionary
Project, led by Professor Shuji Kaneko at Kyoto University. (See
overall explanation, plus some information
about the 1997 edition.)
- The JDDICT file.
The JDDICT file is a combination of:
- a transcription in EDICT format of the
"Langenscheidts Lehrbuch und Lexikon der jap. Schrift" by
Wolfgang Hadamitzky. The transcription was done in 1992 with the
author's permission. (Full documentation.)
- an extract from the WaDokuJT
Japanese-German dictionary file compiled by Ulrich Apel. (59,000 entries)
- The FINMKTDIC file.
This file is a concatenation of Kevin Seaver's glossary of
financial terms (FINDIC), and Adam Rice's business & marketing
glossary (MKTDIC). (Documentation files: here and
- The MISCDIC file.
This file is a concatenation of several small glossary files. These
have been joined using "ejoin", and the entries have been given
two-letter tags to show their source.
- GEODIC (GE) - geological terminology file compiled by Bruce
Bain and Leslie Oberman. (documentation)
- PANDPDIC (PP) - Jim Minor's Pulp & Paper Industry Glossary
- AVIATION (AV) - Ron Schei's Aviation Dictionary File (documentation)
- CONCRETE (CO) - Gururaj Rao's Concrete Terminology Glossary (documentation)
- LAWDIC (LW) - the EDICT-format version of the Japanese Legal
Glossary compiled by the Asian Law Program, School of Law,
University of Washington. It was transcribed to file by a team of
volunteers in 1995. (documentation.)
- STARDICT (ST) - a list of star and constellation names prepared by
Raphael Garrouty in 2001.
- LINGDIC (LG) - a list of linguistics terms compiled by Francis Bond
in 1998 (documentation).
- FORSDIC_E (FO) - a list of forestry terms compiled by Juan Cardona (documentation).
- The J_PLACES file.
This file has been extracted from the Postal Code database
available on the WWW. It has partially been converted into
EDICT-format. Note that the kana style is non-standard, e.g. in the
representation of youon.
- The ENGSCIDIC file - a 14,000 entry file of words mostly
relating to engineering and science, which became available in October
- The J-RUSSIAN file - a small but growing Japanese-Russian
dictionary file being compiled by Oleg Volkov. WWWJDIC is able to use and
display this material because the Cyrillic script is part of the JIS
character set. See Oleg's documentation (in Russian).
- The J-FRENCH file - a 17,000 entry Japanese-French dictionary file
from the Dictionnaire français-japonais
project being undertaken by Jean-Marc Desperrier. As Jean-Marc says on
that page, his project's aim "est de traduire en français une partie du
dictionnaire japonais-anglais Edict de Jim Breen". His project is
continuing and is being supported by a number of French speakers.
- The BUDDHDIC file - an extract of about 17,000 entries from the
Digital Dictionary of Buddhism
(DDB). See the brief
documentation. When using this file to look up words, you have the option
of linking to the related entry in the full DDB. Note that you have to enter
the login name "guest" (no password), and you are limited to 10 DDB accesses
per 24-hour period.
- The THE_LOT file - a combination of most of the above files. (See earlier
section on Translating Text.) Useful for text
glossing and wide-ranging searches. NB: if you are using this
file for ordinary word searches, you will probably encounter a number of
apparent duplications, as material in the priority subset of EDICT is
also in the main EDICT file, and many of the ENAMDICT entries are also
in the J_PLACES file. The THE_LOT file is mainly for text glossing.
- A small file of words and phrases written in hiragana. These
are drawn from the EDICT file, and are used only when translating words in text.
Some of the dictionary files contain characters used in languages such as
French, German, Sanskrit, etc., which are not available in the common JIS X
0208 character set. These characters are coded in the extension set - JIS X
0212 - however most browsers cannot display these characters correctly
in the default EUC-JP coding, and they are not available at all in
Shift-JIS coding. For
this reason the characters are sent from the server either as HTML entities,
e.g. &eacute; for é, or as a bit-mapped PNG image. Depending on the
have chosen for your browser, these characters may appear a little strange.
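The entity substitution can be sketched as follows. This is an illustration only: it uses Python's shift_jis codec as a stand-in test for "is this character in the basic Japanese set", falls back to a numeric reference when no named entity exists, and does not attempt the PNG image alternative. The function name encode_for_page is made up.

```python
from html.entities import codepoint2name


def encode_for_page(text: str) -> str:
    """Replace characters outside the basic set with HTML entities."""
    out = []
    for ch in text:
        try:
            # Stand-in check: can the character be coded in Shift-JIS?
            ch.encode("shift_jis")
            out.append(ch)
        except UnicodeEncodeError:
            name = codepoint2name.get(ord(ch))
            # Named entity if one exists, otherwise a numeric reference.
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(out)
```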
Please note that the dictionary material is for the most part copyright.
Unauthorized publication of material from WWWJDIC is prohibited. See the
Copyright section below for more information on
A number of entries in the EDICT dictionary file are linked to
Japanese/English example sentences that can be displayed by clicking on
the "Ex" tag after the entry. The examples are drawn from a collection of
Japanese/English sentences compiled by Professor Yasuhito Tanaka at Hyogo
University and his students. The collection is in the Public Domain, and was
supplied by Professor Christian Boitet. It was described by Professor Tanaka
in a paper. It is under consideration for
use as a source of multi-lingual examples in the
The collection is large (approximately 180,000 pairs) and is being edited
as there are a number of errors and duplications in both the Japanese and
English texts. A number of the sentences have been tagged with [M] or [F]
to show they have gender-specific language or words.
The process of making the examples available to users of WWWJDIC was as follows:
- the more obvious duplications were removed automatically, reducing the
original file from 210,000 pairs to 180,000 pairs.
- each Japanese sentence was parsed using the Chasen morphological analyzer
to extract the lexical components.
- the components (Japanese words) which contained at least one kanji were
retained, along with all gairaigo (in katakana), and an index file was built
to relate each word to the sentences
which contain it. This resulted in approximately 22,000 words being
identified, most of which are in the main EDICT file.
- a function was added to WWWJDIC to check each word found to see if it is
also in the example index file. If it is, a link to the collection of
sentences is added. Only the first 100 examples can be seen for any one word.
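The steps above can be sketched as a small inverted index. This is a simplified illustration, not the actual build process: the word lists are assumed to come from an external morphological analyzer (they are passed in ready-made here), only exact duplicate pairs are removed, and all names are invented for the example.

```python
import re
from collections import defaultdict

KANJI = re.compile(r"[\u4e00-\u9fff]")
KATAKANA_WORD = re.compile(r"[\u30a1-\u30fa\u30fc]+")


def is_index_word(token: str) -> bool:
    """Keep words containing at least one kanji, and all-katakana words."""
    return bool(KANJI.search(token)) or bool(KATAKANA_WORD.fullmatch(token))


def build_index(pairs):
    """pairs: iterable of (japanese, english, tokens from an analyzer)."""
    sentences = []
    index = defaultdict(list)
    seen = set()
    for japanese, english, tokens in pairs:
        if (japanese, english) in seen:  # drop exact duplicates
            continue
        seen.add((japanese, english))
        sentence_id = len(sentences)
        sentences.append((japanese, english))
        for word in dict.fromkeys(tokens):  # each word once per sentence
            if is_index_word(word):
                index[word].append(sentence_id)
    return sentences, index


def examples_for(word, sentences, index, limit=100):
    """Return at most `limit` example pairs for a word."""
    return [sentences[i] for i in index.get(word, [])[:limit]]
```

A word display routine then only needs to test whether the word is a key in the index to decide whether to show the link to its examples.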
As mentioned above, the collection is in need of considerable editing. It
also needs pruning, as some words are over-represented and many others have no
examples at all. Still it is a beginning, and a method for testing the
I will be progressively editing the file, particularly to remove obvious
duplications, and also adding additional example
sentences. Any suggested corrections or
sentences to add to the collection are welcome, and should be emailed to
If you are sending suggested sentences, please make sure both the
Japanese and English are correct. Also, if the sentences are drawn from
an existing publication, please provide the details of that publication.
If you would like to collect a complete copy of the current file of
example sentences, including the index words, it is available
Most of the verbs in the main EDICT file allow an optional display of a
table of verb conjugations. Where this is available, a [V] tag
appears to the right of the verb display.
The table of conjugations is generated automatically according to the
part-of-speech tag in the entry. It should not be assumed that for every
verb, any single conjugation is as frequently used or as natural as any other.
Associated with the table of conjugations is a page of
comments which attempts to expand some of the more obscure points.
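As an illustration of generating conjugations from the part-of-speech tag, the sketch below handles only the regular Godan tags (v5u, v5k, and so on) and only three forms. It is not the server's conjugation table: the special classes are deliberately left out, and the names GODAN and conjugate are invented for the example.

```python
# Per tag: (dictionary ending, i-row kana, a-row kana, te-form ending).
GODAN = {
    "v5u": ("う", "い", "わ", "って"),
    "v5k": ("く", "き", "か", "いて"),
    "v5g": ("ぐ", "ぎ", "が", "いで"),
    "v5s": ("す", "し", "さ", "して"),
    "v5t": ("つ", "ち", "た", "って"),
    "v5n": ("ぬ", "に", "な", "んで"),
    "v5b": ("ぶ", "び", "ば", "んで"),
    "v5m": ("む", "み", "ま", "んで"),
    "v5r": ("る", "り", "ら", "って"),
}


def conjugate(dictionary_form: str, pos_tag: str) -> dict[str, str]:
    """Build a few conjugated forms of a regular Godan verb from its tag."""
    if pos_tag not in GODAN:
        raise ValueError(f"tag {pos_tag!r} is not handled in this sketch")
    ending, i_row, a_row, te = GODAN[pos_tag]
    if not dictionary_form.endswith(ending):
        raise ValueError(f"{dictionary_form!r} does not end in {ending!r}")
    stem = dictionary_form[:-1]
    return {
        "polite": stem + i_row + "ます",
        "negative": stem + a_row + "ない",
        "te-form": stem + te,
    }
```

Because the forms are derived mechanically from the tag, the output is only as good as the tag, and says nothing about how natural a given form is for a particular verb.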
An interesting feature of WWWJDIC is the system of links to other
servers and files. These are:
- to other
WWW kanji/hanzi/hanja character dictionaries. These links go from
the kanji information page, and enable direct access to the
information about that kanji held on other databases. The databases
currently linked are:
- Charles Muller's World
Wide Web CJK-English Dictionary Database.
This database contains a wealth of information, with a particularly
classical emphasis. A feature is an index into his dictionary of
- Rick Harbaugh's Zhongwen
Zipu (Etymological Chinese-English Dictionary).
This is a fascinating dictionary (available as a CD-ROM too), with
a wealth of etymological information about the characters,
including a genealogical chart. It has a specifically Chinese
- Christian Wittern's
KanjiBase WWW character dictionary. This is under development,
and carries a wealth of information from Christian's extensive
- Timothy Huang's Big5 Database. This is a file of codes and
related information in the Big5 set of hanzi compiled by Professor
Timothy Huang, co-author of the book "An Introduction to Chinese,
Japanese & Korean Computing". For further information, contact
The "unifying" code we use to implement these links is the Unicode
(UCS2) code-point. We intend to have all the systems cross-linked.
You can index from Chuck's and Rick's systems back to WWWJDIC.
- the jeKai Project. This project is developing a WWW-based dictionary of extended
information about words & phrases in Japanese. WWWJDIC examines the
jeKai index and when it displays a Japanese word which is in the
jeKai files, it creates a link. Try it out for a word like "noren"
to see how it works.
- the online Sanseido dictionary at Goo. The link
goes from the normal word display, and triggers the JE server at that
site. You can use the other dictionaries at that site, including the big
- the Google search engine, which is called with the displayed
Japanese word(s) as a search key.
- the built-in display of animated stroke order diagrams for about
1000 common kanji.
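For the character-dictionary links described earlier in this section, the unifying key is the Unicode (UCS2) code-point of the kanji. A minimal sketch of deriving that key is shown below; the function name is made up, and how each linked site expects the value to appear in its URLs is not covered here.

```python
def ucs2_code_point(kanji: str) -> str:
    """Return the four-digit hexadecimal UCS-2 code point of one character."""
    if len(kanji) != 1:
        raise ValueError("expected exactly one character")
    code_point = ord(kanji)
    if code_point > 0xFFFF:
        raise ValueError("character lies outside the UCS-2 range")
    return f"{code_point:04X}"
```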
The dictionary entries contain a number of abbreviations and codes,
mainly to reduce storage and display space.
||adjectival nouns or quasi-adjectives (keiyodoshi)
||nouns which may take the genitive case particle "no"
||pre-noun adjectival (rentaishi)
||special adjective (e.g. ookii)
||Expressions (phrases, clauses, etc.)
||female term or language
||gikun (meaning) reading
||honorific or respectful (sonkeigo) language
||humble (kenjougo) language
||word containing irregular kanji usage
||word containing irregular kana usage
||irregular okurigana usage
||martial arts term
||male term or language
||noun (common) (futsuumeishi)
||adverbial noun (fukushitekimeishi)
||noun (temporal) (jisoumeishi)
||negative (in a negative sentence, or with negative verb)
||negative verb (when used with)
||word containing out-dated kanji
||out-dated or obsolete kana usage
||polite (teineigo) language
||quod vide (see another entry)
||word usually written using kanji alone
||word usually written using kana alone
||Godan verb (not completely classified)
|| v5u, v5k, etc.
||Godan verb with `u', `ku', etc. endings
||Godan verb - Iku/Yuku special class
||Godan verb - -zuru special class (alternative form of -jiru verbs)
||Godan verb - -aru special class
||Godan verb - Uru old class verb (old form of Eru)
||noun or participle which takes the aux. verb suru
||suru verb - special class
||Kuru verb - special class
||vulgar expression or word
||"Priority" entry, i.e. among the 20,000 more common words in Japanese
||rude or X-rated term (not displayed in educational software)
The following abbreviations are used in the Names dictionary file.
||person name, as-yet unclassified
||given name, as-yet not classified by sex
||female given name
||male given name
||a full (family plus given) name of a historical person
The THE_LOT file, used for translating words in Japanese text, has the
following codes attached to each entry to show the dictionary file from
which it has been selected.
SP||special words & phrases
EP||edict (priority subset)
ED||edict (the rest)
||enamdict (higher frequency names)
||enamdict (lower frequency names)
PL||j_places (entries not already in enamdict)
||small hiragana dictionary
The material being displayed in WWWJDIC's pages is copyright. It is
drawn from dictionary files, the copyright of most of which is held by the
Electronic Dictionary Research and Development Group (EDRDG) at Monash University.
For more details, see the licence
statement covering the dictionary files.
What does this mean in practical terms? Well:
- you can use WWWJDIC in the same way as you use a published
dictionary to assist you with translating text and words. The results of
your translation may be published, sold, etc. If you make heavy use of
WWWJDIC it would be nice to acknowledge that, but there is no
requirement to do more;
- you can link to WWWJDIC, e.g. using the backdoor entry, from other
servers, provided those servers are operating free-of-charge. Servers
operating on a fee basis must not use WWWJDIC without authorization.
- if you wish to publish significant extracts of the output from
WWWJDIC, for example if you use the Translate Words in Text function
to generate a vocabulary list for a textbook of reading passages, then
this comes under the scope of the licence for the dictionary files,
which prohibits unauthorized publication of subsets of the files. You
must first obtain approval from the copyright holder, and if it is a
large and commercial publication, a fee may be required. Small-scale
use of WWWJDIC for these purposes, especially by educational institutions,
will usually be approved free-of-charge provided the usage is acknowledged
in the publication.
- the Stroke Order Diagrams are under Jack Halpern's copyright. You
may link to the pages displaying those images, but you must not download
and store the images without Jack's permission.
- [Q] I can't use WWWJDIC from a J-Phone. I put in a search word, but
get no reply, instead it goes to the main menu.
[A] Yes, I hope to fix that eventually. J-Phones use MML not HTML, and
for some reason forms are sending in information that can't be decoded.
- [Q] Are you planning to have a WAP interface for WWWJDIC?
[A] Perhaps one day, but in the meantime a WAP frontend site which
accesses WWWJDIC via backdoor calls is close to being released (March
- [Q] I like the Stroke Order Diagrams. Why do some kanji not have them?
[A] The raw diagrams were provided by Jack Halpern, and were prepared
for the Kodansha Kanji Learners Dictionary. The coverage of that book is
a bit over 2,000 kanji, so that's all the diagrams available.
- [Q] Your server is very slow. Why don't you rewrite it in ... or move it
to the .... server technology?
[A] Actually the servers are not slow at all. They are all fast
systems, and the code is quite light-weight. Most requests are served in
a fraction of a second. To some users it may seem slow because of
network delays and congestion. If this is your case, try using a mirror
site closer to you.
- [Q] I have hunted for the source of WWWJDIC and can't find it. Where is it?
[A] Locked up on the servers. I haven't released it, and at this stage
have no intention of doing so. It is continually being modified, and I
want to keep it under my control (after all, it is my ego trip.) I
don't want any clones of WWWJDIC running around at this stage.
- [Q] I want to have WWWJDIC's functions on my PC without having to use
an Internet connection. Is there a stand-alone version I can download?
[A] Not at present. There is no reason why the functionality can't be in a
stand-alone program, and some programs such as JQuickTrans do a similar
job. One day I may do a port to a stand-alone, but I am still seeking a
suitable cross-platform environment, i.e. all of Unix/Linux, Macintosh and
I have some information about stand-alone software on the EDICT home page.
- [Q] In the text word translation you don't do all the words written
just in hiragana - why is that?
[A] There are several reasons: (a) the beginnings of such words can be very
difficult to detect when they are preceded by other kana as is often the
case (particles, etc.). You need sophisticated segmentation software to
do this. (b) many Japanese words share the same reading/pronunciation,
and hence I would probably pick the wrong word.
At present I only handle words which are at least 4 kana long and which
are found in a small list of kana-only words.
- [Q] What are all those "vs" and "an" tags on the dictionary
displays? And what are the "ED" and "LS" when I translate words in
[A] Fair question. I have now added a section to this file explaining
- [Q] It's hard to read the translated web pages. I can't read it
so it makes sense.
[A] Can you display Japanese text on your browser?
Can you read hiragana and katakana and recognize a few kanji?
Can you understand the basics of Japanese grammar and syntax?
If the answer to any of the above is "no", then I agree the
translations of the words from the WWW page will not be much use.
What I do is assist a person to read the text by providing
translations of some of the words. It is NOT an attempt to
translate the page into English.
- [Q] Why do you just translate the words in the text? Why don't you
go the rest of the way and translate properly into English?
[A] Machine Translation (MT) is a huge and complex task. The WWWJDIC
server is comparatively simple. If I ever developed a Japanese-English
MT system (most unlikely), I'd sell it, not offer it free on a WWW site.
- [Q] I don't have a Japanese-capable browser. Will you support
graphics display of kana and kanji?
[A] Not via my server; however, you can use WWWJDIC via Silas
Brown's "ACCESS-J" server, which has a link on the WWWJDIC front
page. With things like NJWIN, any PC browser can display Japanese.
There are also fonts you can download for Netscape and IE.
- [Q] I can't read the kana readings. Will you add romaji display
as an option?
[A] No. Better to learn kana. It will only take a week or
- [Q] I see you use EUC-JP. Will you add Shift-JIS as an
[A] I stalled on this for years, and finally added it in March 2000,
because the damned DoCoMo mobile phones need it.
If you aren't using the special DoCoMo interface, you have to set it
via the customization page.
- [Q] I have a Japanese IM with my browser. Can I key in Japanese
keys directly to the dictionary search?
[A] Yes. (Actually you can call the key "English" if you like;
- [Q] I get blanks instead of kanji/kana, but I have obtained
[A] If you have changed to Netscape 4.0x, you'll need to get the
latest NJWIN. Otherwise, I have no suggestions.
- [Q] I don't get any of the JIS X 0212 kanji when I specify a
[A] You need to click on the button to enable these (normally they
are suppressed, as few users need them.)
- [Q] How do I specify a JIS X 0212 kanji when selecting a JIS
[A] Put an "h" in front of it, e.g. "h4064". ("h" is for
- [Q] How do I specify that I want my default dictionary to be
"the_lot"? The customization doesn't allow that.
[A] I really should add that to the customization. In the meantime
you can either (a) bookmark the dictionary search screen, then
without your browser running edit the URL in the bookmark file to
say "wwwjdic?9C", or (b) go to the initial dictionary search
screen, change the "wwwjdic?1C" to "wwwjdic?9C", press enter to go
to the "new" URL, then bookmark it.
No sooner had the WWW come into being than servers accessing my
dictionary files began to appear.
The first, which operated briefly in 1994, was a slight rework of my
xjdic program by Otfried Schwarzkopf. It overtaxed his
386 and was closed down fairly quickly; however, by that stage
Jeffrey Friedl's famous Dictionary engine was running. There are
also Rafael Santos' system, the EVA/POETS engine at Notre Dame in
Tokyo, PSP's ALISE-based system, etc. etc., as well as Lambert
Schomaker's WWW edition of the KANJIDIC file.
I had intended to have a WWW version of xjdic right from the moment I
knew about the WWW, and in 1994 collected some information on writing CGI
programs ready for the assault. It always seemed too big a task, and
anyway Jeffrey's server was doing a good job. Eventually in mid-1997 it
got too much for me, as I wanted to experiment with some features not
handled by Jeffrey's server, and I wanted to see my name in
the WWW lights too, so I filleted out the search-engine parts of xjdic
and dashed off a new CGI-oriented front-end. It only took a week or two
of spare time and was up and running. I could easily have done it years
WWWJDIC has proved popular, although it has probably not overtaken the
early lead Jeffrey's server established. It has been relatively easy to
modify, so I have tinkered with it quite a bit (see below.)
Starting late in 1998, I have installed a number of mirrors. The
first two were quite a bit of work as I had effectively written a lot of
hard-coded stuff pointing at the Monash site. The code is now fairly
portable (for a Unix/Linux box running Apache.) Having a lot of mirrors
brought in the problem of keeping them up to date. To handle this, in 2000 I set
up an "rsync server" at Monash and have set "cron" scripts running at
the mirror sites which periodically interrogate the Monash site and
collect and install any updated files.
- added the [V] verb conjugation function. November 2002
- replaced the ENAMDICT version used in text glossing. The version now
used has kanji names with multiple readings in a single entry. October
- added the cookie option to the customization system. September 2002
- added UTF8 as a coding option. September 2002
- added the "exact match" option. August 2002
- added links from EDICT entries to the example sentences in the
Tanaka corpus. August 2002
- replaced the small German dictionary with a bigger one incorporating
part of the WaDokuJT file. August 2002
- added the Russian and Buddhism dictionary files. Extended the JIS212
handling to include the non-kanji and to use HTML entities when it can. Added
the links from the Buddhism dictionary to the main DDB. July 2002
- added the "@" trick from xjdic for flagging romaji input
strings. June 2002
- added ISO-2022-JP support for backdoor strings. Note that the
code-setting for these is the same as for EUC. June 2002
- replaced LIFSCIDIC with V4 of that file. May 2002
- added UTF-8 coding as an option to backdoor strings to enable Mozilla,
- extended the Stroke Order Diagram handling to cover all 2,230 jouyou
and jinmeiyou kanji. Apr 2002
- replaced the buttons on the front page with coloured table entries,
using CSS. Why? (a) it loads more quickly than the previous images, (b)
easier to update, especially on mirror sites. Mar 2002
- expanded the text glossing: (a) now handles many compound verbs, (b)
handles words with o/go prefixes in kana. Dec 2001
- added a facility to match against hiragana-only words in text.
- added a stripped-down text translation facility suitable for
I-mode devices. Nov 2001
- added links to the Unicode.org database for each kanji being
displayed. The Uxxxx is now a link. Nov 2001
- added the links to animated stroke order diagrams for the Grades
1 to 6 kanji. Aug 2001
- tightened the parsing rules for long runs of kanji to reduce
mistakes; allowed trailing particles when the match is correct; included
the j_places file in the "the_lot" file. Aug 2001
- option to restrict search to the more common entries. Aug
- option to look up a displayed word in the Sanseido dictionary at
the Goo server. May 2001
- support for the O'Neill's Essential Kanji indices, which are now in
the kanji database. May 2001
- included the option to do a Google search for each displayed
headword (thanks to Shaun Lawson for showing me how easy it is).
- support for the Kanji Learners Dictionary codes, which are now in
the kanji database. April 2001
- the "stardict" file was added to miscdic and the_lot Jan
- A new version of the "radkfile", which drives the "Multi-radical"
kanji lookup. At the same time some JIS212 images were introduced on
that display which better match the elements used. Jan 2001
- Redid the front page, using images for the "buttons" and adding the
DoCoMo options and the new button generator as full items. Jan
- Installed a Japanese mirror site (finally) at the ILCAA in the Tokyo
University of Foreign Studies.
- A section in the documentation explaining the codes and tags.
- Extended the "backdoor" method to handle (a) Japanese codes not in
EUC (b) text glossing. Added the ability to handle Unicode (UCS2)-coded search
keys in some circumstances. All these were done to support the various
- A system in which the mirror sites are
automatically updated with the latest files. Now working for all mirror sites.
- Added the option to suppress all but the first of duplicated
translations in the word-in-text translation function. Tightened up the
removal of trailing particles for jukugo, and extended this function to
- Converted all the JIS212 images to PNG format to avoid violating the
Unisys patent over GIF formats. June 2000
- Added the linking to the jeKai Project. June 2000
- Split the ENAMDICT entries in the THE_LOT file into two priority
sets to help the choice of the more appropriate version when there are
multiple readings of a name. (Now superseded.) June 2000
- Revised the TITLE headings on pages to make them different. This is
to help book-marking the main entry pages. May 2000
- Added special stripped-down starting pages tailored to the
microbrowsers used in the NTT DoCoMo mobile phones. These pages
turn on Shift-JIS operation, and invoke an internal "docomo" mode which
limits the amount of detail in the resulting display. (Apr
- Added the option of outputting in Shift-JIS as well as the default
EUC. (Did I hear you ask why? Well the NTT DoCoMo phones won't hack
EUC pages, and some people want to use WWWJDIC on them.)
- Added the option to break on end-of-line characters when glossing
- Changed the front page to a slightly more modern-looking set of
buttons. Added Silas Brown's "access" bit-map server as an option.
- tidied up the Text Translation feature, eliminating line
breaks, tabs, etc. from the text, and putting in a go-back-to-the
start. Extended the "the_lot" file by marking out the 15,000 most
important entries. (Sep 1999)
- reformatting the displays to make the follow-on actions a bit
more logical. Adding support for the De Roo codes. Restructuring
the site-specific aspects to facilitate setting up mirrors. (Feb
- enabling multiple kanji to be examined at once via pasting them
into the request line. (October 1998)
- enabling the kanji-selection to be limited to Jouyou &
Jinmeiyou kanji. (October 1998)
- enabling the retention of non-Japanese text in the WWW-page
word translation feature. (September 1998)
- detection of Shift-JIS in cut-and-paste text, and its
conversion to EUC. (Was not reliable for short text, so was changed
to a user option.) (August 1998)
- the creation of the THE_LOT combination dictionary file, and
its setting as the default for text and WWW page glossing, and the
incorporation of the LAWDIC file into the MISCDIC file. Fine-tuning
the glossing function to favour some subfiles. (August
- the extension of the jukugo translation function to operate on
specified WWW pages. (July 1998)
- addition of the function to translate jukugo, etc. from a slab
of Japanese text. (July 1998)
- addition of the ability to repeat a search in different
dictionaries. (June 1998)
- expansion of the kanji database to include itaiji
cross-reference information and SKIP codes in the JIS X 0212 kanji.
- expansion of the display of the XJnnnnn itaiji cross-reference
information in KANJIDIC/KANJD212 to include a link to the variant,
and the display of each variant. (May 1998)
- inclusion of the J_PLACES file. (Apr 1998)
- support for the index numbers from the New Nelson dictionary.
These are now an option on the Kanji Selection screen. (Feb
- the three initial entry screens (from the front-page) can now
be saved as book-marks. (Dec 1997)
- the inclusion of the classic Four Corner index on the Kanji
page, and at the same time I added links to pages describing the
Four Corner & SKIP codes. (4 Dec 1997)
- the addition of Timothy Huang's Big5 Database information to
the kanji-level links. (17 November 1997)
- the unification of the KANJD212 file into the kanji database
now used by the server. The KANJD212 file is no longer treated as
just another dictionary file. Display of the JIS X 0212 kanji is
done by in-line GIF images, as very few browsers support this
standard. (22 Oct 1997)
- links to Christian Wittern's KanjiBase character database at
the University of Goettingen in Germany. (19 Sep
- a direct URL access (no POST) to enable cross-linking from
other WWW dictionaries, etc. Email me if you want details. (12
- the inclusion of my KANJD212 file as one of the dictionaries.
(12 Sep 1997)
- a system of links to other WWW dictionaries. The first to go
live are Chuck Muller's WWW CJK dictionary and Rick Harbaugh's
Chinese Character Genealogy Dictionary. You can link to them from
the kanji display page, and see their information about the
selected kanji. (12 Sep 1997)
- the support for a second "word" in English keyword searches.
This word is used as a filter, and is case-sensitive, however it
can occur within a longer word. Try looking for "home stay" or
"treasure house" to see how it works. (11 Sep 1997) (BTW,
this works in Japanese searches too!)
- user customization of screen parameters, colours, and input
coding. (8 Sep 1997)
(I hope/intend to fix these eventually.)
- Allowing for "relaxed" romaji spelling, blurring the various
ambiguities such as writing "ji" and "zu", vowel lengths, etc.
- Improving the front-end by a judicious use of frames and,
- WWWJDIC can sometimes be made to crash by sending very long strings
into the text-glossing function via the backdoor (URL) method. It is due
to something being overwritten, and is platform dependent. I suspect an
undersized environment variable is the problem. Try a smaller amount of
text if this happens.
- WWWJDIC occasionally crashes, producing a "core" dump. This
occurs about once every month, i.e. in a minute proportion of
accesses. The user will probably be sent an "internal error"
message. I am curious to track down the cause of these crashes, so
if one occurs while you are using it, please email me on:
firstname.lastname@example.org with the details.
- If you choose a compound from the display to look at the kanji
within it, and at the same time change dictionaries, it tries to
get the compound from the new dictionary, with unpredictable
results. (I might not fix this; more a feature than a bug.)
- If you combine a two-word search with the common-word restriction,
it stops working after the first page of results is displayed.
WWWJDIC is a single C program which takes its parameters from the
URL (QUERY_STRING) and from the various buttons (POST method). It
carries as much as it can of the user's state by loading the values
of the various radio/checkboxes. View the source of some of the
screens if you want to see how the CGI stuff is working.
No database system is used. Each dictionary file is a single text file
with a dictionary entry per line. Associated with each text file is an
index file containing pointers to each element in an entry (see the
xjdic documentation and source for more details on this.) The
dictionary lookup is extremely fast and efficient.
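As a rough illustration of the index idea described above, the following Python sketch builds a sorted list of byte offsets, one per word, over a few dictionary lines and answers a query with a binary search. The sample lines, the 32-byte sort window and all function names are assumptions made for the example; the real file and index formats are the ones described in the xjdic documentation, and the server itself is written in C.

```python
import re

KEY_WINDOW = 32  # assumed: how many bytes at each offset take part in sorting

# Hypothetical, ASCII-only sample lines (real entries hold Japanese text).
SAMPLE = (
    b"usagi /rabbit/hare/\n"
    b"koujou /factory/plant/\n"
    b"ie /house/home/\n"
)


def build_index(data: bytes) -> list:
    """Return the offset of every word start, sorted by the text found there."""
    offsets = [m.start() for m in re.finditer(rb"[A-Za-z0-9]+", data)]
    offsets.sort(key=lambda off: data[off:off + KEY_WINDOW].lower())
    return offsets


def entry_at(data: bytes, off: int) -> str:
    """Return the whole line that contains byte offset `off`."""
    start = data.rfind(b"\n", 0, off) + 1
    end = data.find(b"\n", off)
    if end == -1:
        end = len(data)
    return data[start:end].decode("ascii")


def lookup(data: bytes, index: list, key: str) -> list:
    """Return the entries containing a word that starts with `key`."""
    k = key.lower().encode("ascii")[:KEY_WINDOW]
    lo, hi = 0, len(index)
    while lo < hi:  # binary search for the first offset whose text is >= key
        mid = (lo + hi) // 2
        if data[index[mid]:index[mid] + len(k)].lower() < k:
            lo = mid + 1
        else:
            hi = mid
    found = []
    # Walk forward over every offset whose text starts with the key.
    while lo < len(index) and data[index[lo]:index[lo] + len(k)].lower() == k:
        line = entry_at(data, index[lo])
        if line not in found:
            found.append(line)
        lo += 1
    return found


INDEX = build_index(SAMPLE)
print(lookup(SAMPLE, INDEX, "rabbit"))
```

Because the index holds only offsets, the dictionary text itself is never copied or parsed up front; each probe of the binary search reads a few bytes at one position in the file.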
The program runs under the Apache server and on a number of different
Unix-like operating systems, including Solaris, AIX, FreeBSD and several
Linux distributions. No attempt has been made to run it under Windows.
I originally planned to have a permanent dictionary search engine, with
CGI programs calling it, as happens with Jeffrey's dictionary server. In
the end I did not go ahead with this, as memory-mapped handling of the
read-only dictionary files, and the significant caching carried out by
the file system, achieve the same efficiency goal anyway.
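To show what memory-mapped, read-only access to a dictionary file looks like in practice, here is a small Python sketch. It is not the server's C code; the file contents, the function names and the plain substring search are assumptions chosen only to keep the example short.

```python
import mmap
import os
import tempfile


def first_matching_line(path: str, needle: bytes):
    """Return the first line of the file containing `needle`, or None."""
    with open(path, "rb") as f:
        # Map the file read-only; pages are loaded on demand and cached by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            if pos == -1:
                return None
            start = mm.rfind(b"\n", 0, pos) + 1
            end = mm.find(b"\n", pos)
            if end == -1:
                end = len(mm)
            return mm[start:end]


def demo(needle: bytes = b"house"):
    """Write a tiny hypothetical dictionary file and search it via mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"usagi /rabbit/hare/\nie /house/home/\n")
        return first_matching_line(path, needle)
    finally:
        os.remove(path)


print(demo())
```

Each short-lived process maps the file afresh, but pages already held in the operating system's cache are reused, which is why a permanently running search engine adds little.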
Mirror sites stay up-to-date by connecting to the master site at Monash
once each day, retrieving a manifest file, then retrieving any updated
source or data files. The file retrieval is done using the rsync
system, which is excellent for retrieving small portions of large files.
(There is an anonymous rsync server running at Monash for this
According to the settings in the manifest file, modified source files are
compiled, index files are generated, etc. as part of this daily update.
I get a number of enquiries from people offering to host mirrors. I am
not actively seeking many more mirrors; however, I like to have a
reasonable geographic spread. The basic requirements for a mirror site
- I must have an account on the system. Installation is complicated
and not well documented.
- it must be a permanent arrangement, or at least one capable of
being used for several years. I don't want to go to the trouble of setting it
up only to have it withdrawn.
- it must be a Unix-like operating system (Solaris, Linux, AIX, etc.)
- it must have an Apache server running, plus a full suite of utility
software, including gcc, wget, lynx, rsync, etc.
- it must be very well connected to the Internet. Having a poorly
connected mirror is a waste of time.
Stroke Order Diagrams
The Stroke Order Diagram animation was carried out as follows:
- the source of the diagrams is the digitized multi-panel form from
the printed kanji dictionaries, in
which the kanji is built up stroke by stroke. Jack Halpern sent these to
me as BMP files.
- I happened to have some software I had written to extract sections
of graphics files and create new files from the pieces. This software
worked on PNM (Portable Anymap) graphics files so using standard Linux/Unix
utilities such as bmptopnm and ppmtogif
in conjunction with my own software I was able to
break each static diagram up into a series of GIF files; one for each
panel of the diagram.
- for each kanji, I used the gifsicle
utility to make an animated GIF of the whole kanji.
All this took a bit of debugging, but once it was working, it only took
a few minutes to generate the diagrams for all 2,230 kanji. All
this was done on a Sun system running Solaris, so the GIF files are quite
legal under the Unisys patent.
Japanese Character Codes
WWWJDIC uses the EUC-JP coding for all its files and all internal processing.
EUC-JP is also the default coding for the HTML it generates.
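For readers unfamiliar with these codings, the short Python sketch below prints the bytes of one word in EUC-JP, Shift_JIS and UTF-8, and shows how a JIS X 0212 kanji is carried in EUC-JP. The sample word and the sample kanji (U+4E02) are choices made for this illustration only, and the codec names are those of Python's standard library, not anything specific to WWWJDIC.

```python
WORD = "日本語"   # three kanji from JIS X 0208
RARE = "\u4e02"   # a kanji from JIS X 0212 (not in JIS X 0208)

# The same word as bytes in the three codings.
for name in ("euc_jp", "shift_jis", "utf-8"):
    print(name, WORD.encode(name).hex(" "))

# EUC-JP reaches JIS X 0212 through a three-byte form starting with 0x8F.
print("euc_jp", RARE.encode("euc_jp").hex(" "))

# Shift_JIS has no room for JIS X 0212, so encoding fails there.
try:
    RARE.encode("shift_jis")
    SJIS_CAN_ENCODE_RARE = True
except UnicodeEncodeError:
    SJIS_CAN_ENCODE_RARE = False
print("shift_jis can encode it:", SJIS_CAN_ENCODE_RARE)
```

The JIS X 0208 kanji take two bytes each in EUC-JP and Shift_JIS and three in UTF-8, while the JIS X 0212 kanji needs the longer prefixed form in EUC-JP and has no Shift_JIS representation at all.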
The characters encoded in the files are from the JIS X 0208 character set,
which contains the Japanese kana and the 6,355 most common kanji along with the
Russian and Greek sets, plus the JIS X 0212 character set, which includes a
further 5,801 kanji plus some Latin characters with diacritics (acute, grave,
When pages are displayed using the EUC-JP or Shift_JIS encodings, characters
from JIS X 0212 are displayed either as HTML entities or as 16x16 bitmapped
images. If the optional UTF-8 coding is used, all characters are displayed in
If you want to interface to WWWJDIC from another page or a CGI
program, there is a "backdoor" entry which enables simple searches
to be initiated via the URL QUERY_STRING. To use this, you must use
the cgiwrapped URL, with the "backdoor" code set. The format
(or its equivalent on a mirror)
- n = dictionary to use (1 = EDICT)
- M = backdoor entry
- t is the search type: D (dictionary in EUC, ISO-2022-JP or UCS),
S (dictionary in Shift-JIS),
U (dictionary in UTF-8),
K (kanji), L (kanji - limited fields displayed), G (text glossing - EUC,
or UCS), H (text glossing - Shift-JIS) or I (text glossing - UTF-8).
- k is the key type: E or J for dictionary, U, V, N, etc. for
- xxxxxx is the search key or text itself.
1MKU4ed8 - look up the kanji with the Unicode codepoint
4MDJkoujou - look up the Japanese word "koujou" in
1MDErabbit - look up the word "rabbit" in EDICT
9MGG%xx%xx%xx%xx%xx%xx%xx - gloss the (EUC) text
Note that if you want to use this method with other sites, you will
need to modify the URL accordingly.
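The field list and examples above can be turned into a small helper. The Python sketch below assembles only the part of the URL that follows "wwwjdic?"; the field order (n, M, t, k, key) is inferred from the examples, the function and parameter names are made up for the illustration, and it is an assumption that the server accepts the upper-case percent-escapes that urllib produces. The host and path of the server are deliberately left out, since they differ between mirrors.

```python
from urllib.parse import quote


def backdoor_query(dictionary, search_type, key_type, key, encoding="euc_jp"):
    """Assemble the part of the URL that follows "wwwjdic?".

    dictionary  -- the dictionary code n, e.g. "1" for EDICT
    search_type -- t, e.g. "D", "K" or "G"
    key_type    -- k, e.g. "E", "J" or "U"
    key         -- search key or text; non-ASCII bytes are percent-encoded
    """
    # Encode the key in the chosen Japanese coding, then escape it for a URL.
    encoded_key = quote(key.encode(encoding))
    return f"{dictionary}M{search_type}{key_type}{encoded_key}"


print(backdoor_query("1", "D", "E", "rabbit"))
print(backdoor_query("1", "K", "U", "4ed8"))
print(backdoor_query("9", "G", "G", "日本語"))
```

The default encoding is EUC-JP to match the D and G search types; for the Shift-JIS or UTF-8 search types the caller would pass the matching encoding name along with the corresponding type letter.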
I want to record my thanks to a few of the key people who have helped
with the server. E&OE.
- Otfried Schwarzkopf, who showed me initially that it could be done
- Jeffrey Friedl, whose server eventually made me get started.
- Jamie Scuglia, our School's sysadmin at Monash, who has always been
most helpful and supportive.
- Chuck Musciano and Bill Kennedy, authors of the O'Reilly HTML book;
- the good people who made the mirrors available: Kendon Stubbs and
Susan Munson (UofV), William Maton (Canada), Masayuki Toyoshima (Japan),
Jacek Rutkowski (Poland) and Ola (Sweden).
- Brodie Thiesfield, who showed me how to redo the front page using
- Shoji Yamazaki, who gave me a lot of feedback and help on the verb
- the many people who emailed suggestions and messages of support.
Jim Breen's Japanese Page.