engdic: English-Korean Lexicon

As part of joint research between NTT and ATR, we made some modifications to the English-Korean dictionary engdic, including reformatting and fixing many typos. The revised dictionary and some documentation can be downloaded from this page.

A web-searchable interface to engdic and many other free dictionaries is available at the free dictionaries project (click on English-Korean to access engdic).

The final dictionary (engdic v0.4) consists of 217,620 entries. Of these, 20,841 have no Korean, leaving 196,779 Korean-English pairs. There are 92,982 unique English lemmas, of which 19,441 are multiword expressions (MWEs). There are 130,228 unique Korean translations/explanations, of which 64,087 are MWEs. Not all the Korean entries are translation equivalents; many explanations remain in the dictionary. Of the entries, 5,823 have Hanja, 20,763 have pre-position FLEs, and 20,673 have post-position FLEs. A further 27,587 entries have some meta information, including 2,624 with information about the semantics, 1,910 with information about the domain, and 915 with information about dialect use.
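The figures above can be cross-checked with a little arithmetic. The following is only a sketch: the numbers are copied from the paragraph above, while the variable names and the `share` helper are made up for illustration and are not part of engdic or its documentation.

```python
# Counts copied from the statistics paragraph for engdic v0.4.
# All names below are hypothetical, chosen for this illustration only.
total_entries = 217_620
entries_without_korean = 20_841
english_lemmas = 92_982
english_mwes = 19_441
korean_items = 130_228   # unique Korean translations/explanations
korean_mwes = 64_087

# Entries that actually pair an English lemma with Korean text.
pairs = total_entries - entries_without_korean


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("Korean-English pairs:", pairs)
print("English MWE share (%):", share(english_mwes, english_lemmas))
print("Korean MWE share (%):", share(korean_mwes, korean_items))
```

The subtraction reproduces the 196,779 pairs quoted above, and the two ratios show that multiword expressions make up roughly a fifth of the English lemmas but about half of the Korean side.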

Note, however, that this is still a work in progress, although we have not made much progress recently.

Dave Oftedal has written a C program to convert engdic to the format used by edict (a Japanese-English dictionary). You can get the code and a spiffy screenshot of the dictionary being used with gjiten at http://home.no.net/david/engdic/.


Francis Bond <bond@cslab.kecl.ntt.co.jp>
Machine Translation Research Group
NTT Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, JAPAN, 619-0237
Tel: 0774-93-5313 (+81); Fax: 0774-93-5345 (+81)

Last modified: Thu Jun 26 11:25:53 JST 2003