Update: I’ve changed the addon’s word list to the “UCSC/LTRL Sinhala Corpus Beta”, which gives much better accuracy. The updated version is on the addons site. I’ll combine both word lists in the next version.
Sinhala has been used on computers for a long time. In the beginning, this was done with simple ASCII fonts that replaced the English glyphs with Sinhala letters. Sinhala Unicode came into play around 2004. Around that time, we built a search engine that converted ASCII-font text to Unicode, so that it could search Sinhala text written in any font. Actually, that’s how Paradox Software started. There were a few obstacles with rendering Unicode fonts, and people weren’t exactly using them. However, those problems have been solved in newer releases, and Sinhala Unicode is used extensively today.
You might have seen that there is an English-to-Sinhala dictionary developed by the UCSC Language Lab. It’s released under the GPL as a Firefox addon. Today, I extracted the words from the addon’s database and built a spell checker for Firefox.
The following Python code extracts the words from the SQLite database:
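#!/usr/bin/env python3
import re
import sqlite3

conn = sqlite3.connect('en-si.db')
c = conn.cursor()
c.execute("select * from dict")

# Each row is a tuple of columns, so split every text column on spaces and "|".
with open("words", "w", encoding="utf8") as out:
    for row in c:
        for field in row:
            if not isinstance(field, str):
                continue  # skip ids and other non-text columns
            for word in re.split(r"[ |]", field):
                if word:
                    out.write(word + "\n")

conn.close()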
After that, simply running “sort words | uniq > words.sorted” produced a sorted, unique list of words (uniq only removes adjacent duplicates, so the sort has to come first). The “affixcompress” tool that comes with Hunspell generated the affix rules file, and I added a few rules to handle some common mistakes.
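The de-duplication step can also be done in Python, which avoids a subtle pitfall: uniq only collapses adjacent duplicate lines, so the list has to be sorted before uniq sees it. What follows is a rough sketch rather than the script I actually ran. The function names are made up for illustration, and the file names “words” and “words.sorted” are simply the ones used above.

```python
def unique_sorted_words(lines):
    """Return the distinct non-empty words from an iterable of lines, sorted."""
    # A set drops duplicates wherever they occur, not only adjacent ones.
    words = {line.strip() for line in lines}
    words.discard("")
    # sorted() orders by Unicode code point, which may differ from a
    # locale-aware sort(1).
    return sorted(words)


def write_sorted(words_path="words", out_path="words.sorted"):
    """Read words_path and write its distinct words, sorted, to out_path."""
    with open(words_path, encoding="utf8") as src:
        words = unique_sorted_words(src)
    with open(out_path, "w", encoding="utf8") as dst:
        dst.writelines(word + "\n" for word in words)
    return len(words)
```

Calling write_sorted() in the directory that holds the extracted “words” file should leave a “words.sorted” file that can then be handed to affixcompress.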
Install the addon from here. Once it’s installed, you can right-click a text box, enable spell checking, and select Sinhala as the language.
(Don’t ask me how සොක්කා got recommended for මඤ්ඤොක්කා)