CFEclipse and (missing) BOM marks

Most of the websites I work on have Norwegian as the main or only language. To make sure the Norwegian letters (æ, ø and å) display correctly in web browsers, we have to use UTF-8 as the page encoding, and also insert a BOM (byte order mark) at the start of each file.
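To make the BOM a little more concrete, here is a minimal Python sketch (Python is just my choice for illustration, it has nothing to do with ColdFusion itself). It shows the bytes behind æ, ø and å in UTF-8, with and without a BOM, and what they turn into if something reads them as ISO-8859-1 instead. That last step is an assumption about what goes wrong when the encoding is not detected; the actual fallback encoding depends on the server setup.

```python
import codecs

text = "æøå"

# UTF-8 without and with a BOM ("utf-8-sig" prepends EF BB BF when encoding).
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

def hex_bytes(data):
    """Format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

print(hex_bytes(plain))     # c3 a6 c3 b8 c3 a5
print(hex_bytes(with_bom))  # ef bb bf c3 a6 c3 b8 c3 a5

# The BOM is the three-byte prefix that marks the file as UTF-8.
print(with_bom.startswith(codecs.BOM_UTF8))  # True

# Assumption: without a BOM the bytes get read as ISO-8859-1,
# which turns each two-byte letter into two wrong characters.
garbled = plain.decode("latin-1")
print(garbled)  # Ã¦Ã¸Ã¥
```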

In Dreamweaver this is really simple to do, and I've set it to automatically use UTF-8 and include a BOM on every new page I create.

In Eclipse/CFEclipse, however, there seems to be no option for adding a BOM to your pages. After a while of desperate Googling (this is crucial if I'm to use CFEclipse as my editor) I came upon a workaround that seems to work. Not surprisingly, the solution was proposed by Paul Hastings.

The solution was found here, where Paul says "each of our coldfusion pages starts with: <cfprocessingdirective pageencoding="utf-8">"

So now I added ...

<cfprocessingdirective pageencoding="utf-8">

... to one of the pages containing Norwegian letters and it seems to work.

Now, I hate the idea of having to add this to every page we create, but if Paul Hastings is doing it, then it is probably not a bad idea anyway...
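If you would rather have a real BOM on files edited in Eclipse, one option is to add it outside the editor. The following is a minimal Python sketch of that idea, not something taken from Paul's article. The function names (has_bom, add_bom, add_bom_to_tree) and the "*.cfm" pattern are my own assumptions for illustration, and the script assumes the files are already saved as UTF-8. Try it on a copy of your project first.

```python
import codecs
from pathlib import Path

def has_bom(path):
    """Return True if the file starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

def add_bom(path):
    """Prepend a UTF-8 BOM unless one is already there.

    Returns True if the file was changed. Raises UnicodeDecodeError
    if the content is not valid UTF-8, so files in another encoding
    are left untouched instead of being mislabeled.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # validation only; the result is discarded
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True

def add_bom_to_tree(root, pattern="*.cfm"):
    """Add a BOM to every matching file under root; return the changed paths."""
    return [p for p in sorted(Path(root).rglob(pattern)) if add_bom(p)]
```

Calling add_bom_to_tree("path/to/project") would walk the folder and return the files it changed. Running it a second time changes nothing, since files that already have a BOM are skipped.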

Comments
It seems "wrong" to me to have anything other than standard ASCII (that'd be the 7-bit variety) in source code files anyhow.

Am I being old-fashioned? Anglo-centric?

Interesting food for (my) thought, anyhow.

--
Adam
# Posted By Adam Cameron | 12/16/06 6:05 PM
Personally, I think the default encoding and character sets should all be UTF-8 :)
# Posted By Trond Ulseth | 12/18/06 5:30 AM
I don't have Eclipse in front of me, but I know I've seen a select box for changing a file's (or a bunch of files recursively) encoding in Eclipse's file properties. (Right click the file in Navigator, and one of the sub dialogs there has it.)

Does this not do what I think it does? I thought this managed BOMs.
# Posted By Jamie Jackson | 1/13/08 11:03 AM
You can set the encoding to be utf-8 (it is actually the default encoding now - I don't think it was last time I checked).

But it does not manage BOM marks (at least I can't find any references to it).
# Posted By Trond Ulseth | 1/14/08 7:39 AM
I read Paul's article more thoroughly this time, and I see what you mean. It *seems* like the only difference between an Eclipse ISO-8859-1 file and a UTF-8 file is that Eclipse won't let you save the ISO-8859-1 file if you have typed characters into it that the encoding can't represent. Also, I confirmed in a hex editor that it doesn't put in a BOM for UTF-8.

When you stray from the default encoding in a project, Eclipse starts maintaining a text file under a ".settings" directory, which looks something like:

#Mon Jan 14 16:32:05 GMT 2008
eclipse.preferences.version=1
encoding//controller/8859.txt=ISO-8859-1
encoding//controller/utf8.txt=UTF-8

...and this is the *only* place that such settings are saved, so the setting does not persist outside of Eclipse.

This is all old news to you, I'm sure, but I figured I'd post my findings somewhere, so at least I can find them later.
# Posted By Jamie Jackson | 1/14/08 11:22 AM
Trond et al, I have created an Eclipse plugin that allows you to add and remove UTF-8 BOMs. I have done testing, but would like more testing done before I release it publicly. If you are interested, please drop me a line at christopher dot bradford (at) aliveonline dot com.
# Posted By Christopher Bradford | 1/17/08 5:48 PM