Uh… The F**k? (- 8)

Being an Israeli, you learn a whole lot more about how text is represented on computers than you really want to know. Or think about.

UTF-8 is a text format compatible with ASCII (what people who insist that computers can only speak English call “plain text”) that can represent text in many different languages, including Hebrew, Russian and assorted European ones (by which I mean it includes Hebrew and Cyrillic letters, as well as those funny accented characters that Germans and Scandinavians use).
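To make the “compatible with ASCII” bit concrete, here is a minimal Python sketch. The sample strings are my own picks for illustration: pure-English text comes out byte-for-byte the same in ASCII and in UTF-8, while each Hebrew letter takes two bytes.

```python
# Pure ASCII text encodes to exactly the same bytes in both encodings.
english = "plain text"
ascii_bytes = english.encode("ascii")
utf8_bytes = english.encode("utf-8")

# "shalom" in Hebrew, written with escapes so this file itself stays ASCII.
hebrew = "\u05e9\u05dc\u05d5\u05dd"
hebrew_bytes = hebrew.encode("utf-8")

print(ascii_bytes == utf8_bytes)       # same bytes for English text
print(len(hebrew), len(hebrew_bytes))  # 4 letters, 8 bytes
```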

However, for silly historical reasons, it is customary to insert a small piece of gibberish called a BOM at the beginning of a UTF-8 file. You might possibly imagine some reason someone thought at the time that sticking it there might be a good idea, but generally, it’s a damn stupid one, because it messes up the whole “compatible with plain text” thing. Certain stupid programs, like, say, PHP (written in part by a couple of Israelis, so naturally its international text support is teh suck), choke on it.
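For the curious, the gibberish in question is three bytes: EF BB BF. Here is a small Python sketch of what it looks like and how you might strip it before handing a file to something that chokes on it. The helper name `strip_utf8_bom` is made up for this example.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Python's "utf-8-sig" codec writes the BOM; plain "utf-8" does not.
with_bom = "hello".encode("utf-8-sig")
without_bom = "hello".encode("utf-8")

print(with_bom)                  # BOM bytes, then the text
print(strip_utf8_bom(with_bom))  # just the text
```

A program that expects plain ASCII sees those three extra bytes as stray characters before the first real one, which is exactly how the compatibility breaks.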

Now, my text editor of choice knows how to handle UTF-8; in fact, it offers 2 ways of encoding it, called “UTF-8” and “UTF-8 with cookie”.

So, quick quiz: If you wanted to save a UTF-8 file without a BOM, which would you choose? With cookie or without?

I thought that “cookie” might be some technical cute way of referring to this BOM thing. However, turns out that by “cookie” my editor means that it will try to guess if the file is UTF-8 automatically by reading the first line and seeing if it uses the words “coding” and “utf-8” together in some way (this is an XML convention).
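Here is a rough Python sketch of what that kind of “cookie” sniffing might look like. It is a guess at the general idea, not my editor’s actual code: the pattern `COOKIE_RE` and the function `has_utf8_cookie` are invented for illustration. The pattern happens to match both an XML declaration (`encoding="utf-8"`) and the `coding: utf-8` comment style used by Emacs and Python.

```python
import re

# Hypothetical pattern: "coding", then ":" or "=", then utf-8 (optionally quoted).
COOKIE_RE = re.compile(rb"""coding[:=]\s*["']?utf-?8""", re.IGNORECASE)

def has_utf8_cookie(data: bytes, max_lines: int = 1) -> bool:
    """Guess whether the first max_lines lines declare the file as UTF-8."""
    head = data.splitlines()[:max_lines]
    return any(COOKIE_RE.search(line) for line in head)

xml_doc = b'<?xml version="1.0" encoding="utf-8"?>\n<root/>'
script = b"# -*- coding: utf-8 -*-\nprint('hi')"
plain = b"just some text\n# coding: utf-8"

print(has_utf8_cookie(xml_doc), has_utf8_cookie(script), has_utf8_cookie(plain))
```

Note that none of this has anything to do with the BOM: the cookie is ordinary text inside the file, while the BOM is three invisible bytes in front of it.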

Apparently, lots of other people don’t know from BOMs either: they just know that, if they use UTF-8, they can write French (or, with a little more difficulty because of directionality, Hebrew) text in their “plain text” files. This guy uses BBEdit, which actually offers “UTF-8” and “UTF-8, without BOM”, and he still got confused.

Ugh.

2 replies on “Uh… The F**k? (- 8)”

Being an Israeli, you learn a whole lot more about how text is represented on computers than you really want to know.

I’m sorry, that sentence should read ‘Being an Israeli Windows user, you learn a whole lot more about how text…’

As if I don’t get enough Mac elitism from my brother…

While there’s no doubt that Windows is a more educational platform, as in it encourages you to learn why things don’t work, I think it would be most accurate to say ‘Being an Israeli web user’. I do mention that the same problem bit a French Mac user (I don’t think they have BBEdit on Windows). The moment you put up a web page, you’re playing in a pool where, umm, everyone’s legacy software solutions are floating about.
