
FONTS, KEYBOARDS, AND UNICODE

[Image: Font Creator, the software program we use to make Lao fonts.]

If your literacy or reading project is publishing books in a language that uses the Latin alphabet (also called the Roman or English alphabet), you can skip this page. If you're using any other alphabet or system for which there is an established computer writing system, and the system works, you can also skip it.

But we were publishing in Lao, for which computer support was, and remains, less advanced.

The Lao language uses a different alphabet from any other language. The Lao alphabet is closely related to the Thai alphabet, but is shorter, and letter shapes are different. Vowels may go above or below a consonant. Tone marks always go above a consonant; also above any vowel that's already there. So there are four horizontal zones in which letters may fall, although only an error-prone typist will use all four at the same time.

Lao is simpler for a computer than a language that's written vertically, or right-to-left, but it's still not simple. When I started trying to type Lao on computers, in 2004, several systems were in use. Most of them used a semi-standard keyboard layout which I gather had evolved from the Lao typewriter layout, which had evolved from the Thai typewriter layout. It wasn't good for Lao computer use.

Punctuation had shifted to the middle row. The numbers 1 to 4 were the upper-shift of 1 to 4; 5 to 8 were upper-shift of 7 to 0; 9 and 0 were somewhere else though I could never remember where. Some obsolete letters occupied prime real estate in the center of the keyboard, while the common "p" was so far to the top right that after hitting it, my fingers lost their way as they went back home.

Furthermore, the most common Lao system involved a program which operated in the background, doing some mysterious things. In Photoshop, the text occasionally, suddenly, inexplicably, turned into Greek. Or maybe Russian, which at least was politically more explicable. But it certainly wasn't Lao anymore.

"I can't publish books with this!" I announced (to myself, since no one else was listening). So I learned a bit about fonts. Fortunately, this was before Big Brother Mouse had started; I was simply working with a few Lao people with the vague goal of somehow improving literacy and book access, and had some extra time for an excursion into a new field.

Font systems involve three separate aspects: keyboard layout, how the characters are stored, and font design. You can develop or make changes in one, and not the others. We, however, got into all three.

Keyboard layout. As long as we were creating a system, I decided to start with a more logical keyboard, putting the most-used letters and symbols on the easiest-to-reach keys. There shouldn't be too many readers who ever need to do this, but if you're one, here are a few things I tried, or learned:

* I kept standard punctuation where it is on the standard Roman keyboard. Same with numbers. That seems obvious, but it hadn't been to whoever created the previous Lao keyboards. It's not just easier; it avoids computer problems. A computer will in some cases interpret a period as a decimal point; it may change straight quotation marks into directional marks. If you've put letters in those positions, odd things will happen.

* I put vowels on the right, consonants on the left, tone marks in the middle. That creates some likelihood that left and right hands will alternate, which helps speed and accuracy. It also makes it easier for new typists to learn the keyboard, and for hunt-and-peckers to find the key they want.

* I used the home keys: asdf jkl; for the most common letters. The next most common letters went on the row above: qwert uiop; and then upper-shifts of the home keys: ASDF JKL:. And so on.

* In Lao, you don't put spaces between most words. Spaces are used roughly like commas are in English, to show a break in thought. Consequently, a computer doesn't know where it's okay to start a new line. It either goes back to the last space, which might be at the beginning of the line; or the computer picks a place to end the line, based on its understanding of Lao, which isn't very good. Our solution has been to convert the hyphen key into a "word-break" key. It's designed as a blank character that takes up no space. When you hit the hyphen key, the computer records a hyphen in memory, but doesn't show it. But the software knows that's an acceptable place to start a new line.

* Try to map letters in your alphabet to letters, rather than symbols, on the keyboard. Many software programs assign special meaning or treatment to various symbol keys. The Access database, for example, assumes that square brackets define a field name, so if you assign an ordinary letter to the bracket key, you may not be able to use it in certain situations.

* Here's an issue that never occurred to me before, but I'd watch for it if ever I did this again. I regularly touch-type in both English and Lao, and usually have no problem keeping them separate. But the Lao letter similar to "G/K" is mapped to the English letter "D", the same finger as English "K" but on the other hand. My most common typing mistake in Lao is swapping these two letters. Should you be designing a new keyboard, some users will probably also type with the Roman/English alphabet. Assuming some other brains are wired like mine, it seems best not to let any sounds in your alphabet be mapped to a mirror-image position of where they are on the QWERTY keyboard.
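The "word-break" key idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the software we actually use: the invisible marker is assumed here to be U+200B (ZERO WIDTH SPACE), the function names are made up, and English stand-in words are used so that counting characters is easy. Real Lao text would need to measure glyph widths rather than count characters, since vowels and tone marks stack above or below a consonant.

```python
# Assumed invisible marker stored by the "word-break" key (hypothetical choice).
BREAK = "\u200b"  # ZERO WIDTH SPACE


def split_at_breaks(text):
    """Cut text into chunks that each end where a new line may start."""
    chunks, chunk = [], ""
    for ch in text:
        if ch == BREAK:
            chunks.append(chunk)        # the marker itself is never printed
            chunk = ""
        elif ch == " ":
            chunks.append(chunk + " ")  # a real space stays visible
            chunk = ""
        else:
            chunk += ch
    chunks.append(chunk)
    return [c for c in chunks if c]


def wrap(text, width):
    """Fill lines up to `width` characters, breaking only at allowed places."""
    lines, current = [], ""
    for chunk in split_at_breaks(text):
        # Start a new line if adding this chunk would overflow the width.
        if current and len((current + chunk).rstrip()) > width:
            lines.append(current.rstrip())
            current = ""
        current += chunk
    if current.rstrip():
        lines.append(current.rstrip())
    return lines


# Words run together, as in Lao; markers show where a line may end.
sample = "big" + BREAK + "brother" + BREAK + "mouse makes" + BREAK + "books"
for line in wrap(sample, 12):
    print(line)
```

A chunk that is longer than the line width is left to overflow in this sketch; a real layout program would have to decide what to do in that case.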

How it's stored. Your computer doesn't remember that you typed "A" or the Lao letter "gaw-gai". It remembers a number: 65. When it prints your document, it prints the letter (or character, or any other shape) that your font stores at location 65.
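To make that concrete, here is a small Python sketch. The two "fonts" are toy dictionaries invented for illustration, and the Lao slot assignment (gaw-gai at the number stored for the "d" key) is an assumption based on the keyboard mapping mentioned above, not a copy of any real font file.

```python
# What gets stored when you type: numbers, not shapes.
typed = "Ad"
codes = [ord(ch) for ch in typed]
print(codes)  # the numbers kept in memory for "A" and "d"

# Two toy fonts: each one maps a stored number to the shape it draws.
latin_font = {65: "A", 100: "d"}
lao_font = {100: "\u0e81"}  # hypothetical: slot 100 draws Lao gaw-gai


def render(codes, font):
    """Look up each stored number in the font; '?' if the slot is empty."""
    return "".join(font.get(code, "?") for code in codes)


print(render(codes, latin_font))  # same numbers, Latin shapes
print(render(codes, lao_font))    # same numbers, Lao shapes where defined
```

The stored numbers never change; only the choice of font decides which shapes appear.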

* Lower-range ASCII: In 1970, computers stored information on paper tape. Seven punched or unpunched holes represented a character or other bit of information, so there were 128 characters (albeit numbered 0 to 127). That was enough for 52 upper- and lower-case letters, the numbers, punctuation, and some now-obsolete commands such as "ring a bell to tell the teletype operator to come over and do something."

The slots from 0 to 32 have special uses (as does 127, the old "delete" code), but from 33 to 126, the computer will print whatever picture the font shows there. If it's a picture of "A", the computer will print "A". These are conventionally assigned to the numbers, Roman letters, punctuation, and other common symbols. Everything you type on a regular typewriter has a corresponding number in the 33 to 126 range of the computer.

Computer geeks in California will tell you that this range should be used ONLY for the Roman (English) letters, and that higher numbers should be used for any other alphabet. That's fine for them; they're not trying to run a literacy project in Laos. For us, hijacking these slots and using them for Lao characters kept things simple.

* Characters 128 to 255: Electronic memories replaced paper tape, and an extra "hole" was added, so now 256 characters or codes could be accommodated. In some systems, these are used for the characters of Lao, or another "secondary" alphabet. This requires a background program, which if all goes according to plan will hover about invisibly but allow the user to get access to that higher range. On Windows, that's set up through the "Regional and Language Options" features. But, to keep this brief, it doesn't always work smoothly, and Lao isn't even one of the official options. So it requires a custom program written to tweak a feature which seems to have been a Windows afterthought anyway. And now we've got a clue why the Lao was turning into Greek, way back when.
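Both the arithmetic and the failure mode can be sketched in Python. Seven holes give 2**7 = 128 codes and eight give 2**8 = 256. What the upper 128 codes mean depends on which "code page" the system believes is active. The three code pages below are standard Windows ones picked for illustration; this shows the general mechanism only and is not a diagnosis of what actually happened in Photoshop.

```python
# Number of distinct codes with 7 and with 8 holes (bits).
print(2 ** 7, 2 ** 8)

# Three stored codes from the upper range (128 to 255).
stored = bytes([0xE1, 0xE2, 0xE3])

# The same three numbers, interpreted under three different code pages.
for codepage in ("cp1252", "cp1253", "cp1251"):
    print(codepage, stored.decode(codepage))
```

The stored bytes are identical each time. Western European (cp1252), Greek (cp1253), and Cyrillic (cp1251) code pages simply assign different letters to the same slots, so if the wrong one becomes active, the text changes alphabet without a single stored number changing.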

* Unicode. In the early 1990s, Unicode was introduced. The idea was to have one uniform way to store every alphabet and significant writing system in the world.

Unicode is no doubt the system of the future, but here in Laos, today is still today. Unicode doesn't work with some software, including the database program around which we've built much of our project. In addition, tone marks float higher than they should, which offends my aesthetic sense and, I would guess, slightly reduces readability.

In your case, if your software runs Unicode, and it types your alphabet, very likely you should use Unicode, and not concoct something new.
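For comparison, here is how a Lao consonant with a vowel above it and a tone mark above that looks when stored in Unicode. This is a minimal sketch using Python's standard unicodedata module; the three characters are chosen only to illustrate stacking, and the variable names are made up.

```python
import unicodedata

consonant = "\u0e81"  # the gaw-gai consonant
vowel = "\u0eb4"      # a vowel sign that sits above the consonant
tone = "\u0ec8"       # a tone mark that sits above the vowel

syllable = consonant + vowel + tone

# Each character has its own number, well above the 0 to 255 range.
for ch in syllable:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Three stored characters, even though they display as one stacked shape.
print(len(syllable))

# In the common UTF-8 storage format, each of these takes three bytes.
print(len(syllable.encode("utf-8")))
```

The category "Mn" printed for the vowel and the tone mark means "non-spacing mark": the font and layout software are responsible for stacking them over the consonant, which is where height problems like the floating tone marks can creep in.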

Font design. The shape of letters is completely separate from the issues of keyboard layout and coding. Not too many people will have a good reason to create new keyboard or coding systems. But if you use a relatively uncommon alphabet, you may want to make some new fonts, either for improved legibility, or design reasons.

We've made a number of Lao fonts, using Font Creator software. Someone will need to spend a week or two learning to use it. It can also be used to tweak existing fonts. Perhaps you've got fonts which look fine in small sizes, but the curves are pretty bumpy when they're used in larger sizes. You can fix them up with Font Creator much faster than creating a new font, and doing so is a good way to learn the program.

This is longer than I had intended, and of value to relatively few people, so there's more that could be said, but I'm not going to right now. However, if you're involved in literacy or reading promotion, and have questions about some aspect of all this, please drop me an email at the Big Brother Mouse contact page, and I'll do what I can to help.