2012-12-31

Text console: i18n, locales and UTF-8

Gone are the days of fiddling with configurations and fonts in order to get one's text terminal (or xterm) to display international character sets properly -- and good riddance! Even today, though, it is possible to misconfigure one's terminal. And what to do if you need to find the hex value of an international character, or vice versa?

Translating between characters and their hex values is the easy part: just point your browser to Stanislav Pecha's převodník UNICODE (sorry, it's in Czech so if you're not fluent in that language you might want to google for a different tool).
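If you would rather stay in the terminal, the same conversion can be sketched in plain shell. This is only a rough illustration: char_to_hex and hex_to_char are hypothetical helper names, not part of any standard tool, and the hex values they deal with are the UTF-8 bytes of a character rather than its Unicode code point number. It assumes a POSIX-style shell with od, tr and printf available.

```shell
# Print the UTF-8 bytes of a string as lowercase hex digits.
# Assumes the argument reaches the function as UTF-8 (true in a UTF-8 terminal).
char_to_hex() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Turn a string of hex digits back into raw bytes.
hex_to_char() {
    hex=$1
    # Refuse odd-length input, since each byte needs two hex digits.
    if [ $(( ${#hex} % 2 )) -ne 0 ]; then
        echo "hex_to_char: odd number of hex digits" >&2
        return 1
    fi
    while [ -n "$hex" ]; do
        rest=${hex#??}          # everything after the first two digits
        byte=${hex%"$rest"}     # the first two digits
        hex=$rest
        # Convert the byte to octal, the numeric escape that printf reliably understands.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

char_to_hex 'ř'; echo      # UTF-8 bytes of U+0159: c599
hex_to_char c599; echo     # the same character back again
```

Going through octal looks roundabout, but not every shell's printf accepts \x escapes, so this keeps the sketch usable outside bash.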

When troubleshooting i18n issues, the first thing to look at is the output of the locale command:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

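If your output shows a different encoding (for example POSIX, or an ISO-8859 locale instead of UTF-8), add the following lines to your .bashrc and .profile files, substituting your own language and country for en_US if needed: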
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
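
To confirm which encoding the locale settings actually resolve to (open a new shell first if you have just edited your startup files), you can ask for the charmap directly. A minimal sketch, assuming a POSIX-style system where `locale charmap` is available; the function name is_utf8_charmap is hypothetical and exists only for this example:

```shell
# Succeed if the given charmap name denotes UTF-8.
# Spelling varies a little between systems, so accept the common variants.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# `locale charmap` prints the character encoding of the current locale.
if is_utf8_charmap "$(locale charmap)"; then
    echo "locale encoding is UTF-8"
else
    echo "locale encoding is '$(locale charmap)', not UTF-8"
fi
```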

All of this and more is described in detail in the "How to set up a clean UTF-8 environment in Linux" article at perlgeek.de.

In addition, the character encoding used by your text terminal must match the encoding set in your locale (UTF-8 in this case), and its font must include glyphs for the characters you want to display. Setting this up used to be a bit nightmarish. Nowadays, it can be quite simple. I'm currently running openSUSE 12.2 with KDE, so I'm using Konsole as my text terminal. International characters were not displaying correctly, even though I had my locale set for UTF-8. The solution was as easy as checking Konsole's encoding setting and changing it to UTF-8. This is intuitive and easily accomplished using Konsole's menu system, but I include the series of mouse clicks here for completeness:
Settings -> 
Configure current profile ... -> 
"Advanced" tab ->
Encoding ->
Default character encoding ->
Select -> 
Unicode ->
UTF-8
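
A quick way to check the result is to send known UTF-8 bytes to the terminal and see whether they come out as the intended letters. The sketch below writes the bytes as octal escapes so that it does not depend on how your editor or this page encodes the text; utf8_sample is a made-up function name, and the Czech word is just an arbitrary test string.

```shell
# Emit the UTF-8 bytes for the Czech word "příliš" followed by a newline.
# \305\231 = ř, \303\255 = í, \305\241 = š
utf8_sample() {
    printf 'p\305\231\303\255li\305\241\n'
}

utf8_sample
```

If the terminal is decoding UTF-8, the word appears with its three accented letters. If it is still set to a single-byte encoding, the accented letters will typically come out as garbage instead.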
