
Re: alternative alphabets



On Sat, 2002-03-16 at 21:42, Charles Menzes wrote:
> i recently received a text file written in the cyrillic alphabet. when 
> opening it under vi, i am unable to see the actual characters. is there an 
> easy way to build this support in?

It's already there - or, at least, it is if you're using most popular
distributions.  (Are there any distributions that don't do i18n?)

What you need to do:

 - Install your distro's font packages for Cyrillic (sometimes labeled
"Russian fonts").  Be sure you get the right font packages for console
or X11, depending on what environment you're in.  (It's easier to get
X11 working than console.)

 - Set the LANG environment variable to the proper locale.

 - Run vi again on the file.
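
For the last two steps, here is a rough Python sketch of launching the
editor with LANG overridden for just that one process.  The helper names
(env_with_lang, edit_with_locale) and the file name are made up for
illustration, and it assumes the locale you ask for is actually
installed on your system.

```python
import os
import subprocess


def env_with_lang(locale_name, base_env=None):
    """Return a copy of the environment with LANG set to locale_name."""
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = locale_name
    # Note: if LC_ALL or LC_CTYPE is already set in the environment,
    # it takes precedence over LANG.
    return env


def edit_with_locale(path, locale_name, editor="vi"):
    """Run the editor on path with LANG overridden for that process only."""
    completed = subprocess.run([editor, path], env=env_with_lang(locale_name))
    return completed.returncode


if __name__ == "__main__":
    # "russian.txt" is a hypothetical file name.
    edit_with_locale("russian.txt", "en_US.ISO-8859-5")
```

Setting the variable in the shell before starting vi has the same
effect; the sketch only shows that the setting is ordinary per-process
environment.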

Locale settings use the two-letter ISO language and country codes, with
an optional character encoding.  They look like this:

  lang_COUNTRY.ENCODING

So, for example, most of us use:

  en_US.ISO-8859-1

Elements can be assumed and left off, so you can say:

  en_US

or even:

  en

(though some programs might see the latter and spell things like
"civilized" as "civilised").

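To make the lang_COUNTRY.ENCODING layout concrete, here is a small
sketch that splits a locale name into its parts.  The function name
parse_locale is just something I made up, and it only handles the simple
form described above (an "@modifier" suffix, if present, is thrown
away).

```python
def parse_locale(name):
    """Split 'lang_COUNTRY.ENCODING' into (language, country, encoding).

    Parts that were left off come back as None.
    """
    # Discard an optional "@modifier" suffix; this sketch ignores it.
    name = name.split("@", 1)[0]
    base, _, encoding = name.partition(".")
    language, _, country = base.partition("_")
    return language, country or None, encoding or None


for example in ("en_US.ISO-8859-1", "en_US", "en"):
    print(example, "->", parse_locale(example))
```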
As a bonus for being an English speaker, just about all encodings are
structured as supersets of ASCII, so your plain English text should work
under any language and encoding.  If you set the language to something
other than English, though, vi might start speaking Russian or whatever
to you in its own messages.
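
The "superset of ASCII" point is easy to check for yourself.  This
sketch encodes the same plain English string in a handful of encodings
and compares the bytes.  The list of encodings is my own pick (KOI8-R is
there as one more Cyrillic example) and is nowhere near exhaustive.

```python
text = "plain English text"

# A few ASCII-compatible encodings, chosen purely for illustration.
encodings = ["ascii", "iso-8859-1", "iso-8859-5", "koi8-r", "utf-8"]

encoded = {name: text.encode(name) for name in encodings}

# If every encoding is a superset of ASCII, all byte strings are identical.
all_same = len(set(encoded.values())) == 1
print(all_same)  # prints True
```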

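So, to continue to get English text but with support for Russian
characters, you'd likely need the following (assuming the file is in
ISO-8859-5; Russian text also commonly turns up in KOI8-R or
Windows-1251, and the encoding has to match the file):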

  en_US.ISO-8859-5

Ultimately, we all want to switch to Unicode.  In that case, the locale
would look like:

  en_US.UTF-8

And you'd be able to see those Russian characters without playing all
these games, as well as Chinese, Japanese, Hebrew, and Esperanto - all
in one document, even, if one wants.  The Linux world isn't quite ready
for Unicode yet, though we're getting there very rapidly.
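
Here's a quick sketch of why Unicode removes the juggling: one string
holding several scripts survives a round trip through UTF-8, while a
single-script encoding like ISO-8859-5 can only hold the Cyrillic part.
The sample words and the can_encode helper are mine, not anything
standard.

```python
# Sample greetings in several scripts, for illustration only.
samples = {
    "Russian": "Привет",
    "Hebrew": "שלום",
    "Japanese": "こんにちは",
    "Esperanto": "ĉu",
}
mixed = " ".join(samples.values())


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 holds all of it and decodes back to the same string.
roundtrip_ok = mixed.encode("utf-8").decode("utf-8") == mixed

print(roundtrip_ok)
print(can_encode(samples["Russian"], "iso-8859-5"))
print(can_encode(mixed, "iso-8859-5"))
```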

If you want more information, the locale stuff is part of glibc, and is
often installed as a "locales" package if it isn't part of the libc
package proper.  From the program's point of view, the locale is picked
up with setlocale(), and the API for translating a program's messages
is called "gettext".  If you're really interested in playing with this
stuff, there's a conversion program called "iconv" which can convert
between various ISO encodings, ASCII, and Unicode.
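
On the command line the conversion would be something like
"iconv -f ISO-8859-5 -t UTF-8 infile > outfile".  The same idea in
Python is just a decode followed by an encode.  This is a sketch of the
concept, not of how iconv works internally, and convert_encoding is a
name I made up.

```python
def convert_encoding(data, source, target):
    """Re-encode bytes from the source encoding to the target encoding."""
    return data.decode(source).encode(target)


# Build some ISO-8859-5 bytes to stand in for the contents of a file.
legacy_bytes = "Привет".encode("iso-8859-5")
utf8_bytes = convert_encoding(legacy_bytes, "iso-8859-5", "utf-8")

# One byte per Cyrillic letter in ISO-8859-5, two bytes each in UTF-8.
print(len(legacy_bytes), len(utf8_bytes))  # prints 6 12
```

If the source encoding is guessed wrong, the decode step either fails or
quietly produces the wrong letters, so it pays to know what the file
really is before converting.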

