•XML (like Java) uses Unicode to encode characters.
•Unicode can be stored in several encodings, such as UTF-8, UTF-16, and UTF-32. The most common one used in the West is UTF-8.
•UTF-8 is a variable-length code. Each character is encoded in 1, 2, 3, or 4 bytes.
•The first 128 characters in Unicode are ASCII.
•The Unicode code points between 128 and 255 cover some of the more common characters used in western Europe, such as ã, á, å, or ç. In UTF-8 each of these takes 2 bytes, because single-byte codes are reserved for ASCII.
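•Two-byte codes cover code points 128 through 2047 (U+0080 to U+07FF), which include accented Latin letters as well as Greek, Cyrillic, Hebrew, and Arabic.
•Three-byte codes cover the rest of the Basic Multilingual Plane (U+0800 to U+FFFF), which includes most Asian ideographs.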
•Four-byte codes cover the code points above U+FFFF (up to U+10FFFF), such as the rarer ideographs that are left.
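•UTF-8 can encode every Unicode character, but those working mainly with non-Western languages should investigate other Unicode encodings such as UTF-16, which can be more compact for East Asian text.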
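
The byte lengths described above can be checked with a short script. This is only an illustrative sketch in Python (not something these notes rely on); the sample characters and the helper name `utf8_length` are chosen here for illustration.

```python
# Sample characters, written as escapes so the file's own encoding does not matter.
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, e with acute accent (Latin-1 range)
    "\u20ac",      # U+20AC, euro sign
    "\u4e2d",      # U+4E2D, a common CJK ideograph
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode the character ch."""
    return len(ch.encode("utf-8"))

for ch in samples:
    utf16_bytes = len(ch.encode("utf-16-le"))  # little-endian, no byte-order mark
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_length(ch)} byte(s), UTF-16 {utf16_bytes} byte(s)")
```

The samples come out as 1, 2, 3, 3, and 4 bytes in UTF-8. The ideograph U+4E2D needs 3 bytes in UTF-8 but only 2 in UTF-16, which is why UTF-16 can be worth a look for East Asian text.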