Encoding
•XML, like Java, uses Unicode as its character set.
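•Unicode text can be stored in several encoding forms, chiefly UTF-8, UTF-16, and UTF-32.  UTF-8 is by far the most common, especially on the web, and it is what an XML parser assumes when a document has neither a byte-order mark nor an encoding declaration.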
•UTF-8 is a variable-length code.  Each character is encoded in 1, 2, 3, or 4 bytes.
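•The first 128 characters in Unicode are ASCII, and UTF-8 encodes each of them as the same single byte ASCII uses, so plain ASCII text is already valid UTF-8.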
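•Code points 128 to 255 (U+0080 to U+00FF) hold many of the accented letters common in Western European languages, such as ã, á, å, and ç; these are the same characters that ISO-8859-1 (Latin-1) stores in one byte.  UTF-8 needs 2 bytes for each of them, because in UTF-8 a byte from 128 to 255 never stands for a character by itself.
•Two-byte sequences cover U+0080 to U+07FF: the accented letters above plus Greek, Cyrillic, Hebrew, Arabic, and several other scripts.
•Three-byte sequences cover U+0800 to U+FFFF, the rest of the Basic Multilingual Plane, including most Chinese, Japanese, and Korean characters.
•Four-byte sequences cover U+10000 to U+10FFFF (the supplementary planes): rarer CJK ideographs, historic scripts, and emoji.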
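•UTF-16 can be more compact for text dominated by Chinese, Japanese, or Korean, since it stores most of those characters in 2 bytes rather than UTF-8's 3.  Every conforming XML processor must accept both UTF-8 and UTF-16.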
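UTF-8's byte lengths can be checked directly with Python's built-in codec.  This is a minimal sketch assuming Python 3.8 or later (for `bytes.hex` with a separator); the helper name `utf8_byte_count` and the sample characters are illustrative choices, not part of any standard API.  Characters are written as escapes so the script does not depend on how its own source file is saved.

```python
def utf8_byte_count(ch: str) -> int:
    """Return how many bytes UTF-8 uses for one character (illustrative helper)."""
    return len(ch.encode("utf-8"))


# One sample per length class; the descriptions are informal labels.
samples = [
    ("A", "ASCII letter"),
    ("\u00e7", "c with cedilla, Latin-1 Supplement"),
    ("\u0416", "Cyrillic letter Zhe"),
    ("\u4e2d", "CJK ideograph"),
    ("\U0001F600", "emoji, outside the Basic Multilingual Plane"),
]

for ch, description in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, its byte count, and the raw UTF-8 bytes in hex.
    print(f"U+{ord(ch):04X}  {utf8_byte_count(ch)} byte(s): {encoded.hex(' ')}  ({description})")
```

Run as-is, it reports 1, 2, 2, 3, and 4 bytes for the five samples, one from each length class.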
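To show where UTF-8's length classes come from, here is a from-scratch encoder following the standard UTF-8 bit layout (RFC 3629).  It is a teaching sketch, not a replacement for the built-in codec: `encode_utf8` is a hypothetical name, and the loop cross-checks every result against Python's own `str.encode`.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (hypothetical teaching helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and have no UTF-8 form.
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        # 0xxxxxxx: 7 payload bits, identical to the ASCII byte.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx: 11 payload bits.
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 payload bits.
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 payload bits.
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Code points on either side of each length boundary.
boundaries = [0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in boundaries:
    mine = encode_utf8(cp)
    # The hand-rolled result must match Python's built-in UTF-8 codec.
    assert mine == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {len(mine)} byte(s): {mine.hex(' ')}")
```

The boundary list makes the design visible: U+007F is the last 1-byte character, U+07FF the last 2-byte one, and U+FFFF the last 3-byte one.  Every byte from 128 to 255 in the output is either a lead byte (starting `110`, `1110`, or `11110`) or a continuation byte (starting `10`), which is why such a byte never encodes a character alone.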
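A quick way to weigh UTF-8 against UTF-16 is to encode the same text both ways and count bytes.  This sketch uses Python's `utf-16-le` codec so that the 2-byte byte-order mark added by the plain `utf-16` codec is not counted; the helper `encoded_sizes` and the sample strings are illustrative assumptions.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts for the same text in UTF-8 and UTF-16 (little-endian, no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("Hello"))                     # ASCII only
print(encoded_sizes("\u4e2d\u6587\u6587\u672c"))  # four CJK ideographs
```

Plain ASCII doubles under UTF-16 (5 bytes to 10), while the four CJK ideographs shrink from 12 bytes to 8.  Real XML mixes such text with ASCII markup, so the actual saving depends on the document.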
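The XML 1.0 specification requires every conforming processor to accept both UTF-8 and UTF-16.  A minimal sketch with Python's standard `xml.etree.ElementTree` parser, assuming a small hypothetical `<city>` document: the same content is serialized once in each encoding, each with a matching encoding declaration, and both parse to the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; \u00e3 is the letter a with tilde.
body = "<city>S\u00e3o Paulo</city>"

utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
# Python's "utf-16" codec prepends a byte-order mark, which tells the
# parser which byte order the rest of the document uses.
utf16_doc = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

for doc in (utf8_doc, utf16_doc):
    # Passing bytes lets the parser do the decoding itself.
    root = ET.fromstring(doc)
    # !a prints the text with ASCII escapes, so console encoding does not matter.
    print(f"{len(doc)} bytes -> <{root.tag}> {root.text!a}")
```

Both lines print the same element text; only the byte counts differ, because the UTF-16 version spends 2 bytes on every character, including the ASCII markup.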