• XML (like Java) uses Unicode to encode characters.
• Unicode comes in several encoding forms (UTF-8, UTF-16, and UTF-32). The most common one used in the West is UTF-8.
• UTF-8 is a variable-length code. Characters are encoded in 1, 2, 3, or 4 bytes.
• The first 128 characters in Unicode are ASCII.
• The Unicode code points between 128 and 255 cover some of the more common characters used in western Europe, such as ã, á, å, or ç. In UTF-8, each of these takes two bytes, not one.
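• Two-byte codes cover code points up to U+07FF, which includes alphabets such as Greek, Cyrillic, Hebrew, and Arabic. Three-byte codes cover the rest of the Basic Multilingual Plane, including most Asian ideographs.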
• Four-byte codes handle the characters that are left, including the less common ideographs.
• Those using non-western languages should investigate other Unicode encodings, such as UTF-16.
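The variable-length behavior described in the list can be checked directly. The following is a minimal Python sketch, not part of the original material: the sample characters and the helper name `utf8_length` are chosen here purely for illustration. It encodes one character from each length class and reports the byte count, with the UTF-16 size alongside for comparison.

```python
# Illustrative sample characters (an assumption, not from the original slides):
# an ASCII letter, a western European letter, a common CJK ideograph,
# and a character outside the Basic Multilingual Plane.
samples = ["A", "ç", "中", "😀"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


def utf16_length(ch: str) -> int:
    """Return the number of bytes UTF-16 uses (big-endian, no byte-order mark)."""
    return len(ch.encode("utf-16-be"))


for ch in samples:
    print(f"U+{ord(ch):04X} {ch}: "
          f"{utf8_length(ch)} byte(s) in UTF-8, "
          f"{utf16_length(ch)} byte(s) in UTF-16")
```

In this sketch the ideograph needs three bytes in UTF-8 but only two in UTF-16, which is one reason text that is mostly non-western may be more compact in another encoding.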