![]() For a programmer, the question is: How are these Unicode code points represented concretely using computer bits? The answer to this question leads directly to the concept of Unicode encoding. From Abstract Code Points to Actual Bits: UTF-8 and UTF-16 EncodingsĪ code point is an abstract concept, though. Currently, the Unicode standard defines more than 1,114,000 code points. So, for example, the Japanese kanji ideograph 学, which has “learning” and “knowledge” among its meanings, is associated to the code point U+5B66. Note that Unicode is an industry standard that covers most of the world’s writing systems, including ideographs. For example, the code point associated to the character “C” is U+0043. ![]() According to the official Unicode consortium’s Web site ( bit.ly/1Rtdulx), “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” Each of these unique numbers is called a code point, and is typically represented using the “U+” prefix, followed by the unique number written in hexadecimal form. Unicode is the de facto standard for representing international text in modern software. Volume 31 Number 9 Unicode Encoding Conversions with STL Strings and Win32 APIs
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |