oracle guide

What is Unicode?

UTF-8 is the Unicode encoding supported on UNIX platforms and used for HTML and most Internet browsers. Other environments such as Windows and Java use UCS-2 encoding. The benefits of UTF-8 are as follows:

■ Compact storage requirement for European scripts because it is a strict superset of ASCII

■ Ease of migration between ASCII-based character sets and UTF-8

See Also:

■ "Supplementary Characters" on page 6-2

■ Table B–2, "Unicode Character Code Ranges for UTF-8 Character Codes" on page B-2
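The two UTF-8 benefits listed above can be checked with a short script. The following is a minimal sketch that relies only on Python's built-in codecs; the sample strings and the helper name utf8_len are illustrative assumptions, not part of this guide.

```python
# Sketch: UTF-8 is byte-for-byte identical to ASCII for ASCII text,
# and uses a variable number of bytes for other characters.

def utf8_len(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))

ascii_text = "Oracle"
# ASCII-only data encodes to the same bytes in both encodings,
# which is why migrating ASCII data to UTF-8 needs no conversion.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

sizes = {
    "A": utf8_len("A"),            # ASCII letter
    "\u00d6": utf8_len("\u00d6"),  # accented European letter (U+00D6)
    "\u4e9c": utf8_len("\u4e9c"),  # CJK ideograph (U+4E9C)
}
print(same_bytes, sizes)
```

ASCII letters take one byte and accented European letters take two, while the CJK ideograph takes three, which is the storage trade-off that the UCS-2 and UTF-16 sections compare against.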

UCS-2 Encoding

UCS-2 is a fixed-width, 16-bit encoding. Each character is 2 bytes. UCS-2 is the Unicode encoding used by Java and Microsoft Windows NT 4.0. UCS-2 supports characters defined for Unicode 3.0, so there is no support for supplementary characters. The benefits of UCS-2 over UTF-8 are as follows:

■ More compact storage for Asian scripts because all characters are two bytes

■ Faster string processing because characters are fixed-width

■ Better compatibility with Java and Microsoft clients

See Also:

■ "Supplementary Characters" on page 6-2
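The fixed-width property is what makes string processing fast: the character at position i always starts at byte offset 2 * i. The following is a minimal sketch of that idea. Python has no built-in UCS-2 codec, so encode_ucs2_be and char_at are hypothetical helpers written for this illustration, and big-endian byte order is an assumption.

```python
import struct

def encode_ucs2_be(text: str) -> bytes:
    """Encode text as big-endian UCS-2 (hypothetical helper).

    Supplementary characters (above U+FFFF) are rejected because
    UCS-2 has no way to represent them.
    """
    units = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            raise ValueError(f"U+{code_point:04X} is outside the UCS-2 range")
        units.append(code_point)
    # One unsigned 16-bit value per character.
    return struct.pack(f">{len(units)}H", *units)

def char_at(data: bytes, index: int) -> str:
    """Return the character at a given position without scanning."""
    # Fixed width: jump straight to byte offset 2 * index.
    (code_point,) = struct.unpack_from(">H", data, 2 * index)
    return chr(code_point)

data = encode_ucs2_be("A\u4e9c")
print(len(data), char_at(data, 1))
```

In a variable-width encoding such as UTF-8, the same lookup would require scanning from the start of the string.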

UTF-16 Encoding

UTF-16 is the 16-bit encoding of Unicode. UTF-16 is an extension of UCS-2 because it supports the supplementary characters by using two UCS-2 code points (a surrogate pair) for each supplementary character. UTF-16 is a strict superset of UCS-2. One character can be either 2 bytes or 4 bytes in UTF-16. Characters from European and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. UTF-16 is the main Unicode encoding used by Microsoft Windows 2000. The benefits of UTF-16 over UTF-8 are as follows:

■ More compact storage for Asian scripts because most of the commonly used Asian characters are represented in two bytes

■ Better compatibility with Java and Microsoft clients

See Also:

■ "Supplementary Characters" on page 6-2

■ Table B–1, "Unicode Character Code Ranges for UTF-16 Character Codes" on page B-1
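The way a supplementary character is split into two 16-bit values can be sketched as follows. The arithmetic is the standard UTF-16 surrogate calculation; the function name to_surrogate_pair is an assumption made for this illustration, and the result is cross-checked against Python's built-in utf-16-be codec.

```python
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary character")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

# U+1D11E is the treble clef (musical symbol G clef).
high, low = to_surrogate_pair(0x1D11E)
builtin = "\U0001D11E".encode("utf-16-be")
print(f"{high:04X} {low:04X}", builtin.hex().upper())
```

Both 16-bit values fall in the range reserved for surrogates, so software that understands only UCS-2 sees two unassigned 2-byte values rather than one 4-byte character.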

Examples: UTF-16, UTF-8, and UCS-2 Encoding

Figure 6–1 shows some characters and their character codes in UTF-16, UTF-8, and UCS-2 encoding. The last character is a treble clef (a music symbol), a supplementary character.

Supporting Multilingual Databases with Unicode
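A side-by-side listing of the kind that Figure 6–1 describes can be generated with a short script. This is a sketch only: apart from the treble clef, the sample characters are illustrative choices and are not claimed to match the figure, and the helpers hex_units and encodings are hypothetical names. The UCS-2 column reuses the UTF-16 value for characters up to U+FFFF and is left empty for supplementary characters.

```python
def hex_units(data: bytes, width: int) -> str:
    """Format bytes as space-separated hex groups of `width` bytes."""
    groups = (data[i:i + width] for i in range(0, len(data), width))
    return " ".join(group.hex().upper() for group in groups)

def encodings(ch: str) -> dict:
    """Return the character codes of one character in each encoding."""
    utf16 = hex_units(ch.encode("utf-16-be"), 2)
    utf8 = hex_units(ch.encode("utf-8"), 1)
    # UCS-2 matches UTF-16 up to U+FFFF and cannot represent the rest.
    ucs2 = utf16 if ord(ch) <= 0xFFFF else None
    return {"UTF-16": utf16, "UTF-8": utf8, "UCS-2": ucs2}

for ch in ("A", "\u00d6", "\u4e9c", "\U0001D11E"):
    print(f"U+{ord(ch):04X}", encodings(ch))
```

For the treble clef (U+1D11E) this yields D834 DD1E in UTF-16 and F0 9D 84 9E in UTF-8, with no UCS-2 representation.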
