Time for a New ISO Multilingual Character Code

-A Multilingual Character Encoding Standard for Embedded Devices in Low Bandwidth Wireless Networks-

Steven J. Searle

Web Master, TRON Web


When you start doing Web searches for BTRON-related pages, you occasionally come across something [1] written by someone who has read TRON Web but hasn't given the content enough time to sink in before committing his/her thoughts to writing, or who hasn't bothered to really look into the Web site in depth. This is particularly so in the case of certain Unicode supporters, who actually believe that "when the world wants to talk, it speaks in Unicode." Apparently, they have never heard that Shift-JIS, a Microsoft Corporation creation, is the encoding standard most used in Japanese language Web pages in Japan. Do they know that Microsoft, which is active in the Unicode movement, is busily creating new character tables, such as Windows-1252 [2], that are being used by people throughout the world? According to one of my correspondents who deals in software localization issues, Windows-1251 [3] has become the Russian world's equivalent of Shift-JIS. You don't notice it if you are using a Windows-based browser, but you quickly find out about such character encodings when you use the BTRON Basic Browser.

Ah, but things don't stop there. Originally, the Unicode proponents only intended to draw up the Basic Multilingual Plane of the ISO 10646 standard, a two-byte plane that doesn't have enough code space for even a single unabridged Chinese character set. That led others to launch their own projects to develop unabridged Chinese character sets, and there have been a lot of these in Japan. The most famous are Konjaku Mojikyo [4] and GT Font, but there was also eKanji; and large computer makers, such as Hitachi, Fujitsu and NEC, have also drawn up their own huge character sets. And to make matters even worse, Japanese publishers--not to mention government organizations and the telephone company--have created large numbers of user defined characters to handle printing needs that required solutions long, long ago. Accordingly, whatever the Unicode Consortium does in relation to Japan's character processing needs will be too little too late. Not surprisingly, the TRON Project, which has been pursuing total design since its inception, has already offered a solution, the TRON Font Traceability System, to interchange data among these various character collections.

It is on that point that I would like to propose that the International Organization for Standardization create a new, parallel, Unicode-compatible character encoding standard based on TRON Code and the TRON Multilingual Environment. Since TRON Code uses a multiplane, two-byte character encoding system that is better suited to embedded devices and low bandwidth wireless networks, it could be called exactly that--"A Multilingual Character Encoding Standard for Embedded Devices in Low Bandwidth Wireless Networks." Need a number? How about ISO 10646-3? The important thing is to realize that multiple solutions to unabridged Chinese character processing are already in place in countries such as Japan and are being used on a daily basis. Moreover, there are so many of them that even the four-byte Unicode "surrogate pairs" plane could easily be filled up with them, particularly if you throw in user defined characters. Furthermore, since the main embedded systems developers in the world are standardizing around the T-Engine/T-Kernel open development platform, it's obvious that a new de facto character coding standard is going to emerge.
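To make the byte-count point concrete, here is a minimal Python sketch of how Unicode's UTF-16 form handles a kanji inside the Basic Multilingual Plane versus one outside it. The two sample characters and the helper name utf16_bytes are my own choices for illustration; the sketch shows only the Unicode side and does not attempt to reproduce TRON Code's own byte layout.

```python
# Illustrative sketch only: the sample characters and helper name are assumptions.
bmp_char = "\u6f22"        # U+6F22, a kanji inside the Basic Multilingual Plane
supp_char = "\U00020000"   # U+20000, first code point of CJK Extension B (Plane 2)

def utf16_bytes(ch: str) -> bytes:
    """Encode one character as big-endian UTF-16 without a byte order mark."""
    return ch.encode("utf-16-be")

# A BMP character fits in one 16-bit unit (two bytes).
print(len(utf16_bytes(bmp_char)), utf16_bytes(bmp_char).hex())

# A character beyond the BMP needs a surrogate pair (two 16-bit units, four bytes).
print(len(utf16_bytes(supp_char)), utf16_bytes(supp_char).hex())
```

Every character that falls outside the two-byte plane doubles in size in this form, which is the overhead at issue for low bandwidth links.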

It goes without saying that Unicode proponents would vehemently object to a proposal to create a new ISO multilingual character code standard based on TRON Code, but in addition to being too late for use in Japan, Unicode is beset by a wide range of technical problems in its implementation. Input "problems with Unicode" into Google, and you find out that there are over five million links on the Web that deal with the topic! More surprisingly, this writer has been contacted by people even in Western and Eastern Europe who are dissatisfied with the way that Unicode is handling both the Latin and Cyrillic scripts. Whether the Unicode project should be terminated is something the ISO authorities will have to decide on the merits of what Unicode has accomplished to date, but one thing is certain--this coded character set is going to have a hard time coming into widespread use in countries that have existing computer infrastructure. At the very least, the ISO should allow for the creation of a parallel, compatible character code that will be available in case Unicode's technical problems prove insurmountable.

____________________

[1] The anonymous person who wrote this emotive missive takes the TRON movement to task for not doing anything new in the area of creating new character codes and character sets, although TRON policy essentially calls for creating a framework into which others' character sets are to be loaded. Nevertheless, when there is a demand for characters and no character set in existence, TRON will help standardize one. Hence the Braille characters were included for the visually disabled, and a Tompa character set was created to meet the demand of a fad in Japanese society. It is important to keep in mind that Cho Kanji is still a product aimed primarily at the Japanese market, and hence its forte is processing Japanese, which it does amazingly well. There is also the TRON Font Traceability System to help users of de facto standard computer systems interchange text data among all the proprietary character sets and user defined characters.

[2] Windows-1252 is Microsoft's version of ISO 8859-1, otherwise known as Latin 1. This page and the rest of TRON Web use the ISO 8859-1 encoding standard so that BTRON users can view it with the BTRON Basic Browser, which as of this writing cannot handle any of Microsoft's unique character encoding tables.
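For readers who want to see where the two tables part ways, the following minimal Python sketch decodes the same bytes both ways. The sample byte string is made up for illustration; "cp1252" and "latin-1" are the names Python's standard codecs use for Windows-1252 and ISO 8859-1.

```python
# Illustrative sketch only: the sample byte string is an assumption.
raw = b"\x93TRON\x94"

# Windows-1252 assigns printable curly quotation marks to bytes 0x93 and 0x94.
as_cp1252 = raw.decode("cp1252")

# ISO 8859-1 maps the same bytes to C1 control codes U+0093 and U+0094.
as_latin1 = raw.decode("latin-1")

print([hex(ord(c)) for c in as_cp1252])
print([hex(ord(c)) for c in as_latin1])
```

The two tables agree everywhere except the range 0x80 to 0x9F, so a page that uses Microsoft's extra punctuation reads correctly only in a browser that knows the Windows table.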

[3] ". . . , I have lots of experience with Windows-1251. It's kind of like the Shift-JIS of Russian, in that it's what 95% of Russians use. Windows-1251 is kind of difficult because until recently nobody used Russian on computers so a lot of software (including Adobe FrameMaker, for example) just doesn't handle it."

[4] In 1999, the owner of the Konjaku Mojikyo character set, AI-Net Corporation, made its unabridged kanji character set available to Personal Media Corporation, which loaded it onto the BTRON3-specification Cho Kanji 2 operating system. However, later, AI-Net demanded that the GT Font drawn up at the University of Tokyo be kept off the operating system, or that AI-Net's character set be removed. AI-Net also began charging people for the commercial use of their character set by requiring them to purchase a copy of the CD-ROM marketed via Kinokuniya Co. for 29,400 yen (consumption tax included). Accordingly, Cho Kanji 3 and Cho Kanji 4, both put on sale in 2001, have been released with the GT Font only, even though they have the code space to handle both Konjaku Mojikyo and GT Font. Konjaku Mojikyo at present consists of approximately 110,000 characters, of which 80,000 are Chinese characters.