December 25, 1996

Surf & Turf

Language: Final Frontier
For the True Global Network

Language is the last frontier of the Internet. Over the years, the transmission of static images, video, sound and Latin text has been accomplished on the Web.

But moving beyond the Latin-centric world of ASCII has remained a hurdle. The vast majority of Web, Usenet and e-mail communications still reflects the dominance of English.

Credit: Christine Thompson / CyberTimes

The world, however, is quickly changing. One day, the majority of Net users will be logging in from outside the United States and the virtual world will reflect that diversity in the recreation of Babel in bits and bytes.

ASCII - the American Standard Code for Information Interchange - is a 7-bit system made up of 128 characters. It has been a fine standard for computing, but is inadequate to address the future language needs of the Net.
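
The 7-bit limit is easy to check. The following is a minimal Python sketch; Python and the helper name `is_ascii_text` are illustrative assumptions, not tools discussed in this column. It tests whether a piece of text fits within ASCII's 128 codes.

```python
ASCII_SIZE = 2 ** 7  # 7 bits give 128 possible codes, numbered 0 through 127


def is_ascii_text(text: str) -> bool:
    """Return True if every character has a code below 128."""
    return all(ord(ch) < ASCII_SIZE for ch in text)


print(ASCII_SIZE)
print(is_ascii_text("Internet"))   # plain English fits
print(is_ascii_text("dernière"))   # the accented letter does not
```

Even a single accented French letter falls outside the range, which is why anything beyond basic English needs some other scheme.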

To get around the problem, many countries have developed two-byte coding schemes to write and read non-ASCII characters. For example, most Chinese Net users rely on a coding system known as Big 5, which uses a combination of standard ASCII characters and symbols above 128 to represent thousands of Chinese characters. To code the Chinese character "guo," meaning country, a Big 5 program would output the two-character string °ê, which would then be translated by a Big 5 viewer into the proper character.
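
The round trip for "guo" can be sketched in a few lines of Python, using the `big5` codec from Python's standard library. This is only an illustration; it assumes a viewer that falls back to Latin-1, in which case the two bytes show up as a degree sign followed by an e with a circumflex.

```python
guo = "\u570b"  # the traditional Chinese character "guo" (country)

big5_bytes = guo.encode("big5")            # two bytes, both above 127
as_latin1 = big5_bytes.decode("latin-1")   # what a viewer unaware of Big 5 shows

print(big5_bytes.hex())                    # the raw two-byte code
print(as_latin1)                           # the two stand-in symbols
print(big5_bytes.decode("big5") == guo)    # a Big 5 viewer recovers the character
```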

At best, the current system is only passable. Every language requires special viewing software, and you can only encode Latin-script languages and one other on the same page since codes used by one language are also used by others.
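
The overlap between coding schemes can be seen by decoding the same two bytes under several of them. The sketch below uses codecs from Python's standard library; the particular set of schemes is an assumption chosen for illustration.

```python
raw = b"\xb0\xea"  # the same two bytes throughout

# Each scheme assigns these bytes a different meaning.
schemes = ["big5", "gb2312", "euc_kr", "latin-1"]
decoded = {name: raw.decode(name) for name in schemes}

for name, text in decoded.items():
    print(name, text)
```

Nothing in the bytes themselves says which scheme was intended, so a page cannot safely mix two of these encodings.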

Since the late 1980s, a consortium of high-tech companies, including Microsoft, Apple and IBM, has been working on a universal coding scheme that can encompass all of the world's scripts. The system is called Unicode -- a 16-bit encoding scheme that can specify up to 65,536 characters.

Unicode also has the capability to partially address a 31-bit code space, known as the ISO 10646 standard, which is capable of representing 2.1 billion characters, enough to encode virtually every written character ever devised, including Mayan glyphs and Egyptian hieroglyphics. With Unicode, viewing different languages would be as seamless and transparent as viewing ASCII text, since every character would have its own unique 16-bit code. Any number of languages could be placed on the same page and all could be viewed by the same browser, as long as it was Unicode capable and had the proper display fonts.
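
The arithmetic, and the idea of one 16-bit code per character, can be sketched in Python. The sample string and the choice of the `utf-16-be` codec as a stand-in for raw 16-bit codes are assumptions made for illustration.

```python
print(2 ** 16)  # size of a 16-bit code space: 65,536
print(2 ** 31)  # size of a 31-bit code space: 2,147,483,648

# English, Chinese, Hebrew and Greek in a single string.
mixed = "country \u570b \u05e9\u05dc\u05d5\u05dd \u0395\u03bb\u03bb\u03ac\u03b4\u03b1"

# Every character here is stored as one 16-bit (two-byte) unit.
units = mixed.encode("utf-16-be")
print(len(mixed), len(units))

# Each non-ASCII character has its own code, whatever the script.
codes = [f"U+{ord(ch):04X}" for ch in mixed if ord(ch) > 127]
print(codes)
```

Because no two characters share a code, the four scripts sit side by side without any switching between schemes.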

The idea behind Unicode is so compelling that the question of its implementation is not a matter of "if" but "when." The most significant development is Microsoft's Windows NT, which is a Unicode-based operating system.

Unfortunately, having a Unicode-based system now doesn't do much for users since there are few editors, browsers, news readers, spreadsheets or games to go along with it. Even if there were plenty of those programs, a user would find a stunning lack of Unicode material anywhere.

(As far as I know, there is only one Unicode newsgroup in existence -- chinese.txt.unicode. If anyone knows how to read this group with a non-Unix program, please let me know. I'm dying to know what these very lonely people are talking about. There may be some Unicode Web pages in the cosmos, but I have yet to find them.)

For now, multilingual Net users must adapt to the various coding schemes of the world. There are now three major programs designed for multilingual communications -- Internet with an Accent by Accent Software International, Tango by Alis Technologies and GlobalSurf by DynaLab.

The best known now is probably Internet with an Accent, which supports more than 30 scripts including Chinese characters, Korean hangul, Japanese, Hebrew, Greek and Arabic. It is actually made up of four programs -- Multilingual Mosaic for Web browsing, Multilingual Publisher for writing HTML, Multilingual Mailpad for composing mail and Multilingual Viewer for reading encoded documents. Internet with an Accent sells for about $100.

You can also buy a $30 product called Navigate with an Accent, which is a multilingual plug-in for Netscape Navigator Gold 2.0 and above. The programs are only available for Windows computers.

Tango is a group of multilingual programs that includes a browser, an e-mail program and an HTML editor. The browser, which is available only for Windows machines, supports just about every known language standard in the world, including Unicode. By February, version 3.0 of Tango, which includes the e-mail program, will be available for about $40.

Tango and Internet with an Accent are fine solutions for businesses that are looking for complete packages to handle their language needs.

But for consumers with more limited needs, both suites seem a bit redundant. Netscape and Microsoft have turned out two superior browsers. I don't want another one on my desktop, not to mention the extra e-mail programs, document viewers and HTML editors.

DynaLab has taken a different approach by providing a program that works with existing software, including word processors and spreadsheets.

GlobalSurf, which retails for $99, is by far the most transparent multilingual viewer and editor now on the market. It loads as a small toolbar with icons for various languages. When you open a Web page or other documents with encoded characters, like e-mail or newsgroup postings, you simply hit the proper language icon and the text is decoded. GlobalSurf, which is available only for Windows computers, includes a variety of fonts so you can display about two dozen different languages, including Chinese, Arabic, Hebrew and Russian.

The program also allows users to compose documents in supported languages. The program uses TrueType fonts so the characters can be placed and manipulated in Windows programs like any other characters. For example, you can place Chinese characters into a Word document and change its font size and location just as you would with Latin characters.

The ability to read different languages is a major step forward in Net communications, but for many people the information would still be gibberish even if rendered correctly. The problem, of course, is that many people can only speak one language.

In the past few years, a group of automatic translation programs has begun appearing on the market. Most are geared to professional translators and cost several hundred dollars. One product geared toward consumers is Power Translator from GlobaLink.

Power Translator 6.0, for Windows 95 and NT computers, sells on the street for about $130 and translates English, French, Spanish, German and Italian. The core of the product is a standalone program that works on .doc, .wri, .html, .rtf, .sam, .wpd and ASCII documents. There are also two utilities -- Web Translator and Translation Utility -- that allow users to convert e-mail, newsgroup postings and Web pages.

Web Translator (it only works with Netscape Navigator now, although an Internet Explorer patch will be available early next year) appears on the screen as a small button bar. Pressing the "translate" button automatically converts a Web page from French, Spanish, German or Italian into English, or from English into any of the other four languages. The page is displayed with exactly the same layout and images inside your browser. Only the words are different.

Translation Utility works with any program that uses .doc, ASCII, .wri or other text formats. It appears as a little icon on every program window. I've been using it to translate e-mail messages and newsgroup postings.

The translations, of course, are very crude. They can range from intelligible to absolutely hilarious. My friends who speak French, Spanish and other languages were rolling on the floor howling over some of the stupidities of the program. But you have to be tolerant here and exercise some common sense. Machine translation is extremely difficult. Colloquial and literary texts are problematic. But when language is simple, the translations can be surprisingly good.

Here's an example of Power Translator's abilities. I converted this paragraph with no improvements to the program's basic dictionary. There may be some very strange spots in the translations.

ENGLISH: Language is the last frontier of the Internet. Over the years, all the basic forms of communications have been mastered on the Web, including the transmission of static images, video, sound and basic Latin text.

FRENCH: La langue est la dernière frontière de l'Internet. Sur les années, toutes les formes de base des communications ont été maîtrisées sur le tissu, y compris la transmission d'images statiques, vidéo, son et texte Latin de base.

SPANISH: El idioma es la última frontera del Internet. Durante los años, se han dominado todas las formas básicas de comunicaciones en el tejido, incluso la transmisión de imágenes estáticas, el video, texto latino legítimo y básico.

GERMAN: Sprache ist die letzte Grenze vom Internet. Über den Jahren sind alle Grund Formen von Kommunikationen auf der Spinnwebe, einschließlich der Übermittlung statischer Bilder, gemeistert worden Video, gesunder und Grund lateinischer Text.

ITALIAN: Lingua è l'ultima frontiera dell'Internet. Durante il corso degli anni, tutte le forme di base di comunicazioni sono state dominate sul tessuto, incluso la trasmissione di immagini statiche, video, suono e testo Latino e di base.

These may not be great translations, but they are intelligible. You can try out Power Translator for yourself on the GlobaLink Web site, which has a free service to translate up to 300 words of text. The finished translation is e-mailed back to you.

In years to come, machine translators will undoubtedly improve. While they may never become as good as human translators, each improvement in their abilities edges us closer to breaking down the barriers of language.

It may take years, but eventually the World Wide Web may actually fulfill its promise of being a global medium.

SURF & TURF is published weekly, on Wednesdays.

Ashley Dunn welcomes your comments and suggestions.

Copyright 1996 The New York Times Company