HOW-TO: SETTING UP AND TESTING YOUR BROWSER WITH CYRILLIC CHARACTERS ON THE ETC PAGES

The following table shows two rows of characters. On the top line are images of several Cyrillic characters; underneath each image is the character itself. While there may be minor differences, the characters should look recognizably the same.
Ё Ќ Ц ђ љ с џ ш Я И Ж Л Ђ Б
If the two lines do not match: First, set up and choose Cyrillic encoding (Netscape instructions).
If the two lines still do not match: Make sure that you have Cyrillic fonts installed.
If the two lines match

BACKGROUND: VIEWING INTERNATIONAL CHARACTERS ON THE ETC PAGES

On these ETC pages, all international (non-Roman) characters, such as Cyrillic characters, are stored as Unicode numbers in decimal HTML encoding.

The Unicode Standard, maintained by the Unicode Consortium and kept in step with the international standard ISO/IEC 10646-1:1993 (Universal Character Set), is an ongoing project to assign a unique identifying number, conventionally written in hexadecimal, and a descriptive name to every character in use in any human language. The work is not yet complete, but the characters of many scripts, including Cyrillic, have already been assigned.

For inclusion in HTML or XML files, these Unicode numbers can be encoded in the form

&#dddd;
where dddd is the decimal equivalent of the character's hexadecimal Unicode number. This type of encoded text is called "decimal HTML-encoded Unicode". When these codes are received in your browser, they will be shown correctly if you have the proper font installed and if the proper encoding is chosen in your web browser.
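
As a rough illustration of this encoding, the following Python sketch converts a string into decimal HTML-encoded Unicode. The function name to_decimal_entities is made up for this example and is not part of any ETC tool; it assumes that plain ASCII characters can be left untouched and that only characters above 127 need a numeric code.

```python
def to_decimal_entities(text: str) -> str:
    """Replace each non-ASCII character with a reference of the form &#dddd;.

    Hypothetical helper for illustration only: ASCII characters are passed
    through unchanged, everything else becomes its decimal Unicode number.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# Two of the Cyrillic test characters from the table on this page.
encoded = to_decimal_entities("Я Ж")
print(encoded)  # &#1071; &#1046;

# Python's built-in "xmlcharrefreplace" error handler gives the same result.
builtin = "Я Ж".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(builtin == encoded)  # True
```

The built-in error handler is probably the simpler choice in practice; the explicit function is shown only to make the per-character rule visible.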

For example, the Cyrillic character which resembles the letter "H" in the Roman alphabet is described in the Unicode standard with its Unicode number and a description:

U+041D    CYRILLIC CAPITAL LETTER EN
and its hexadecimal integer code is 41D. The decimal equivalent of this hexadecimal number is 1053. When this character is included in an HTML document, it may be encoded in the source code in this way:
&#1053;
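
The arithmetic in this example can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and do not come from the ETC pages.

```python
import html
import unicodedata

# U+041D written as a hexadecimal integer literal.
code_point = 0x41D

# Hexadecimal 41D is 4*256 + 1*16 + 13 in decimal.
print(code_point)  # 1053

# Look up the character and its official Unicode description.
character = chr(code_point)
print(unicodedata.name(character))  # CYRILLIC CAPITAL LETTER EN

# Build the decimal HTML encoding and decode it again, as a browser would.
entity = f"&#{code_point};"
print(entity)  # &#1053;
print(html.unescape(entity) == character)  # True
```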

Some other references on the Web with information about Unicode and Cyrillic:


The Electronic Text Centre, Dalhousie University, Halifax, Nova Scotia B3H 4H8
etc@dal.ca - http://etc.dal.ca/ - 902-494-2319 (fax)

Last updated 1 November 2000



The Electronic Text Centre is a project of the Dalhousie Electronic Text Working Group, with participation from Dalhousie's Killam Library, the School of Library and Information Studies, the Department of English, and Academic Computing Services.
