
Standard Generalized Markup Language (SGML) is an international standard (ISO 8879) for the description of encoded or marked-up electronic text. Marking up text is an old concept. Text may be "marked up" to add different types of information to a document; some examples are:
- Formatting (presentation) information may be added to a text, as with an editor's marks on a physical manuscript (to indicate bolding, etc.), or with typesetting markup languages such as TeX.
- Structural information may be added to a text, to mark structural entities such as title, scene, chapter, stanza, table of contents, index, preface, etc. SGML markup is often used to add this kind of information to a document.
The SGML standard allows the definition of different document types. An SGML "document type definition" (DTD) defines which components of a document may be marked up or "tagged", and it defines the relationships between these components. The "tags" used to flag a certain feature are normally enclosed in angle brackets. So, for example, a DTD may allow the tags <poem> and <stanza>, and constrain a <stanza> to be within a <poem>.
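The <poem> and <stanza> constraint described above can be sketched in a few lines of Python. This is only an illustration, not a DTD or a validating SGML parser: it assumes that Python's lenient `html.parser` module is an acceptable stand-in for reading the tags, and the `ALLOWED_PARENT` table and the `check` function are hypothetical names invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical rule table standing in for a DTD: child tag -> required ancestor.
ALLOWED_PARENT = {"stanza": "poem"}


class ContainmentChecker(HTMLParser):
    """Record an error whenever a tag appears outside its required ancestor."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # tags currently open, outermost first
        self.errors = []

    def handle_starttag(self, tag, attrs):
        required = ALLOWED_PARENT.get(tag)
        if required is not None and required not in self.open_tags:
            self.errors.append(f"<{tag}> must appear inside <{required}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything back to (and including) the matching open tag.
            while self.open_tags.pop() != tag:
                pass


def check(text):
    """Return a list of containment errors found in the marked-up text."""
    checker = ContainmentChecker()
    checker.feed(text)
    checker.close()
    return checker.errors


print(check("<poem><stanza>Roses are red</stanza></poem>"))  # []
print(check("<stanza>A stanza on its own</stanza>"))
```

A real DTD expresses far more than this one rule (permitted order, repetition, attributes), and a validating parser would read those rules from the DTD itself rather than from a hand-written table.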
There are several widely used DTDs in existence; examples are the Text Encoding Initiative (TEI) DTD and the Hypertext Markup Language (HTML) DTD used for the World Wide Web. In addition, many organizations and individuals create their own specialized DTDs.
Since the markup is done using only textual elements (all markup uses ASCII characters: letters, numbers, and a few other characters such as <, >, and &), the resulting files do not have to be stored in a binary format (as with a word processor document) and are independent of platform and operating system.
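As a small illustration of this point, the following Python sketch checks that a marked-up line consists only of ASCII characters, even though it describes accented letters. It assumes that the accented characters are written as entity references such as `&eacute;` (one common use of the & character in markup), and it uses Python's `html.unescape` merely as a convenient way to expand them; the sample text and function names are invented for the example.

```python
import html

# Invented sample text: accented letters are written as entity references,
# so the stored file contains nothing but ASCII characters.
marked_up = "<line>Caf&eacute; society in Montr&eacute;al</line>"


def is_plain_ascii(text):
    """True if every character in the text is an ASCII character."""
    return text.isascii()


def expand_entities(text):
    """Replace entity references such as &eacute; with the characters they name."""
    return html.unescape(text)


print(is_plain_ascii(marked_up))   # True
print(expand_entities(marked_up))  # <line>Café society in Montréal</line>
```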
The markup added to a document or set of documents may be used later in a variety of ways. For example,
- Once decisions about presentation have been made (for example, all chapter titles to be bolded and centered), a formatting program can be used to translate a set of tagged documents to a form for a specific printer or other output device. Later, if a decision is made to change the presentation style of a certain document element, the documents do not all have to be edited; instead, only the formatter needs to be changed and all documents re-translated.
- A set of documents can be searched for occurrences of specific text within a certain part of each document (for example, "Canada" in chapter titles).
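Both uses listed above can be sketched briefly in Python. The sketch rests on several assumptions: the tag names (`chapter`, `chaptertitle`, `p`), the file names, and the sample texts are invented; Python's lenient `html.parser` stands in for a real SGML parser; and the "formatter" is reduced to a table of presentation rules that is kept separate from the documents.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collect (tag, text) pairs, pairing each run of text with its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags currently open
        self.items = []   # (enclosing tag, text) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.items.append((self.stack[-1], text))


def collect(document):
    parser = ElementCollector()
    parser.feed(document)
    parser.close()
    return parser.items


def render(document, style):
    """Apply a presentation rule per tag; untagged rules leave the text unchanged."""
    return [style.get(tag, str)(text) for tag, text in collect(document)]


def search(documents, tag, word):
    """Return the names of documents in which `word` occurs inside `tag`."""
    return [name for name, doc in documents.items()
            if any(t == tag and word in text for t, text in collect(doc))]


docs = {
    "a.sgm": "<chapter><chaptertitle>Canada in 1867</chaptertitle>"
             "<p>Confederation began.</p></chapter>",
    "b.sgm": "<chapter><chaptertitle>Early Voyages</chaptertitle>"
             "<p>Ships sailed to Canada.</p></chapter>",
}

print(render(docs["a.sgm"], {"chaptertitle": str.upper}))
# ['CANADA IN 1867', 'Confederation began.']
print(search(docs, "chaptertitle", "Canada"))
# ['a.sgm']
```

Changing how chapter titles are presented means editing only the style table passed to `render`; the documents themselves stay untouched. Likewise, the search finds "Canada" only where it occurs in a chapter title, so the second document, which mentions Canada in its body text, is not returned.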
Many articles are available on the Web with information about SGML and electronic texts. Some of these are listed here:
- About SGML by David Seaman, Director of the University of Virginia Electronic Text Center
- A Gentle Introduction to SGML
- Oxford Text Archive Guide to Creating and Documenting Electronic Texts
- Introduction to the Use of Electronic Texts in Research & Teaching at New York University
- The Text Encoding Initiative Home Page
The Electronic Text Centre, Dalhousie University, Halifax, Nova Scotia B3H 4H8
etc@dal.ca - http://etc.dal.ca/ - 902-494-2319 (fax)
The Electronic Text Centre is a project of the Dalhousie Electronic Text Working Group, with participation from Dalhousie's Killam Library, the School of Library and Information Studies, the Department of English, and Academic Computing Services.