Listed below are some general editorial principles to follow when transcribing and encoding texts in TEI Lite (Text Encoding Initiative) for this project. They follow, with minor revisions, those created by Perry Willett at the Victorian Women Writers Project at Indiana University. The most important principle is this: when you have questions about transcribing or encoding, call Charlotte Payne at 2-6040.

Generally speaking, if something is there on the page as part of the text, include it in the electronic transcription, using the appropriate tag or tags to encode it. However, there may be things on the page that are not part of the textual matter, but instead belong to the bibliographic features of a book. These include such things as the recurring titles (running heads) that may appear at the top of each page, or signature markings (a "B" for instance) that may appear at the bottom of a page. Other typographic features may be included in the text, such as a recurring picture of an ivy vine at the end of each poem. Such bibliographic or typographic features can be left out, but may be noted in the header as a feature of the text. Other typographic divisions within poems, such as a line of asterisks or periods, or simple features such as lines or double lines can be encoded within the <milestone> tag with attribute values of UNIT=typography and N=**** or …. or ____, depending on the character used. The macro for this is CNTRL-ALT-M. If you have questions about any such feature, please ask Charlotte.


General Procedures

You will be assigned a text from the Kohler Collection of Minor British Poetry to scan and encode. These texts are part of the Special Collections Department, and must be handled with care: do not use pen or post-its near the texts, or put food or drinks in the same area. Use the small glass bar kept near the terminal to hold open pages.

Before you scan the text, add the text information to the LOG kept on the wall near the desk in the scanning office. To do this, you will have to fill in the author/title, your initials, the date, the Kohler number, the document ID number (first four letters of the last name, first initial of the first name and first 5 letters of the title of the poem: for example, HemaFPoems) and the series ID number (assigned consecutively). Next, fill out an individual tracking sheet for the text. Blank copies of these sheets are kept on the desk in the scanning office. The tracking sheet is kept in the folder with the text; fill out the sheet as you work on each part of the process. It is very important that we have accurate tracking records.
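The document ID rule can be sketched in Python. This is only an illustration, not a project tool. The function name `make_document_id` and the decision to skip spaces and punctuation are assumptions that the rule above does not settle.

```python
def make_document_id(last_name: str, first_name: str, title: str) -> str:
    """Build an ID: 4 letters of last name + first initial + 5 letters of title."""

    def letters(text: str) -> str:
        # Assumption: only letters count; spaces and punctuation are skipped.
        return "".join(ch for ch in text if ch.isalpha())

    return letters(last_name)[:4] + letters(first_name)[:1] + letters(title)[:5]


print(make_document_id("Hemans", "F.", "Poems"))  # HemaFPoems
```

If a title begins with an article or is shorter than five letters, check the resulting ID with Charlotte rather than relying on the sketch.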

Once you have scanned the text, discuss the general structure of the text with Charlotte or Nancy. After taking a quick look through the text, discuss any features about which you are not sure. If you run into problems along the way, contact one of us. HINT: it is much easier to work out proper tagging format as you go than it is to go back and change it later!

After you have finished encoding, a printout of the text without markup will be returned to you for careful proofreading. This means a line-by-line comparison with the printed text. After you have made the corrections, save them in your folder and let Charlotte know; she will do the second proofing and check the tagging.



In general, it does not matter if tags are encoded in all upper- or lower-case letters.



Numbered divisions will begin with <DIV1> in the front matter, the text, and the back matter.



Frontispieces (the full page illustration usually found facing the title page), full page illustrations (not vignettes, which are smaller images found on pages which also include text of the book) and title pages should be encoded as a <FIGURE>, within a separately numbered <DIV1> (usually), or <DIV2>, etc. Use a <P> tag to type [Title page], [Frontispiece], or [Illustration], to describe the type of figure. In addition, whatever the number of the <DIV>, edit the attributes so that TYPE=frontispiece, title page, or illustration. Where there is text on the illustration, either the name(s) of the artists or sculptors, or descriptive text, add a second <P> inside the <FIGURE> tag.

In addition, edit the attributes within <FIGURE> so that ID=BurtHWhite1m, BurtHWhite2m, etc., depending on the number of illustrations. See samples in "NEW SAMPLE TEXTS", or: <DIV1 TYPE="title page"><P>[Title Page] <FIGURE></FIGURE></P></DIV1>, or <DIV1 TYPE="title page"><P>[Title Page] <FIGURE><P>H.Haines, sc.</P></FIGURE></P></DIV1>.


Page Breaks

Page breaks are encoded as <pb> at the top of the page where they occur in the text. The page number is recorded in the "n=" attribute, with the actual page number as the value (not in quotation marks). If the page number is in Roman numerals, transcribe it in Roman numerals, e.g. "n=vi" (also not in quotation marks). Also fill in the "id=" attribute, whose value is the actual page number preceded by a "p". For page 3, the page break would look like this: <pb id=p3 n=3>. See the Ref Element/Table of Contents tip sheet for further help.

Page breaks occur within a <DIV1>, <DIV2>, etc.: the first immediately follows the opening tag of the division, and the others are placed where they occur in the text. For example, <DIV1> <PB></PB> …
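As a rough illustration of the id/n rule for page breaks, the following Python sketch builds the tag from a page number. The function name `page_break_tag` is hypothetical; the unquoted attribute values follow the convention described in this section.

```python
def page_break_tag(page_number: str) -> str:
    """Return a <pb> tag whose id is the page number preceded by "p"."""
    page = page_number.strip()
    # Values are left unquoted, as this section asks.
    return f"<pb id=p{page} n={page}>"


print(page_break_tag("3"))   # <pb id=p3 n=3>
print(page_break_tag("vi"))  # <pb id=pvi n=vi>
```

Roman numerals pass through unchanged, so a preface page vi gets id=pvi and n=vi.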



Entities are used in place of common symbols such as & (ampersand), - (hyphen), -- (emdash), * (asterisk), dagger and double dagger. They are produced in Author/Editor by going to the "insert entity" box, picking the entity for the symbol, and inserting it.



Occasionally, footnotes (<note>) or, more commonly, the corresponding markers within the text (<ref>, often an asterisk) will occur within prefaces, titles, poems or appendices and should be marked. Where possible, place the note tag near the text where it occurs (but see below), using these attributes (see P3, p.1072 for more discussion):

· resp= the individual or group responsible for the annotation. Sample values include:

  · author: note by the author of the text.

  · editor: note added by the editor of the text.

  · translator: note added by the translator of a text.

  · transcriber: note added by the transcriber of a text.

· place= indicates where the note appears in the source text. Sample values include:

  · foot of page ___ : note appears at foot of page (specify).

  · end of ___: note appears at end of chapter or volume (specify).

  · left: note appears in left margin.

  · right: note appears in right margin.

<Note>s occurring in the text of poetry should be added at the end of the line group (</lg>) in which the marker (<ref>) appears. Place= must state exactly where the note occurred in the original text, e.g. foot of page 7, or foot of pages 7-9. For further clarification, see the Note and Ref Elements tip sheet.


Errors and Corrections

You will occasionally come across spelling or printing errors in the texts you encode. They should be noted using the <sic> tag. This tag has two attributes we use:

"CORR=" for the correct wording or spelling; and

"CERT=" to note your uncertainty. Errors can be very difficult to spot; if you are uncertain about a potential error, note it in the text using the <sic> tag and the "cert=n" attribute. If you are certain, then there is no need to use the "cert=" attribute. Here are a few real-life examples:

And, as she turned, they saw how bare

And bruised where her pilgrim feet.

<L>And, as she turned, they saw how bare</L>

<L>And bruised <SIC CORR="were">where</SIC> her pilgrim feet.</L>

For wings and probocis can go their own way.

<L>For wings and <SIC CORR="proboscis">probocis</SIC> can go their own way.</L>

And if if you give it dinner, yet a further pack or two.

<L>And <SIC CORR="if">if if</SIC> you give it dinner, yet a further pack or two.</L>

Note that the original text occurs within the <sic></sic> tag pair, with the corrected text within the <sic> tag as the value of the "corr" attribute.
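The pattern can also be sketched as a small Python helper. It is an illustration only, not a project tool: the name `sic_tag` and its `certain` parameter are hypothetical, and the sketch assumes the correction contains no double quotation marks.

```python
def sic_tag(original: str, correction: str, certain: bool = True) -> str:
    """Wrap the text as printed in <SIC>, with the correction in CORR."""
    # CERT=n is added only when the encoder is unsure that this is an error.
    cert = "" if certain else " CERT=n"
    return f'<SIC CORR="{correction}"{cert}>{original}</SIC>'


print(sic_tag("where", "were"))
print(sic_tag("probocis", "proboscis", certain=False))
```

The text as printed always goes between the tags; only the attribute carries the correction.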


Emphasized, Foreign and Highlighted Words

There are several different ways to indicate text that is highlighted in some way, usually by italic or bold type. Each of the tags below has attributes that allow for noting the typographic rendering and language. The most common values for rendering are:

· rend="italics"

· rend="smallcaps"

In general, don't spend too much time trying to mark all the typographic features of a text, especially if they do not seem to add to an understanding of the text.

A word or phrase that is marked by the author for rhetorical or linguistic effect would use the <emph> tag:

Say rather, why not? It is easier so;

<L>Say rather, why <emph rend="italics">not</emph>? It is easier so;</L>

For a word or phrase that is typographically distinct from the surrounding text, for which there is no clear rhetorical or linguistic meaning, use the <hi> tag. This is most commonly used when the first letter or first word of a poem is highlighted as a typographic convention.

A Minor Poet.

HERE is the phial; here I turn the key

<div1 type=poem><head>A Minor Poet.</head>

<L>H<HI rend=smallcaps>ERE</HI> is the phial; here I turn the key</L>

NOTE: We tag SMALL CAPS WITHIN LINE GROUPS (<LG>) AND PARAGRAPHS (<P>) ONLY. In poetry, tag small caps as <HI> within the <L>, to differentiate them from the <EMPH> we use for italics in poetry. In prose, tag small caps as <EMPH> within the <P>, to differentiate them from the <HI> we use for italics in prose. Then edit the attributes so that REND=smallcaps inside the <HI></HI> tag in poetry or the <EMPH></EMPH> tag in prose. Once you have done a few, you can use the macro Cntrl-Alt-C for poetry or Cntrl-Alt-E for prose.


Foreign Language

Foreign words may also be highlighted in the text: these should be tagged using the <foreign> tag, using the "lang" attribute to identify the language used. In order to do this, you must go back to the HEADER information and add, after the closing </ENCODINGDESC> tag, a <PROFILEDESC> and a <LANGUSAGE> tag, with an additional <LANGUAGE> tag for each language. Inside the <LANGUAGE> tags go the ID= codes, for example, ID=FRE; ID=GRC (Classical Greek); ID=ITA; ID=LAT; ID=GER. To tag within the text, use the <foreign> tag and the "lang" attribute with the same code as in the ID= above.

Eh? what? baffled by a woman! Ah, sapristi! she can run!

<L>Eh? what? baffled by a woman! Ah, <FOREIGN LANG=FRE REND=italics>sapristi</FOREIGN>! she can run!</L>



Be consistent in your use of quotation marks, question marks, exclamation points and spaces. Often the text converts to lines that look like:

" Death to all my hopes ! "

For ease of reading, remove the extra spaces:

"Death to all my hopes!"

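The spacing cleanup can be sketched as a short Python routine. It is a sketch under stated assumptions, not a project tool: the name `tighten_punctuation` is hypothetical, it handles only straight double quotation marks, and it should not be run on lines that contain spaced dot leaders or rows of periods, which it would collapse.

```python
import re


def tighten_punctuation(line: str) -> str:
    """Remove stray spaces around punctuation and straight double quotes."""
    # 1. Drop spaces before closing punctuation: 'hopes !' becomes 'hopes!'
    line = re.sub(r"[ \t]+([!?;:,.])", r"\1", line)
    # 2. Drop spaces between closing punctuation and a closing quote.
    line = re.sub(r'(?<=[!?;:,.])[ \t]+"', '"', line)
    # 3. Drop spaces after an opening quote (line start or after a space).
    line = re.sub(r'(^|\s)"[ \t]+(?=\S)', r'\1"', line)
    return line


print(tighten_punctuation('" Death to all my hopes ! "'))
```

Lines that are already tight pass through unchanged, so the routine can be applied more than once without harm.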


Front Matter

The texts we'll be encoding will generally consist of the following elements:

· Front Matter: Preface, Table of Contents, Dedication, etc.

· Body of the Text: the poems, plays and other materials included in the main body.

· Back Matter: appendices, indices, notes, or other materials following the main body.

The front and back matter are normally the most complicated sections; once you get past those, the poetry itself will seem generally straightforward. The overall structure of a typical document looks like this:

<TEI.2>

<teiHeader>

[Source and processing information goes here; a template is available.]

</teiHeader> [/"name of tag" means end of that tag]

<text>

<front>

[Title page, preface, etc. go here]

</front>

<body>

[Main body of the text goes here]

</body>

<back>

[Appendices, etc. go here]

</back>

</text>

</TEI.2>





Front Matter

This section contains any material that precedes the work proper, such as table of contents or a preface. Generally, the first element of the front matter is the title page. A transcription of the title page should be included, following these guidelines.

Title Page

· <titlePage>: marks off the title page of a given text

· <docTitle>: contains the title itself, as well as its subsections

· <titlePart>: marks a particular subsection of the title, which may be given the following attributes:

  · type="main": the main title of the work

  · type="sub": a subtitle of the work

  · type="alt": alternative title of the work

  · type="desc": descriptive paraphrase of the work included in the title

· <byline>: contains the primary statement of responsibility given for a work on its title page, including such elements as:

  · the words "by" or "edited by" (if present)

  · <docAuthor>: a transcription of the author's or editor's name as it appears on the title page.

  · the words "author of …." and similar descriptive statements (if present)

· <docImprint>: the imprint statement, which includes the place <pubPlace> and date of publication <docDate> and the publisher's name <publisher>.

Examples of title pages:



<titlePage>

<docTitle>

<titlePart>Liberty Lyrics.</titlePart>

</docTitle>

<byline>by <docAuthor>L.S. Bevington.</docAuthor></byline>

<docImprint>

<publisher>Printed and Published by James Tochatti "Liberty" Press.</publisher>

</docImprint>

</titlePage>


<titlePage>

<docTitle>

<titlePart type="main">Is There a Text in This Class?</titlePart>

<titlePart type="sub">The Authority of Interpretive Communities</titlePart>

</docTitle>

<docAuthor>Stanley Fish</docAuthor>

<docImprint><publisher>Harvard University Press</publisher>

<pubPlace>Cambridge, Massachusetts</pubPlace>

<pubPlace>London, England</pubPlace>

</docImprint>

</titlePage>




Other material in the front matter (dedications, prefaces, etc.) will be tagged with <div1> tags, using the type attribute:

· <div1 type="dedication"> a formal offering or dedication.

· <div1 type="frontispiece"> a pictorial frontispiece, possibly containing text.

· <div1 type="preface"> a foreword or preface by the author, explaining the content, origin or purpose of the text.

· <div1 type="colophon"> statement appearing at the beginning of a book which describes the conditions of its physical production; it often includes the details of how many copies were printed. (This may also occur in back matter.)

· <div1 type="advertisement"> Publishers' advertisements

· <div1 type="contents"> a table of contents.


Tables of Contents

Tables of contents present a somewhat more complex structure, and should be encoded using <list>, <item> and <ref> tags. The page number, as it appears in the printed table of contents, goes between the <ref></ref> tags. The <ref> tag has an attribute "target=" that refers to the page where the poem begins. For this reference to work, each page break tag, <pb>, must include a corresponding "id=": <pb id="p1" n="1">. This id attribute is not necessary in works with no table of contents. For further help, look at the sheet REF ELEMENT (TABLE OF CONTENTS).


Upward. . . . . . . 1

January . . . . . . 4

Unto This Present . . . . . 5

<div1 type="contents">

<list>

<item>Upward <ref target="p1" rend="align right">1</ref></item>

<item>January <ref target="p4" rend="align right">4</ref></item>

<item>Unto This Present <ref target="p5" rend="align right">5</ref></item>

</list>

</div1>
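The link between a contents entry and its page break can be sketched in Python. The function name `toc_item` is hypothetical, and the sketch assumes every target is the printed page number preceded by "p", matching the id on the corresponding <pb>.

```python
def toc_item(title: str, page: str) -> str:
    """Build one contents entry whose <ref> points at the matching <pb> id."""
    target = f"p{page}"  # must equal the id attribute on the page break
    return f'<item>{title} <ref target="{target}" rend="align right">{page}</ref></item>'


for title, page in [("Upward", "1"), ("January", "4"), ("Unto This Present", "5")]:
    print(toc_item(title, page))
```

The dot leaders from the printed page are not transcribed; only the title and the page number are kept.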



The Text Body: Verse

Poems in the body of the text will be tagged using the <div1> for the main structural element, using a "type" attribute to identify the overall structure, and with <div2> tags (and so on) for other structural elements within the poem itself (except for the lowest level, e.g. stanza, verse paragraph--see below). The most common attributes for poems will be:

· <div1 type="poem">

· <div1 type="sonnet">

· <div1 type="drama">

For other structural elements, such as cantos, scenes, or parts, use <div2> tags, and identify them using the type attribute:

· <div2 type="canto">

· <div2 type="scene">

· <div2 type="part">


Titles and Headings

Use the <head> tag for all titles, subtitles, etc. For anything but the main title, use the "type=" attribute to note its function. Typical types of <head>ings other than the main title include:

· <head type="sub">

· <head type="dedication">



An epigraph, or quotation of a passage from another poem or work, frequently precedes a poem. There are two different kinds of epigraphs--those that cite an author, and those that do not. There can also be prose or verse epigraphs. For those with no author, the structure is fairly simple:

· <epigraph><p>The mass of men lead lives of quiet desperation.</p></epigraph>


· <epigraph><l>Use every man after his desert</l>

<l>And who should 'scape whipping?</l></epigraph>

For those epigraphs which cite an author, the structure is more formal:

· <epigraph><q><p>The mass of men lead lives of quiet desperation.</p></q>

<bibl>Henry Thoreau</bibl></epigraph> (This is for a prose epigraph; poetry would have a <lg> <l> tag instead of the <p> tag.)


Line Groups

Within a poetic hierarchy, the "lowest" distinct unit of poetic composition gets the element tag <lg> rather than a <div> tag, with the type of grouping noted using the type attribute. The most frequent types of line groups you will encounter include:

· type="stanza": regular stanzaic structure

· type="verse paragraph": irregular breaks between sections

· type="couplet": a two-line rhymed unit, sometimes used to end a sonnet

· type="octet": an eight-line unit of a sonnet

· type="sestet": a six-line unit of a sonnet

(If you have questions about these, please ask). Thus, a poem divided into regular stanzas would be encoded:

<div1 type="poem">

<lg type="stanza">

<l>[line of stanza]</l>

<l>[line of stanza]</l>

</lg>

</div1>


In a more highly structured poem, each of the upper and middle units would receive <div> tags while the lowest level gets tagged with <lg>. Thus, in a poem arranged poem-book-canto-stanza:

<div1 type="poem">

<div2 type="book">

<div3 type="canto">

<lg type="stanza">


Lines of Verse

Lines of verse use the <l> tag. Lines that are indented are noted using the attribute "rend":

· rend="indent1": one tab stop

· rend="indent2": two tab stops

· rend="indent3": three tab stops
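The mapping from indentation to rend values can be sketched in Python. This is an illustration only: the name `verse_line` is hypothetical, and it assumes the working transcription marks each level of indentation with one leading tab character.

```python
def verse_line(raw_line: str) -> str:
    """Wrap one transcribed line in <l>, turning leading tabs into a rend value."""
    text = raw_line.lstrip("\t")
    tabs = len(raw_line) - len(text)  # number of leading tab stops
    rend = f' rend="indent{tabs}"' if tabs else ""
    return f"<l{rend}>{text.strip()}</l>"


print(verse_line("Green is the plane-tree in the square,"))
print(verse_line("\tThe other trees are brown;"))
```

A line with no leading tab gets a plain <l> tag with no rend attribute.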


Typographic divisions within poems

Typographic division within poems, such as a line of asterisks or periods or a line, should be encoded as <MILESTONE> with attribute values of UNIT=typography and N=* * * * * or N=. . . . . or N= ______, depending on the typographic character used to divide the line groups. See NEW SAMPLE TEXT 2. Use the entity for the characters asterisk or dagger, etc.


Closers and Trailers

Occasionally, a poem will end with a date or some other type of information that is not part of the poem. This information is usually encoded using the <closer> tag, for example:

<closer>1876.</closer>

If a single poem ends with "The End" or "Finis", tag it with <closer> as well, inside the DIV. If the last poem in the book ends with "The End" or "Finis", consider it as applying to the entire book: encode it as <closer> too, but put it OUTSIDE the DIV and inside BODY. Use the <trailer> tag for the printer’s or publisher’s name and address at the end of the book.


Sample Tagged Verse

Amy Levy's "A London Plane-Tree":

<PB ID=p1 N=1>

<DIV1 TYPE=poem>

<HEAD>A London Plane-Tree.</HEAD>

<LG TYPE="stanza">

<L>G<HI rend="smallcaps">REEN</HI> is the plane-tree in the square,</L>

<L REND="indent1">The other trees are brown;</L>

<L>They droop and pine for country air;</L>

<L REND="indent1">The plane-tree loves the town.</L>

</LG>

<LG TYPE="stanza">

<L>Here from my garret-pane, I mark</L>

<L REND="indent1">The plane-tree bud and blow,</L>

<L>Shed her recuperative bark,</L>

<L REND="indent1">And spread her shade below.</L>

</LG>

<LG TYPE="stanza">

<L>Others the country take for choice,</L>

<L REND="indent1">And hold the town in scorn;</L>

<L>But she has listened to the voice</L>

<L REND="indent1">On city breezes borne.</L>

</LG>

</DIV1>




The Text Body: Drama

Dramatic verse uses the same <l> and <lg> tags as poetry. Generally, the overall drama is tagged as <div1>, with the next structure in the hierarchy, e.g. Act I, tagged as <div2> and so on. There are additional tags for speeches, speakers and stage directions.

Speeches are noted using the <sp></sp> tag pair. Speakers, if identified, are noted using the <speaker></speaker> tag pair, which is nested within the <sp></sp> tags.


<sp><speaker>Hamlet</speaker>

<l>To be or not to be ....</l>

</sp>


Stage directions use <stage></stage> tags, which may occur either inside or outside of <sp></sp> tags.

<stage>Exit, pursued by a bear.</stage>


Sample tagged drama

From Amy Levy's "Medea":

<DIV1 TYPE="drama">


<HEAD TYPE="sub">(A Fragment in Drama Form, After Euripides.)</HEAD>

<LIST type="cast">


<HEAD>Citizens of Corinth.</HEAD>






<DIV2 TYPE="scene">

<STAGE TYPE="setting">Scene: Before Medea's House.</STAGE>

<STAGE TYPE="entrance">[Enter Medea.]</STAGE>


<L>T<HI>O-DAY</HI>, to-day, I know not why it is,</L>

<L>I do bethink me of my Colchian home.</L>

<L>Of pride in strength, when strength was all unprov'd,</L>

<L>Of hope too high, too sweet, to be confined</L>

<L>In limits of conception.</L>

<PB ID=p36 N=36>

<L>I am sad</L>

<L>Here in this gracious city, whose white walls</L>

<L>Gleam snow-like in the sunlight; whose fair shrines</L>

[lines omitted]


[more lines omitted]



<L>Gods, spare me your strange women, so say I.</L>

<L>Give me gold hair, lithe limbs and gracious smiles,</L>

<L>And spare the strangeness.</L>



<L>I do marvel much</L>

<L>How she will bear the tidings.</L>




<L>Lo, behold!</L>

<L>Here comes our Jason striding 'thwart the streets.</L>

<L>Gods! what a gracious presence!</L>




<L>I perceive</L>

<L>The Colchian on the threshold. By her looks,</L>

<L>Our idle talk has reached her listening ears.</L>


<STAGE TYPE="entrance">[Enter Jason. Medea reappears on the threshold.] </STAGE>



Back Matter

This section contains any material that follows the work proper, such as notes, indices, advertisements, etc.

Structural Division Attributes within <back>:

Each division should be tagged with a <div1>:

· <div1 type="appendix"> self-contained section of the work

· <div1 type="glossary"> list of terms and definitions

· <div1 type="notes"> includes textual or other kinds of notes

· <div1 type="biblio"> a list of bibliographical citations

· <div1 type="index"> any form of an index to the work

· <div1 type="colophon"> statement appearing at the end of a book which describes the conditions of its physical production; it often includes the details of how many copies were printed, and the fonts and designs used.

· <div1 type="advert"> Publishers' advertisements



You will receive a printout of the formatted text without tags. In general, first read a line from the text (perhaps reading out loud), then compare it to the same line in the printout, checking for spelling and encoding errors.


  1. Spelling: does the typed copy match the way the word is spelled in the book?
  2. Italicized words: are they in italics in the typed copy as well?
  3. SMALL CAPS: is the initial word (or words) of a poem in SMALL CAPS in the typed copy if it is in the printed book?
  4. Indentations: are the indents reflected in the typed copy?
  5. Spacing: does the spacing between line groups look similar in the printed and typed copies?
  6. Page numbers: are the page numbers represented correctly and in order? Are brackets [ ] provided around the page number if it is not actually printed on the page in the printed book? Remember to start with the first physically numbered page in the book and work back to the beginning of the text block to figure out the pagination. I will explain this to you before you start.
  7. Missing text and/or lines: add missing text in the margins, or make a note so the lines or text can be filled in from the printed book later.
  8. Punctuation: punctuation errors are easy to miss; please look out for them.

Circle errors with a colored pen, and make a check mark in the right margin. Either correct in place or in the right margin. Sometimes it is helpful to add a note such as sp for "add space" or (the delete symbol in editing) for words to take out. Use a mark to show where a new line is needed. Ignore the size of the type font—it will not always accurately reflect the different sizes in the printed book, but it may approximate them.

 Please read once line-by-line and then again for content. You will be surprised by how many errors you catch in the "reading for content" pass!

 Any footnotes printed at the bottom of pages in the text will be found at the end of the printout.

 Sample corrections are attached.

Rev. 1/17/02