Transcription Coding

MEL developed TextLab primarily to facilitate the transcription of original and revised texts found in manuscript. The text editor automates the identification of revision sites on leaf images and the linking of those sites to the transcribed texts associated with them. TextLab uses XML encoded according to the TEI P5 Guidelines to code these sites on leaf images and the revision texts in the transcription. In the hierarchy of TEI codes, an “element” contains certain “attributes,” which the editor can in turn assign “values.” For instance, the standard revision elements (<add>, <del>, <restore>, <subst>) take attributes that describe the textual content of their related revision sites, such as the placement of an insertion relative to the base line, its stage of composition, the rendering of a deletion, the hand that inscribed it, and the medium in which the revision was inscribed. MEL editors configure TextLab by assigning optional values to these attributes, such as “Herman Melville” and “Elizabeth Shaw Melville” for the “hand” attribute or “pencil” and “ink” for the medium, and so on. TextLab can also be reconfigured to suit the specifics of revision in other Melville works or the writing processes of writers other than Melville.

TEI elements, attributes, and values enable nuanced description of both manuscript leaf and word content. Notice that an element code uses angle brackets (and a slash) to mark the <beginning> and </end> of the range of words the code is meant to affect. Certain elements can also be nested within other elements. For instance, the nesting of codes in “of the <restore><del>British</del></restore> fleet” means that in the phrase “of the British fleet” the word “British” was deleted and then restored. When attributes and values are added, the coding becomes more involved:

of the <restore facs="#img_23-0025" change="#StDa"><del facs="#img_23-0025" rend="single-stroke _ink1" hand="#HM" change="#StDa">British</del></restore> fleet

Here, the “facsimile” attribute, facs="#img_23-0025", with its assigned image value, links both the deletion and the restoration to revision site 25 on leaf image 23. The change attribute records that both revision acts occurred in stage D (substage a) in the composition of Billy Budd. In addition, the rend and hand attributes of the deletion element indicate that HM (i.e., Melville and not his wife, ESM) deleted “British” with a single pen stroke, using “ink1,” or black ink. The rend and hand attributes thus belong to the <del> element alone, which is in turn nested within the <restore> element.
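As a hedged illustration of how such nesting can be read back programmatically, the Python sketch below parses the example with the standard library's xml.etree.ElementTree. It assumes the fragment is wrapped in a placeholder <p> root and omits the TEI namespace that a complete P5 file would declare; it is not MEL or TextLab code.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a placeholder <p> root.
# Assumption: no TEI namespace, to keep element names short.
FRAGMENT = (
    '<p>of the <restore facs="#img_23-0025" change="#StDa">'
    '<del facs="#img_23-0025" rend="single-stroke _ink1" hand="#HM" '
    'change="#StDa">British</del></restore> fleet</p>'
)

root = ET.fromstring(FRAGMENT)
restore = root.find("restore")   # direct child of <p>
deletion = restore.find("del")   # <del> is nested inside <restore>

# rend packs the stroke type and the medium into one value.
stroke, medium = deletion.get("rend").split()

# Joining all text nodes yields the restored (final) reading.
reading = "".join(root.itertext())

print(reading)  # of the British fleet
print(deletion.text, stroke, medium, deletion.get("hand"))
print(restore.get("facs") == deletion.get("facs"))
```

Note that root.find("del") would return None here: find searches only direct children, which mirrors the fact that the deletion lives inside the restoration.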

Digital humanities coding can be daunting; fortunately, TextLab was designed to simplify the coding process for editors (or any user). In TextLab’s Upper Frame, which contains the leaf image, the transcriber first identifies revision sites on the leaf by using a drawing tool to place a box (called a “zone”) around each site. TextLab automatically assigns a unique number to each box and generates the “facs=” attribute designating the leaf image and box numbers. Once the transcriber has typed the manuscript text into the Lower Frame, s/he can highlight a specific revision text and click on the corresponding numbered zone in the Upper Frame. A dialogue box pops up, enabling the transcriber to tick off the appropriate elements, attributes, and values for the highlighted revision text. When the transcriber clicks Enter, TextLab automatically surrounds the revision text with the kind of intricate coding shown above. In short, the transcriber identifies the site with a series of clicks, and the machine applies the coding.
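The zone-and-tag workflow can be sketched in a few lines of Python. The sketch is hypothetical: LeafImage, draw_zone, and tag_revision are illustrative names, not TextLab's internals, and the four-digit zone padding is inferred from the facs value #img_23-0025 in the example above.

```python
import xml.etree.ElementTree as ET


def make_facs(image: int, zone: int) -> str:
    """Return a pointer in the pattern of MEL's examples, e.g. #img_23-0025."""
    return f"#img_{image}-{zone:04d}"


class LeafImage:
    """Hypothetical stand-in for one leaf image shown in the Upper Frame."""

    def __init__(self, image: int) -> None:
        self.image = image
        self.zones: list[str] = []

    def draw_zone(self) -> str:
        """Number each new box sequentially and return its facs pointer."""
        facs = make_facs(self.image, len(self.zones) + 1)
        self.zones.append(facs)
        return facs


def tag_revision(text: str, element: str, facs: str, **attrs: str) -> str:
    """Wrap highlighted text in a revision element linked to a zone."""
    el = ET.Element(element, {"facs": facs, **attrs})
    el.text = text
    return ET.tostring(el, encoding="unicode")


leaf = LeafImage(23)
for _ in range(25):  # the 25th box drawn on leaf image 23
    zone = leaf.draw_zone()

print(tag_revision("British", "del", zone,
                   rend="single-stroke _ink1", hand="#HM", change="#StDa"))
```

The point of the design is that the transcriber never types the facs value: it is derived from the box the transcriber clicks, and the dialogue-box choices become the remaining keyword attributes.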

TEI codes are also used to describe inscriptions not intended for publication. These include folios (leaf and leaf image numbers); carets and other insertion devices, such as bubbles, pointers, and squiggles; section dividers; date entries; and written instructions like “insert here.” TextLab uses the <metamark> element to mark these kinds of compositional information. In MEL’s XSLT transformation, <metamark> information appears properly positioned in the Diplomatic Transcription but is removed from the Base Version.
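MEL performs this step in XSLT. As a rough Python analogue (an assumption for illustration, not MEL's stylesheet), the function below drops <metamark> elements when producing base-version text while keeping the surrounding words; everything else the real transformation does is omitted.

```python
import xml.etree.ElementTree as ET


def strip_metamarks(parent: ET.Element) -> None:
    """Remove every <metamark> below parent, keeping the text that follows it.

    ElementTree stores the text after an element in its .tail, so that text
    must be handed to the previous kept sibling (or to the parent) first.
    """
    prev = None  # last child that survives
    for child in list(parent):
        if child.tag == "metamark":
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)  # the metamark's own text is dropped
        else:
            strip_metamarks(child)  # metamarks may sit inside <add>, etc.
            prev = child


leaf = ET.fromstring(
    '<p><metamark>203</metamark>the <add place="above">'
    '<metamark>^</metamark>handsome</add> sailor</p>'
)
strip_metamarks(leaf)
print(ET.tostring(leaf, encoding="unicode"))
# <p>the <add place="above">handsome</add> sailor</p>
```

A diplomatic view would simply skip this step, leaving each <metamark> in place so that folio numbers and carets can be positioned as they appear on the leaf.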

In configuring values for MEL’s TEI revision elements and attributes, we draw upon inscription descriptors designated by Hayford and Sealts (HS) in their 1962 edition of Billy Budd, Sailor (An Inside Narrative):

  • Insertions may be placed inline, above or below the line, or in the margins.
  • Insertions may be rendered with or without caret and/or in a bubble.
  • Deletions may be achieved by strikethrough, multi-strike, hashmark, or erasure.
  • Metamarks may represent folio, caret, half-caret, insertion device, and composition instruction.
  • Insertions and deletions may appear on different pieces of paper:
    • Clip: a slice of text cut away from a leaf composed at an earlier stage and straight-pinned to a later leaf.
    • Patch: a slice of text composed in one stage and affixed to a leaf composed at an earlier stage.
    • Mount: a leaf (often full-length) to which clips or patches are attached, which might also include text of its own.
  • Inscriptions and deletions may be rendered by different hands:
    • Herman Melville
    • His wife Elizabeth Shaw Melville
    • The editor Raymond Weaver
    • The staff of Houghton Library
  • Inscriptions and deletions may be composed in different media:
    • Pencil
    • Ink (black, gray, brown, blue)
    • Crayon (green, red, orange, blue)
  • Stages and sub-stages of composition for folios and revisions follow HS designations.
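Because TextLab is configured by assigning permitted values to attributes, the descriptors above amount to a controlled vocabulary. The Python sketch below shows one way such a configuration might be checked; the dictionary layout and every code other than “inline” and “#HM” are hypothetical, and TextLab's actual configuration format is not described here.

```python
# Hypothetical controlled vocabulary modeled on the HS descriptors above.
# Only "inline" and "#HM" occur in MEL's own examples; the other codes
# are placeholders chosen for this sketch.
VOCABULARY: dict[str, set[str]] = {
    "place": {"inline", "above", "below", "margin"},
    "hand": {"#HM", "#ESM", "#RW", "#HL"},  # Melville, ESM, Weaver, Houghton
}


def invalid_values(attrs: dict[str, str]) -> list[str]:
    """List attribute values that fall outside the configured vocabulary.

    Attributes without a configured vocabulary are not checked.
    """
    return [
        f"{name}={value!r}"
        for name, value in attrs.items()
        if name in VOCABULARY and value not in VOCABULARY[name]
    ]


print(invalid_values({"place": "inline", "hand": "#HM"}))   # []
print(invalid_values({"place": "sideways", "hand": "#HM"}))  # ["place='sideways'"]
```

Reconfiguring for another work or writer would then amount to editing the vocabulary rather than the checking logic.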

The Limits and Uses of Revision Coding

As explained in Modes of Digital Editing, MEL’s policy is to perform primary editing (transcription) separately from secondary editing (revision annotation) so that the necessarily interpretive nature of secondary editing does not overly determine the identification of revision sites in primary editing. In addition, our policy is to “granularize” the primary identification of revision sites so as to help editors discern as many combinations of steps as possible in the secondary editing of revision sequences. In truth, experienced transcribers will attest that identifying, describing, and coding granularized revision sites necessarily requires some understanding of the revisions (if not the precise steps) in play. The caution is not to let this inevitable interpretive sequencing dictate or inflect coding in the primary transcription. By design, TextLab keeps the primary and secondary editing workspaces on separate platforms. Similarly, its coding protocols embody the limits we place on granularity and interpretive sequencing.

The issues at stake regarding granularization and interpretation are exemplified in MEL’s use of the TEI “substitution” element, <subst>. It should be said at the outset that MEL uses this element in a way other than intended. The <subst> element combines a <del> and an <add> in one unit and is meant to describe what one might assume is the most common of revision phenomena, in which the writer deletes a word or phrase and adds “substitute” wording for the deletion. In fact, this revision scenario is neither as common nor as straightforward as the <subst> option might suggest. In many cases, the so-called substitution is linked to or triggered by other revision sites that, taken together, involve a multistep revision sequence that may occur on the same line, on different lines, or on different leaves. (See the revision sites for “quite foreign to his nature” and for “naval discipline” versus “naval decorum,” both in Chapter 1, and for “ruddy-tipped daisies” (leaf 203) versus “ruddy clover” (leaf 211) in Chapter 18.) Of course, “substitution” is just an arbitrary descriptor for a code that links a <del> to an <add>; but call it what you will, the linking of these two elements is an interpretive act, and the problem is that <subst> instantiates its interpretation in the primary transcription coding, which might in turn complicate or even inhibit further, more nuanced interpretations in secondary revision sequencing.

Accordingly, MEL uses <subst> not as intended but rather to address another common feature of the revision process. Occasionally, Melville transforms one word into another by adding letters to the original or reshaping its letters into other letters. For instance, in Chapter 1, Melville originally inscribed in ink the word “capstan” (a revolving drum with a vertical axis, used aboard ship for hauling cables and raising heavy weights such as the anchor), without crossing his “t.” Later, also in ink, he dotted the left side of the “n” to create an “i,” added a bump to the right side of the “n” to create a new “n,” and crossed the “t.” The resulting transformed word was “capstain,” an archaic spelling of “capstan” that remained in use into the late eighteenth century (when Billy Budd takes place) but had become obsolete by the late nineteenth century (when Melville wrote the novella). Since the older usage appears in dialogue spoken by Billy’s first captain, a big-hearted and down-to-earth merchant, Melville’s meticulous modifications to transform the word (recorded for the first time in this MEL edition) reveal his sensitivity to eighteenth-century dialect and his skill at characterization. This revision site is particularly important in light of Melville’s late attention to narrative technique. How, then, might this revision be coded?

The reality is that the transformation of capstan into capstain cannot be coded for what it actually is, because there are no TEI codes for “transforming parts of letters into other letters.” One “good-enough” solution might be to code the word as capsta<del>n</del><add>in</add>. But this letter-level approach raises additional problems that take us to the limits of granularity and of what coding needs to do in rendering revision. In future searches for transformed words in the primary transcription, the actual words of interest, “capstan” and “capstain,” would not show up, because neither exists as a whole word in the coded text. MEL’s solution, then, is to code transformations at the word level and to use <subst> to combine the original and transformed words in one unit. Here is how our coding of “capstan / capstain” appears in TextLab:

 <subst facs="#img_51-0044"> <del rend=" _ink1" hand="#HM" change="StBb" facs="#img_51-0044">capstan</del><add place="inline" rend="no-caret _ink1" hand="#HM" change="StBb" facs="#img_51-0044">capstain</add></subst>
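The search problem can be demonstrated with a toy word index. In this Python sketch (a hypothetical indexer, not MEL's search engine), whole words are collected from every text node and tail of the two coding options, with attributes omitted; only the word-level coding exposes both spellings as whole words.

```python
import xml.etree.ElementTree as ET

# Simplified fragments (attributes omitted) of the two coding options.
LETTER_LEVEL = "<p>the capsta<del>n</del><add>in</add></p>"
WORD_LEVEL = "<p>the <subst><del>capstan</del><add>capstain</add></subst></p>"


def indexed_words(xml: str) -> set[str]:
    """Collect whitespace-delimited words from every text node and tail."""
    words: set[str] = set()
    for el in ET.fromstring(xml).iter():
        for chunk in (el.text, el.tail):
            if chunk:
                words.update(chunk.split())
    return words


for label, xml in (("letter-level", LETTER_LEVEL), ("word-level", WORD_LEVEL)):
    found = sorted({"capstan", "capstain"} & indexed_words(xml))
    print(label, found)
# letter-level []
# word-level ['capstain', 'capstan']
```

In the letter-level version the index holds only fragments such as “capsta,” “n,” and “in,” which is exactly why the words of interest disappear from word searches.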

While capstain is better described as a “transformation” of capstan than as a “substitution” for it, the <subst> code aptly identifies the transformative revision act. And because the “facs=” attribute directs us to the manuscript leaf and revision site where the transformation occurs, the reader can move from the diplomatic transcription (where “capstain” appears on the base line and “capstan” pops up when “capstain” is moused over) to the side-by-side leaf image, where one can make out the letter-by-letter transformations visually. The reader can also inspect the revision sequences and narratives listed below the side-by-side display, where all steps in the revision act are sequenced.
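The division of labor between base line, mouse-over, and leaf image can be sketched as follows. This Python function is a hypothetical illustration of how a display layer might read the coding above, not MEL's actual transformation; it assumes facs values always follow the #img_<image>-<zone> pattern seen in this section.

```python
import re
import xml.etree.ElementTree as ET

# MEL's coding of "capstan / capstain", as given above.
CODED = (
    '<subst facs="#img_51-0044">'
    '<del rend=" _ink1" hand="#HM" change="StBb" facs="#img_51-0044">capstan</del>'
    '<add place="inline" rend="no-caret _ink1" hand="#HM" change="StBb" '
    'facs="#img_51-0044">capstain</add></subst>'
)


def display_readings(subst_xml: str) -> dict[str, object]:
    """Split a <subst> into what a diplomatic view might show (hypothetical)."""
    subst = ET.fromstring(subst_xml)
    match = re.fullmatch(r"#img_(\d+)-(\d+)", subst.get("facs", ""))
    if match is None:
        raise ValueError("unexpected facs pointer")
    return {
        "base_line": subst.find("add").text,   # shown on the line
        "mouse_over": subst.find("del").text,  # revealed on hover
        "leaf_image": int(match.group(1)),
        "zone": int(match.group(2)),
    }


print(display_readings(CODED))
```

Keeping the original and transformed words as whole strings is what lets a viewer show one and reveal the other, while the parsed facs pointer locates the zone on leaf image 51 where the letter-level changes can be seen.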

The fundamental benefit of TEI coding is that it facilitates research. In editing Billy Budd, we have used TEI codes to register the stages and sub-stages of composition that Hayford and Sealts assigned to leaves and revisions; this effort will enable researchers to sort revisions by sub-stage in order to analyze patterns in Melville’s revision behavior beyond the leaf level. Scholars can also test the excellent work of the Hayford-Sealts transcription against their own findings in order to modify or augment these stages of composition. Finally, the coding will facilitate MEL’s projected visualization, titled How Billy Grew, which aims to display the macro-revision of Billy Budd in eight stages, from ballad to novella, and to let users drill down within each stage to the micro-revisions at the sub-stage leaf level, as recorded by TextLab.
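A minimal Python sketch of sorting by sub-stage, assuming change values follow the St + stage letter + substage letter pattern seen in this section (so “StDa” is stage D, substage a). The function name and grouping format are hypothetical, and the sample input reuses only the two coded examples given above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two revision examples from this section, in a placeholder <body>.
TRANSCRIPTION = (
    '<body><p>of the <restore facs="#img_23-0025" change="#StDa">'
    '<del facs="#img_23-0025" rend="single-stroke _ink1" hand="#HM" '
    'change="#StDa">British</del></restore> fleet</p>'
    '<p><subst facs="#img_51-0044">'
    '<del rend=" _ink1" hand="#HM" change="StBb" facs="#img_51-0044">capstan</del>'
    '<add place="inline" rend="no-caret _ink1" hand="#HM" change="StBb" '
    'facs="#img_51-0044">capstain</add></subst></p></body>'
)

REVISION_TAGS = {"add", "del", "restore"}


def revisions_by_substage(xml: str) -> dict[str, list[tuple[str, str, str]]]:
    """Group revision elements by their change attribute, sorted by key."""
    groups: defaultdict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for el in ET.fromstring(xml).iter():
        change = el.get("change")
        if el.tag in REVISION_TAGS and change:
            # Tolerate both "#StDa" and "StBb", as both forms appear above.
            key = change.lstrip("#")
            groups[key].append((el.tag, "".join(el.itertext()), el.get("facs", "")))
    return dict(sorted(groups.items()))


for substage, revisions in revisions_by_substage(TRANSCRIPTION).items():
    print(substage, revisions)
```

The plain alphabetical sort assumes that stage and substage letters run in compositional order; a fuller treatment would read the stage sequence from the HS designations rather than rely on letter order.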

The Hayford-Sealts Genetic Transcription

Though dated in its reading text, the hardbound version of the HS edition of Billy Budd, Sailor is an early, excellent, and still useful example of genetic editing. Apart from the scholarship it displays, a crucial feature is its attempt to link segments of the reading text to the corresponding leaf transcriptions in the appendix. Leaf numbers placed in the margins tell readers which leaf transcription to consult. These marginal numbers predate digital linking by two decades. (In issuing its paperback version of Billy Budd, Sailor, the University of Chicago Press kept these marginal leaf numbers but eliminated the edition’s appendix and the leaf transcriptions to which the numbers referred.)

The genetic transcriptions themselves are a laudable attempt to render Melville’s revisions leaf by leaf; however, they require training to read. The full text of each leaf is rendered in paragraph format, so Melville’s manuscript lineation is not reproduced. Various symbols indicate the kinds of revision performed on texts (which are themselves placed in angle brackets): separate symbols represent addition with or without caret, deletion in pencil or ink, erasure of pencil, restoration, and so on. Readers must memorize these symbols in order to read the transcription. In many cases, brief editorial interjections in square brackets supply more complicated text-placement information. The leaf transcriptions are not accompanied by side-by-side leaf images, a diplomatic transcription, or revision sequences and narratives.

Most problematic, from the perspective of MEL’s fluid-text approach, is that the Hayford-Sealts genetic transcriptions attempt to integrate interpretive revision sequencing with the physical description of a leaf’s revision texts. For instance, the transcription of Melville’s revision of “white forecastle-magnate” to “handsome sailor” (Chapter 1) registers it as a single-step deletion of the former and insertion of the latter, whereas the manuscript shows two deletions and two insertions and allows for several multistep revision scenarios.

Needless to say, the distracting features of the heavily symbolized genetic transcription are editorial accommodations largely due to the limitations of print technology. A more ambitious, accurate, and accessible representation, with photo-reproductions, would have required an unfeasible book-length appendix.

That said, no reliable digital edition of Billy Budd could be imagined without the data marshalled in the Hayford-Sealts appendix. Its genetic transcription has been indispensable to MEL editors in deciphering Melville’s handwriting, unearthing deletions obscured by ink and erasure, and identifying sub-stages linked to the HS analysis of Melville’s re-paginations of manuscript leaves. TextLab not only facilitates MEL’s re-editing and encoding of the Billy Budd manuscript from scratch and generates the interactive features needed for fuller access to Melville’s revision process; through its coding, it also preserves and corrects the scholarship contributed by the Hayford-Sealts genetic transcription.

What is the text of Billy Budd