A TEXT CREATION PARTNERSHIP Companion
TCP Production Files
It is one thing to speak blithely about the creation of SGML/XML-encoded text, such as was created from early printed books under the auspices of the Text Creation Partnership at the University of Michigan Library; quite another to anticipate all the quirks of format, structure, and symbology that early printers and authors could come up with during a singularly inventive age. This directory contains files used during the course of production to document decisions about the appropriate capture and encoding of features encountered in early print. The original keying and coding guidelines were written in a week; as books appeared, those guidelines, and the minimalist schema that went with them, needed to be revised and applied. Hence these documents: very much ad hoc, created in response to actual questions and actual books. As such, they are working files not originally intended for public distribution, but preserved here for the insight they might provide into the texts that the TCP produced, and as a record of decisions made, both wise and foolish.
VENDOR DOCUMENTATION
- Keying/encoding instructions, version 3 (partial revision 2004)
- Detailed guidelines for capturing the textual information in the EEBO items. (Version 1 and Version 2 are still available.)
- Sample pages
- Index to 25+ sample pages from potential EEBO items, each presented as a page image or pair of page images (in .pdf) and a corresponding transcription (in SGML).
- Calculating EEBO error rates
- Documentation of sampling procedures and error-rate calculations
- Examples of errors
- Examples of "excusable" and "inexcusable" character-level transcription errors
- "Illegible" ($) overused (1)
- Examples of text unnecessarily marked as illegible
- "Illegible" ($) overused (2)
- More examples of text unnecessarily marked as illegible
- "Illegible" ($) overused (3) and (4)
- Yet more examples of text unnecessarily marked as illegible
- The other extreme: guessing
- Examples of text captured without sufficient warrant in the damaged original
- More of the same
- More examples of creative capture
- Roman numerals
- Two special problems with roman numerals: overlining and backwards-c
- TEI guidelines
- TEI P3 documentation, including element-by-element descriptions
- EEBO tagging "cheat sheet"
- Supplies a summary description of each of the elements of the EEBO tag set (prepared for purposes of internal training)
- DIV TYPEs
- List of common and preferred values for the TYPE attribute
- Decorated initials
- Page of sample decorated initials (and non-decorated large initials for comparison)
- Apothecaries' symbols
- Capture of apothecaries' symbols (ounce, dram, scruple, etc.) as found in medical recipes.
- Alchemical symbols
- Some samples of alchemical symbols, with suggestions for capture (draft)
- Unusual symbols used as note markers
- Suggested capture for notes that use unusual symbols as markers.
- Noting subtle font changes
- Examples of subtle typeface changes to be marked with <HI> or <Q> (etc.).
- Inverted letters
- Examples of letters accidentally printed upside-down.
- Sample alphabets from Caxton's press: his 'type 1' font; his 'type 2' font; his 'type 3' font; his 'type 4' font; his 'type 5' font; his 'type 6' font
- Alphabets and letter combinations extracted from Caxton's type fonts, with tentative instructions on capture (to be revised as various letter combinations are seen in context in the books themselves).
- Additional symbols
- A supplement to the main keying instructions.
- Character capture issues (March 2005)
- Five proposed areas of change and innovation in character capture.
- When there's nothing there...
- Quick summary of the treatment of blanks and things missing.
INTERNAL (REVIEWERS') DOCUMENTATION
- [in progress] All characters list
- Experimental list of all available character entities, with pictures. [will never be as up to date as the auto-generated charent list on which it is based.]
- All character entities
- List of all available charents (both TCP-created and ISO sets) with displayable forms as used in derivative XML version of texts [auto-generated from character map file (see below)].
- Additional symbols/charents
- Growing list of symbols for reviewers to recognize and supply beyond those in vendor instructions
- More odd uses of symbols and characters
- Especially math
- Overview of review process
- Basic guide to the in-house review process as a whole
- How to proof
- Step-by-step guide to the proofing stage (preparing and proofing sample)
- How to review
- Step-by-step guide to the tag-review stage (reviewing and correcting book)
- How to end
- Step-by-step guide to the final stage (checking in and reporting)
- More Latin abbrevs
- Further examples of Latin abbreviations, etc.
- Ambiguous abbreviations
- Draft policy on ambiguous characters and symbols, with examples. (Also includes more examples of apothecaries' measures.)
- Anglo-Saxon type
- [now moved to vendor area]
- Greek type and ligatures
- Early modern Greek type and its characteristic forms and ligatures: introduction and a few unorganized samples
- Hijacked symbols
- Some thoughts on symbols pressed into duty against their will.
Reviewers' questions and tips relating to ...
- Structure
- Using DIVs to group like things; Using GROUP instead of BODY for several texts with common title front and/or back matter; DIVS and LETTER tags; Songs embedded in plays; Using Q for "raisins in oatmeal"; OPENERs and CLOSERs as holdalls; Dialogues and Catechisms: Questioner and Responder; When pages are in the wrong order.
- Notes and Milestones
- Note markers; Note placement; Handling endnotes; STAGE and NOTE combined; Use of MILESTONE unit attribute; MILESTONEs with illegible values; Multiple notes with a single reference
- Captions, Headings, and Quotations
- Captions in figures; ARGUMENTS in verse; Quotations on title pages; Authorial interjections in quotations; Changing &startq; into <Q> and <HI>; <Q>s broken by <P>s; Q+BIBL inside HEAD; Q+BIBL inside TRAILER; Using running header for division header; Placement of epigraphs.
- Letters
- New tag: POSTSCRIPT; DIV versus LETTER; SALUTE and SIGNED; Use of DATELINE and DATE (DATELINE and SIGNED, DATELINE without DATE, Including dating system within DATE); Sample CLOSERs with problems; Correct sample CLOSERs and SIGNEDs; Lists of signatories.
- Matters philosophical
- Correcting illegibilities; Counting in/excusable errors; Purpose of DIV types; Printer's errors.
- Matters miscellaneous
- Superscripts, including superscript o; Clarifying UNCLEAR; Long or short lines in verse; Abbreviations and abbreviation entities; Tagging "Explicit"s; Editing TABLEs; Acrostic poem; "Spoken by..." in plays; Letters for rubricator.
- Title Page matters
- Proofing the title page; Handling epigraphs on title pages; Imprimaturs, approbations, licenses
- Software tips (esp. TextPad)
- TextPad clip libraries; TextPad upgrades; TextPad syntax file (for color-coding tags); downloading EEBO pdfs.
- Divisions (DIVs)
- Assigning div types; Sample div types; Use of "N" attribute alongside "TYPE"
- Lists
- Lists with curly braces; Genealogies as lists; Tables of Contents and Indexes as lists; Changes to the model of LIST; Syllogisms as lists
- Character capture issues
- Z and yogh in Scottish texts; Other uses of z; I/J; Illegibilities
Code
For internal use only
Vendors' coding and capture queries (all very old)
- (No. A1) Re: Drama tags (<SP>, <SPEAKER>) in non-dramatic dialogs. Marginal notes and numbers in prose texts. Page-level illegibility (see now P12 instead).
- (No. A2) Re: Milestones.
- (No. A3) Re: Musical notation.
- (No. A6) Re: Single table, illustr., etc. spanning multiple pages.
- (No. P1) Re: Marginal notes and numbers in prose texts. Strange "q"-like character in Latin passage.
- (No. P2) Re: Odd characters: stars, pointing fingers, and dot-triplets.
- (No. P3) Re: Braces; <STAGE> directions; marginal notes IMPLICITLY linked to asterisks in the text.
- (No. P4) Re: Interlinear numbers in a "puzzle" poem; <SPEAKER> tags; <SPEAKER>s identified only by number.
- (No. P5) Re: ee and oo ligatures with acute accent marks
- (No. P7) Re: Numbers appearing usually (but not always) at beginnings of <P>s; specialized vs. default (fallback) tagging; blocks of text after FINIS (<BACK> matter).
- (No. P8) Re: identifying <LETTER>s buried in running text.
- (No. P9) Re: missing t.p.; verse paragraphs; poetic letters; analytical summary table of contents; list vs. table; lapidary inscriptions; fractions; mismatched catchwords (missing pages?)
- (No. P10) Re: text attached to figures; acrostics printed at an angle.
- (No. P11) Re: in-line figures; overlining (of roman numerals).
- (No. P12) Re: damaged and illegible text; out-of-sequence pages
- (No. P13) Re: song lyrics interspersed with musical notation
- (No. P15) Re: duplicate pages: capture both or one & if the latter, which one?
- (No. P16) Re: right-justified words at ends of verse lines
- (No. P17) Re: multiple typefaces used concurrently, partly to mark quotations
- (No. T1) Re: miscellaneous tagging problems exemplified.
- Question log (1) regarding the bidding process
- Questions (with answers) received from data conversion firms, as well as updates and announcements.
- Question log (2) regarding setup and production
- Questions (with answers) received from data conversion firms, as well as updates and announcements.
Accumulated Wisdom garnered by the Oxford staff
N.B.: this section appeared originally on the web site of Oxford's Bodleian Library, and represented (mostly) a compilation of email responses to particular issues in the capture and encoding of early modern books.
Encoding
- <ADD>
- <CLOSER>
- DIV types
- Drama
- <FIGURE>
- <GAP>
- <HEAD>
- <LETTER>
- <LG>
- <LIST>
- Music
- <NOTE>
- <OPENER>
- <Q>
- Structure
- <TABLE>
- Title Pages
Transcription
- Abbreviations and Ligatures
- Fonts
- Foreign alphabets
- Miscellaneous
- Punctuation
- Unresolved Queries
- Symbols
Technical
- DTD and image sets
- Other
Miscellaneous
- Matters philosophical
- Correcting illegibilities; Counting in/excusable errors; Purpose of DIV types; Printer's errors.
- Matters miscellaneous
- Superscripts, including superscript o; Clarifying UNCLEAR; Long or short lines in verse; Abbreviations and abbreviation entities; Tagging "Explicit"s; Editing TABLEs; Acrostic poem; "Spoken by..." in plays; Letters for rubricator.