This is a Tengwar transcriber suitable for transcribing Sindarin
Elvish from a phonetic encoding of the Latin alphabet, to the General
Use mode of the Tengwar.  It is written in JavaScript and is suitable
for use as:

-   A plain script in a web page, `tengwar.min.js`.
-   A CommonJS module as used by Node or Mr, with the NPM package name
    `tengwar`.

Using the Script
================

The script searches the document for elements with the `tengwar` class.
The class list must also include either `parmaite` or `annatar` to select
the rendering font.  This not only applies the appropriate web font, but
also tells the script which bindings to use for kerning tehtar.  The body
of a `tengwar` element must be rendered with the included Tengwar Annatar
variant web font or Tengwar Parmaitë, using the included
`tengwar-annatar.css` or `tengwar-parmaite.css`.

If the element has a `data-tengwar` attribute, that attribute is expected
to contain phonetic letters from the Latin alphabet, which get transcribed
into bindings for the Tengwar Annatar font in the General Use mode,
popular for Sindarin and English.  The script populates the element's
inner HTML with the font bindings, making the desired Tengwar text
visible.

    class="tengwar annatar"

If the element has a `data-mode` attribute, the Latin letters
are instead transcribed into key bindings through the mode it names,
such as the Classical mode, popular for Quenya, or the mode of
Beleriand.  Various options can also be applied after the mode name,
separated by spaces.

    data-mode="general-use no-ach-laut reverse-curls"
    data-mode="classical reverse-curls"
    data-mode="beleriand"
    data-mode="general-use black-speech"

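The way a `data-mode` value breaks down into a mode name and options can
be sketched in a few lines.  This is only an illustration: `parseDataMode`
is a hypothetical helper, not part of the package, and both the
kebab-case to camelCase mapping onto the option names documented below and
the fallback to `general-use` for an empty value are assumptions.

```javascript
// Hypothetical helper, not part of the tengwar package: splits a
// data-mode value into a mode name and camelCase option flags.
function parseDataMode(value) {
    const tokens = value.trim().split(/\s+/).filter(Boolean);
    // Assumption: an empty value falls back to the General Use mode.
    const mode = tokens[0] || "general-use";
    const options = {};
    for (const token of tokens.slice(1)) {
        // For example, "no-ach-laut" becomes "noAchLaut".
        const name = token.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
        options[name] = true;
    }
    return { mode, options };
}

console.log(parseDataMode("general-use no-ach-laut reverse-curls"));
// { mode: 'general-use', options: { noAchLaut: true, reverseCurls: true } }
```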
If the element has a `data-encoded` attribute, the value is expected to
be a description of the tengwar and tehtar to display, like
`romen:a;ungwe:a;romen:o;numen` for "Aragorn" in the General Use mode.

    data-encoded="romen:a;ungwe:a;romen:o;numen"

Of course, a page can bypass the whole automated transcription process
by statically populating the element with the desired key bindings and
using none of these data attributes.

The script checks for modern browser features and stops if the necessary
features are not present.

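For pages that generate their markup from JavaScript, the attributes
described in this section can be assembled as a string.  The sketch below
is not part of the package: `tengwarSpan` and `escapeAttribute` are
hypothetical names, and the choice of a `span` element is an assumption,
since the script is only documented as looking for the `tengwar` class.

```javascript
// Hypothetical helpers for generating markup; not part of the package.
function escapeAttribute(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Builds an empty element for the script to populate.  The font must be
// one of the two class names the README documents.
function tengwarSpan({ font = "annatar", text, mode }) {
    if (font !== "annatar" && font !== "parmaite") {
        throw new Error("font must be \"annatar\" or \"parmaite\"");
    }
    let html = `<span class="tengwar ${font}" data-tengwar="${escapeAttribute(text)}"`;
    if (mode) {
        html += ` data-mode="${escapeAttribute(mode)}"`;
    }
    return html + "></span>";
}

console.log(tengwarSpan({ text: "mellon", mode: "general-use" }));
// <span class="tengwar annatar" data-tengwar="mellon" data-mode="general-use"></span>
```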
Using the Modules
=================

-   `tengwar/general-use` transcribes phonetic Latin letters, as Tolkien
    wrote them, into Tengwar Notation in the General Use mode, suitable
    for Sindarin and many other languages.
    -   `transcribe(text, options)` to key bindings for the font,
        Tengwar Annatar by default.
    -   `encode(text, options)` to Tengwar Notation.
    -   `parse(text, options)` to Tengwar Object Notation.
    -   `makeOptions(options)`
        -   `font` defaults to the TengwarAnnatar module.
        -   `block` whether to include HTML tags for paragraphs and line
            breaks.
        -   `plain` whether to exclude all HTML from the output,
            making it suitable for plain text.
        -   `blackSpeech`: In the Black Speech of the ring inscription,
            the "o" and "u" curls are reversed, medial "r" before
            consonants is written with "ore" just as final "r" is, and
            "sh" and "gh" use extended tengwar.  This implies
            `reverseCurls` and `medialOre`.
        -   `doubleNasalsWithTildeBelow`: Many tengwar can be prefixed
            with the sound of the corresponding nasal in General Use
            mode by placing a tilde above the tengwa, and many tengwar
            can be doubled by putting a tilde below the tengwa.
            Tengwar that represent nasal sounds have the special
            distinction that either rule might apply in order to double
            their value.
            -   `false`: by default, a tilde above doubles a nasal.
            -   `true`: a tilde below doubles a nasal.
        -   `reverseCurls`: In the Black Speech of the ring inscription,
            among other samples, the "o" and "u" tehtar are reversed.
            -   `false`: by default, the "o" tehta curls forward, and
                "u" backward.
            -   `true`: "o" curls backward, "u" forward.
        -   `swapDotSlash`
            -   `false`: by default, "i" is a dot and "e" is a slash.
            -   `true`: "i" is a slash, "e" is a dot.
        -   `noAchLaut`
            -   `false`: by default, "ch" is transcribed as ach-laut,
                the "ch" as in "Bach".  "cc" is transcribed as "ch" as
                in "chew".
            -   `true`: "ch" is interpreted as the "ch" as in "chew".
        -   `sHook`
            -   `false`: by default, "is" is silme-nuquerna with an I
                tehta.
            -   `true`: "is" is a short carrier with an I tehta and S
                hook.

-   `tengwar/classical` transcribes phonetic Latin letters into Tengwar
    Notation in the Classical mode, most commonly used for Quenya.
    -   `transcribe(text, options)` to key bindings for the font,
        Tengwar Annatar by default.
    -   `encode(text, options)` to Tengwar Notation.
    -   `parse(text, options)` to Tengwar Object Notation.
    -   `makeOptions(options)`
        -   `font` defaults to the TengwarAnnatar module.
        -   `block` whether to include HTML tags for paragraphs and line
            breaks.
        -   `plain` whether to exclude all HTML from the output,
            making it suitable for plain text.
        -   `viyla`: In the earlier forms of the mode, the tengwa
            "vilya" represented the sound of the letter V.  The tengwa
            "vala" eventually took over that role, and "vilya" was
            renamed "wilya" and used for the sound of W, consonantal U.
            -   `false`: by default, "wilya" serves for W and "vala" for
                V.
            -   `true`: "vilya" serves for V, and W is interpreted as
                the vowel U.
        -   `reverseCurls`: In the Black Speech of the ring inscription,
            among other samples, the "o" and "u" tehtar are reversed.
            -   `false`: by default, the "o" tehta curls forward, and
                "u" backward.
            -   `true`: "o" curls backward, "u" forward.
        -   `iuRising`: In the Third Age, IU is a rising diphthong,
            meaning that the stress is on the second sound.  Whether to
            represent a rising diphthong in the same fashion as other
            diphthongs is a matter of conjecture.
            -   `false`: by default, IU is rendered as the I tehta over
                "ure", which stands for U.
            -   `true`: IU is rendered as the tengwa "anna" with a Y
                tehta below, and a U tehta above.
        -   `classical`: Before the Third Age (as defined by the
            Namárië), transcribers dealt with R and H differently.  R can
            be rendered as either "romen" or "ore", but the rules
            differ.  In the classical period, R is interpreted as "ore"
            only when it appears between vowel sounds.  In the Third
            Age, R is interpreted as "ore" before consonants and at the
            end of words.  The treatment of H is more complex and I have
            only given it a rough draft.
            -   `false`: by default, we transcribe in the pattern of the
                Namárië poem, where "ore" is used finally and before
                consonants.
                -   H is interpreted as "hyarmen".
                -   HY is interpreted as "hyarmen" with the underposed
                    "y" tehta.
                -   HW and WH are interpreted as "hwesta".
                -   CH is interpreted as "harma".
                -   HT is interpreted as "harma" followed by "tinco".
                    Thereby, HT implies CHT.
                -   HL is interpreted as "halla" followed by "lambe".
                -   HR is interpreted as "halla" followed by "romen".
            -   `true`: "ore" appears only between vowels.  The
                treatment of "H" depends on whether "harma" has been
                introduced yet.
        -   `harma`: In the Classical period, "hyarmen" implied a
            following Y.  Later, "hyarmen" served as breath-H medially,
            and "harma" served as breath-H initially and was renamed
            "aha" in that role.
            -   `false`: by default
                -   H is interpreted as "halla" in all positions.
                -   HY is interpreted as "hyarmen" with underposed "y".
                -   HT still implies CHT, so it is treated as "harma" as
                    above.
                -   CH, HL, HR, and HW (and WH) are not affected.
            -   `true`: the oldest form of the mode
                -   H initial is interpreted as "harma".
                -   H medial is interpreted as "hyarmen".
                -   HY is interpreted as "hyarmen".
                -   HT still implies CHT, so it is treated as "harma" as
                    above.
                -   CH, HL, HR, and HW (and WH) are not affected.

-   `tengwar/beleriand` transcribes phonetic Latin letters into Tengwar
    Notation in the mode of Beleriand, which is suitable for Sindarin
    and uses full tengwar for most vowels, instead of tehtar.
    -   `transcribe(text, options)` to key bindings for the font,
        Tengwar Annatar by default.
    -   `encode(text, options)` to Tengwar Notation.
    -   `parse(text, options)` to Tengwar Object Notation.
    -   `makeOptions(options)`
        -   `font` defaults to the TengwarAnnatar module.
        -   `block` whether to include HTML tags for paragraphs and line
            breaks.
        -   `plain` whether to exclude all HTML from the output,
            making it suitable for plain text.

-   `tengwar/tengwar-annatar` translates Tengwar Object Notation into
    key bindings for Johan Winge’s Tengwar Annatar font.  Provides the
    `makeColumn` primitive, which is aware of how a column of tengwar and
    tehtar can transform to accommodate additional tehtar with this
    font.
    -   `transcribe(tengwarObjectNotation, options)` to Tengwar Annatar
        key bindings.
        -   `plain`: plain text, no markup.
        -   `block`: block markup, with paragraph and line break tags.
    -   `makeColumn(tengwa, above, below)`
        -   `canAddAbove()`
        -   `addAbove(tehta)`
        -   `canAddBelow()`
        -   `addBelow(below)`
        -   `addFollowing(following)`
        -   `addTildeAbove()`
        -   `addTildeBelow()`
        -   `addError(error)`

-   `notation`
    -   `encode(tengwarObjectNotation)` to Tengwar Notation.
    -   `decode(tengwarNotation, makeColumn)` to Tengwar Object
        Notation.
    -   `decodeWord(tengwarNotation, makeColumn)` to Tengwar Object
        Notation for just one word (no nested arrays).

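The General Use options listed above interact in one documented way:
`blackSpeech` implies `reverseCurls` and `medialOre`.  A minimal sketch
of that relationship follows.  `normalizeOptions` is a hypothetical
stand-in and not the package's `makeOptions`; it leaves out `font`, and
treating `block` and `plain` as `false` when omitted is an assumption.

```javascript
// Hypothetical stand-in for illustration; not the package's makeOptions.
function normalizeOptions(options = {}) {
    const blackSpeech = Boolean(options.blackSpeech);
    return {
        // Assumption: block and plain are off unless requested.
        block: Boolean(options.block),
        plain: Boolean(options.plain),
        blackSpeech,
        // blackSpeech implies both of these flags.
        reverseCurls: blackSpeech || Boolean(options.reverseCurls),
        medialOre: blackSpeech || Boolean(options.medialOre),
        // The remaining flags are documented as false by default.
        doubleNasalsWithTildeBelow: Boolean(options.doubleNasalsWithTildeBelow),
        swapDotSlash: Boolean(options.swapDotSlash),
        noAchLaut: Boolean(options.noAchLaut),
        sHook: Boolean(options.sHook)
    };
}

console.log(normalizeOptions({ blackSpeech: true }).reverseCurls); // true
console.log(normalizeOptions({}).reverseCurls); // false
```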
## Tengwar Notation

Tengwar Notation is useful for succinctly representing the first stage
of transcription, before translation to key bindings for a particular
font.  The notation uses the name of each tengwa followed by a list of
its tehtar in a consistent order:

-   *column* =
    -   *tengwa*
    -   ":" if there are any following tehtar
    -   *tehtar* delimited by ","
        -   *tehta above* if applicable
        -   *tehta below* if applicable
        -   *following tehta* if applicable
        -   "tilde-above" if applicable
        -   "tilde-below" if applicable
-   *word* = *column* delimited by ";"
-   *sentence* = *word* delimited by " "
-   *stanza* = *sentence* delimited by "\n"
-   *paragraph* = *stanza* delimited by "\n\n"
-   *section* = *paragraph* delimited by "\n\n\n+"

The notation is useful for manually describing a transcription, either
to override the transcriber, or for testing a transcriber.

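The column, word, and sentence rules above can be sketched as a small
reader.  This is not the package's `decode`: the function names are
hypothetical, plain objects stand in for columns, and classifying each
tehta by its value (using the values listed under Tengwar Object
Notation) is an assumption about how the positions are told apart.

```javascript
// Tehta values taken from the Tengwar Object Notation section.
const ABOVE = new Set(["a", "e", "i", "o", "u", "ó", "ú"]);
const FOLLOWING = new Set(["s", "s-inverse", "s-extended", "s-flourish"]);

// Hypothetical reader for one column, e.g. "romen:a" or "numen".
function parseColumn(text) {
    const separator = text.indexOf(":");
    if (separator < 0) {
        return { tengwa: text };
    }
    const column = { tengwa: text.slice(0, separator) };
    for (const tehta of text.slice(separator + 1).split(",")) {
        if (tehta === "tilde-above" || tehta === "tilde-below") {
            column[tehta] = true;
        } else if (tehta === "y") {
            column.below = tehta;
        } else if (ABOVE.has(tehta)) {
            column.above = tehta;
        } else if (FOLLOWING.has(tehta)) {
            column.following = tehta;
        } else {
            throw new Error("unrecognized tehta: " + tehta);
        }
    }
    return column;
}

// Words are columns delimited by ";", sentences are words delimited by " ".
const parseWord = (text) => text.split(";").map(parseColumn);
const parseSentence = (text) => text.split(" ").map(parseWord);

console.log(parseWord("romen:a;ungwe:a;romen:o;numen"));
```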
## Tengwar Object Notation

Tengwar Object Notation represents a word of Tengwar as an array of
objects.  Each object has the following properties:

-   `tengwa`: the name of one of the tengwar, or of a punctuation mark
    in my obtuse pidgin of punctuation names: "comma", "full-stop",
    "exclamation-point", "question-mark", "open-paren", "close-paren",
    "flourish-left", or "flourish-right".  "vilya" is always represented
    as "wilya" and "aha" is always "harma", regardless of what name is
    appropriate for the mode.
-   `above`: may be a tehta including "a", "e", "i", "o", "u", "ó", or
    "ú".  Note that "á", "é", and "í" are not supported diacritics.
-   `below`: may be "y".
-   `following`: a tehta like "s", "s-inverse", "s-extended", or
    "s-flourish".
-   `tilde-above`: boolean.
-   `tilde-below`: boolean.

Words are wrapped in an array to make a sentence.  Sentences are wrapped
to make paragraphs.  Paragraphs are wrapped to make sections.  Somehow
I’ve neglected stanzas within paragraphs.  This will be remedied in a
future version, and the nodes will probably be revised to be more
sophisticated than merely nested arrays.

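As an illustration of the shape described above, the word "Aragorn" from
the `data-encoded` example can be written out as an array of column
objects and turned back into Tengwar Notation.  The object literal is
inferred from that example rather than taken from the package, and
`columnToNotation`, `wordToNotation`, and `sentenceToNotation` are
hypothetical names, not the `notation` module's `encode`.

```javascript
// "Aragorn" in the General Use mode, inferred from the README's
// data-encoded example: romen:a;ungwe:a;romen:o;numen
const aragorn = [
    { tengwa: "romen", above: "a" },
    { tengwa: "ungwe", above: "a" },
    { tengwa: "romen", above: "o" },
    { tengwa: "numen" }
];

// Hypothetical writer: emits the tehtar in the documented order.
function columnToNotation(column) {
    const tehtar = [];
    if (column.above) tehtar.push(column.above);
    if (column.below) tehtar.push(column.below);
    if (column.following) tehtar.push(column.following);
    if (column["tilde-above"]) tehtar.push("tilde-above");
    if (column["tilde-below"]) tehtar.push("tilde-below");
    return tehtar.length > 0 ? column.tengwa + ":" + tehtar.join(",") : column.tengwa;
}

const wordToNotation = (word) => word.map(columnToNotation).join(";");
// A sentence is an array of words.
const sentenceToNotation = (sentence) => sentence.map(wordToNotation).join(" ");

console.log(wordToNotation(aragorn));
// romen:a;ungwe:a;romen:o;numen
```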
A font module must have a `makeColumn` function that produces objects
with these properties and the attendant methods as described for the
Tengwar Annatar module above.

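The following is a rough, font-agnostic sketch of an object with the
properties and methods listed for `makeColumn`.  It is not the Tengwar
Annatar implementation: `makeSketchColumn` is a hypothetical name, a slot
is considered available simply when it is empty (a real font module also
has to know which combinations its glyphs can render), and the `errors`
array and the chaining return values are assumptions.

```javascript
// Hypothetical, font-agnostic column; not the package's makeColumn.
function makeSketchColumn(tengwa, above, below) {
    return {
        tengwa,
        above,
        below,
        following: undefined,
        "tilde-above": false,
        "tilde-below": false,
        errors: [], // assumption: errors are simply collected
        // Assumption: a slot can be filled only while it is empty.
        canAddAbove() { return !this.above; },
        addAbove(tehta) { this.above = tehta; return this; },
        canAddBelow() { return !this.below; },
        addBelow(tehta) { this.below = tehta; return this; },
        addFollowing(tehta) { this.following = tehta; return this; },
        addTildeAbove() { this["tilde-above"] = true; return this; },
        addTildeBelow() { this["tilde-below"] = true; return this; },
        addError(error) { this.errors.push(error); return this; }
    };
}

const column = makeSketchColumn("tinco");
if (column.canAddAbove()) {
    column.addAbove("a");
}
console.log(column.tengwa, column.above, column.canAddAbove());
// tinco a false
```

A transcriber built on such columns would check `canAddAbove()` before
stacking a vowel and start a new column when the slot is taken.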