NLA Trial Articles from 1829

Notes
  1. Accuracy of OCR and overProof is measured against the human corrections. We know the human corrections in this sample are incomplete and themselves contain errors, but they are the best we could find automatically in the NLA newspapers corpus: articles tagged as completely corrected, then further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and overProof corrections
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process and show those words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process and show those words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human correction; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
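
Notes 6 and 7 can be illustrated with a minimal Python sketch. It is not the overProof implementation: the function names, the toy corpus, and the choice to weight distinct words (rather than every occurrence) are assumptions made for illustration, and the weights are not normalised to "a fraction of a word".

```python
import math
from collections import Counter

def word_weights(corpus_tokens):
    """Weight each word by log(1 / relative frequency) in the corpus (assumed scheme)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: math.log(total / count) for word, count in counts.items()}

def unique_searchability(text_words, reference_words):
    """Percentage of distinct reference words that also appear in the text."""
    reference = set(reference_words)
    if not reference:
        return 100.0
    found = reference & set(text_words)
    return 100.0 * len(found) / len(reference)

def weighted_searchability(text_words, reference_words, weights, default_weight=1.0):
    """Same as unique_searchability, but each distinct word counts by its weight."""
    reference = set(reference_words)
    found = reference & set(text_words)
    total = sum(weights.get(w, default_weight) for w in reference)
    if total == 0:
        return 100.0
    hit = sum(weights.get(w, default_weight) for w in found)
    return 100.0 * hit / total

# Toy corpus (hypothetical): "the" is common, "summer" is rare.
corpus = "the the the the pool pool summer".split()
weights = word_weights(corpus)

human = ["the", "summer", "pool"]     # reference (human-corrected) words
raw_ocr = ["the", "stimmpr", "nnnl"]  # OCR garbled the two rarer words

print(round(unique_searchability(raw_ocr, human), 1))
print(round(weighted_searchability(raw_ocr, human, weights), 1))
```

In this toy case only the common word survives OCR, so the weighted figure comes out lower than the unweighted one: losing distinctive words costs more than losing common ones.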

Article ID 36865543, Article, THE BRIDAL HOUR., page 4 1829-03-27, The Australian (Sydney, NSW : 1824 - 1848), 253 words, 5 corrections

Raw OCR | Human Corrected | overProof Corrected
THE BRIDAL HOfTR THE BRIDAL HOUR. THE BRIDAL HOfTR
The gliding fish that takes his play The gliding fish that takes his play The gliding fish that takes his play
In shady nook of streamlet cool,. In shady nook of streamlet cool, In shady nook of streamlet cool,.
Thinks not how waters pass away. Thinks not how waters pass away, Thinks not how waters pass away.
And stimmpr drip* *h» nnnl. And summer dries the pool. And summer dries the pool.
The bird beneath his leafy dome The bird beneath his leafy dome The bird beneath his leafy some
. Wlio trills his carol, loud and clear* ~ Who trills his carol, loud and clear, Who trills his carol, loud and clear ~
, Thinks not how soon his verdant home Thinks not how soon his verdant home , Thinks not how soon his verdant home
The lightning's breath may sear. The lightning's breath may sear. The lightning's breath may year.
Shall I within my bridegroom's bower Shall I within my bridegroom's bower Shall I within my bridegroom's bower
With braids of budding roses twined. With braids of budding roses twined. With braids of budding roses twined.
Look forward to a coming hour Look forward to a coming hour Look forward to a coming hour
* When be may prove unkind } When he may prove unkind ? When he may prove unkind }
The bee reigns in hit waxen cell, The bee reigns in his waxen cell, The bee reigns in his waxen cell,
. The chieftain in his stately hold, The chieftain in his stately hold, The chieftain in his stately hold,
' \ To-morrow's earthquake who can tell ? ' To-morrow's earthquake—who can tell ? ' \ To-morrow's earthquake who can tell ? '
May both in ruin fold. - ' May both in ruin fold. May both in ruin fold. - '
Permanent writing; the London Weekly Review Permanent writing; the London Weekly Review Permanent writing; the London Weekly Review
remarks, may be easily efiected by rubbing fine remarks, may be easily effected by rubbing fine remarks, may be easily effected by rubbing fine
pounce, or what is preferable phosphate of lime pounce, or what is preferable phosphate of lime pounce, or what is preferable phosphate of lime
from burnt bones, over the paper previously, and from burnt bones, over the paper previously, and from burnt bones, over the paper previously, and
making use of the composition metallic pencils, making use of the composition metallic pencils, making use of the composition metallic pencils,
which consist of three parts lead, two of bismuth, which consist of three parts lead, two of bismuth, which consist of three parts lead, two of bismuth,
and one of tin, melted together. This is a desira and one of tin, melted together. This is a desirable and one of tin, melted together. This is a desirable
ble mode of writing for surveyors, or others em mode of writing for surveyors, or others mode of writing for surveyors, or others employed
ployed out of doors, where pen and ink cannot be employed out of doors, where pen and ink cannot out of doors, where pen and ink cannot be
obtained, it being almost as indelible as common be obtained, it being almost as indelible as obtained, it being almost as indelible as common
ink. ''? '. '. ? ''..?? - _ ' .' common ink. ink. ''? '. '. ? ''..?? - _ ' .'
The. new railway between Manchester and Liver The new railway between Manchester and Liver- The. new railway between Manchester and Liverpool
pool passes under the'town of Liverpool in a tun pool passes under the town of Liverpool in a tunnel passes under the town of Liverpool in a tunnel
nel cut through the. solid rock, 16 feet high, S2 cut through the solid rock, 16 feet high, 22 cut through the. solid rock, 16 feet high, 22
wide, and a mile and a quarter in length. It wide, and a mile and a quarter in length. It wide, and a mile and a quarter in length. It
emerges into day light on the top ,of Edge-hill, emerges into day light on the top of Edge-hill, emerges into day light on the top of Edge hill,
looking down upon Liverpool. looking down upon Liverpool. looking down upon Liverpool.
Identified overProof corrections POOL DRIES TOWN SUMMER EFFECTED HE
Identified overProof non-corrections SEAR [**VANDALISED] DOME [**VANDALISED]
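
The classification described in notes 4 and 5 can be sketched as follows. This is a simplified illustration, not the statistical process the report uses: it assumes the three versions have already been aligned word by word (the hypothetical `triples` input), and the `classify` helper is a made-up name.

```python
def classify(triples):
    """Split aligned (ocr, human, overproof) words into corrections and non-corrections.

    Alignment is assumed to be given; real texts need a word-alignment step first.
    """
    corrections, non_corrections = [], []
    for ocr, human, over in triples:
        if over == human:
            # overProof agrees with the human text; it is a correction
            # only if overProof actually changed the OCR word.
            if over != ocr:
                corrections.append(human.upper())
        else:
            label = human.upper()
            # Changed by overProof but left alone by the human corrector.
            if over != ocr and human == ocr:
                label += " [**VANDALISED]"
            non_corrections.append(label)
    return corrections, non_corrections

# A few word triples taken from the article above.
triples = [
    ("nnnl", "pool", "pool"),
    ("be", "he", "he"),
    ("sear", "sear", "year"),
    ("dome", "dome", "some"),
    ("fish", "fish", "fish"),
]
corrections, non_corrections = classify(triples)
print(corrections)      # ['POOL', 'HE']
print(non_corrections)  # ['SEAR [**VANDALISED]', 'DOME [**VANDALISED]']
```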
                               Word count  OCR accuracy %  overProof accuracy %  Errors corrected %
All Words                      221         95.0            98.6                  72.7
Searchability of unique words  164         96.3            98.8                  66.7
Weighted Words                 -           96.8            98.9                  66.7
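
As a rough check on how the columns relate, the sketch below assumes that "Errors corrected %" is the share of OCR errors that no longer appear after overProof. The error counts (11 and 3 of 221 words; 6 and 2 of 164 unique words) are not stated in the report; they are inferred here from the rounded percentages, and the function names are hypothetical.

```python
def accuracy_pct(word_count, errors):
    """Percentage of words that match the human-corrected text."""
    return 100.0 * (word_count - errors) / word_count

def errors_corrected_pct(errors_before, errors_after):
    """Share of the original errors removed by correction (assumed definition)."""
    if errors_before == 0:
        return 0.0
    return 100.0 * (errors_before - errors_after) / errors_before

# All Words row: 221 words, with inferred error counts of 11 (OCR) and 3 (overProof).
print(round(accuracy_pct(221, 11), 1))         # 95.0
print(round(accuracy_pct(221, 3), 1))          # 98.6
print(round(errors_corrected_pct(11, 3), 1))   # 72.7

# Unique words row: 164 words, with inferred error counts of 6 and 2.
print(round(errors_corrected_pct(6, 2), 1))    # 66.7
```

Under these inferred counts the sketch reproduces the All Words and unique-words rows. The Weighted Words row cannot be checked this way because the per-word weights are not given.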

Accumulated stats for 1 article from year 1829

                               Word count  OCR accuracy %  overProof accuracy %  Errors corrected %
All Words                      221         95.0            98.6                  72.0
Searchability of unique words  164         96.3            98.8                  67.6
Weighted Words                 -           96.8            98.9                  65.6