NLA Trial Articles from 1989

  1. Accuracy of OCR and overProof is measured against the human corrections. We know the human corrections in this sample are incomplete and themselves contain errors, but they are the best we could select automatically from the NLA newspapers corpus: articles tagged as completely corrected, then further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and overProof corrections
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are produced by the statistical calculation process and show those words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are produced by the statistical calculation process and show those words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human corrector; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
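
Notes 6 and 7 can be illustrated with a minimal sketch. It is not the calculation process used to produce this report: the function names, the tokenisation (whitespace split, lower-cased) and the corpus frequencies below are all assumptions made for illustration. The weight of a word is taken as log(1 / frequency), as note 7 describes; any constant scaling of the weights cancels out of the ratio.

```python
import math

def unique_word_searchability(reference_words, candidate_words):
    """Fraction of distinct reference words also present in the candidate text."""
    reference = {w.lower() for w in reference_words}
    candidate = {w.lower() for w in candidate_words}
    return len(reference & candidate) / len(reference)

def word_weight(word, corpus_frequency, default_frequency=1e-6):
    """Weight grows with log(1 / relative frequency), so rare words count more.
    default_frequency is a hypothetical fallback for words missing from the table."""
    return math.log(1.0 / corpus_frequency.get(word.lower(), default_frequency))

def weighted_searchability(reference_words, candidate_words, corpus_frequency):
    """Same ratio as above, but each distinct word contributes its weight."""
    reference = {w.lower() for w in reference_words}
    candidate = {w.lower() for w in candidate_words}
    total = sum(word_weight(w, corpus_frequency) for w in reference)
    found = sum(word_weight(w, corpus_frequency) for w in reference & candidate)
    return found / total

# One line of the sample article: human-corrected text versus raw OCR.
human = "they will be excluded from the benefit of the dividend".split()
ocr = "they will be occluded from the benefit of the dividend".split()

# Hypothetical relative frequencies, invented for this example only.
freq = {"the": 0.05, "of": 0.03, "be": 0.01, "from": 0.008, "they": 0.005,
        "will": 0.005, "benefit": 0.0002, "dividend": 0.0001, "excluded": 0.0001}

plain = unique_word_searchability(human, ocr)
weighted = weighted_searchability(human, ocr, freq)
```

With these made-up frequencies the weighted figure comes out lower than the plain one, because the single word lost to OCR ("excluded") is rarer than most of the words that survived.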

Article ID 231521700, Government Gazette Private Notices, EXOTIC LIVESTOCK AUSTRALIA PTY LIMITED (In Liquidation).—Notice of intention to declare a dividend.—A final, page 405 1989-11-03, Government Gazette of the State of New South Wales (Sydney, NSW : 1901 - 2001), 80 words, 3 corrections

Raw OCR | Human Corrected | overProof Corrected
Liquidation).—Notice of intention to declare a dividend.—A final | Liquidation).— Notice of intention to declare a dividend.— A final | Liquidation).—Notice of intention to declare a dividends final
dividend is to be declared on the 27th day of November, 1989, in | dividend is to be declared on the 27th day of November, 1989, in | dividend is to be declared on the 27th day of November, 1939, in
respect of the Company. Creditors whose debts or claims have not | respect of the Company. Creditors whose debts or claims have not | respect of the Company. Creditors whose debts or claims have not
already been admitted are required on or before the 27th day of | already been admitted are required on or before the 27th day of | already been admitted are required on or before the 27th day of
November, 19S9, formally to prove their debts or claims. In default, | November, 1989, formally to prove their debts or claims. In default, | November, 1939, formally to prove their debts or claims. In default,
they will be occluded from the benefit of the dividend. LINDSAY | they will be excluded from the benefit of the dividend. LINDSAY | they will be excluded from the benefit of the dividend. LINDSAY
DREW, (Liquidator), 32A Oxford Sum, Sydney, N.S.W. 2010.(2350] | DREW, (Liquidator), 32A Oxford Street, Sydney, N.S.W. 2010. [2350] | DREW, (Liquidator), 32A Oxford Sum, Sydney, N.S.W. 2010 (2350]
Identified overProof corrections: EXCLUDED
Identified overProof non-corrections: STREET
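
The two word lists above follow the rules in notes 4 and 5. A rough sketch of those rules is given here, assuming the three versions of the text have already been aligned word by word (the report does not say how alignment or tokenisation is done, and the sample suggests numerals and punctuation are handled separately). The function name and the choice of which word to report in each list are assumptions.

```python
def classify(aligned_words):
    """Sort (ocr, human, overproof) word triples into three lists.

    corrections: overProof changed the word and agrees with the human.
    missed:      overProof output differs from the human correction.
    vandalised:  overProof changed a word that the human left alone.
    """
    corrections, missed, vandalised = [], [], []
    for ocr, human, overproof in aligned_words:
        if overproof == human:
            if overproof != ocr:
                corrections.append(overproof.upper())
        elif overproof != ocr and human == ocr:
            # Assumption: report the word overProof produced.
            vandalised.append(overproof.upper())
        else:
            # Assumption: report the human word, as the sample list does.
            missed.append(human.upper())
    return corrections, missed, vandalised

# Triples taken from the sample article, plus one hypothetical
# vandalisation ("final" -> "finals") that does not occur in it.
sample = [
    ("occluded", "excluded", "excluded"),
    ("Sum", "Street", "Sum"),
    ("benefit", "benefit", "benefit"),
    ("final", "final", "finals"),
]
result = classify(sample)
```

For the first three triples this yields EXCLUDED as a correction and STREET as a non-correction, matching the lists above; the fourth triple lands in the vandalised list.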
Measure | Count | OCR accuracy % | overProof accuracy % | corrected %
All Words | 73 | 97.3 | 97.3 | 0.0
Searchability of unique words | 51 | 96.1 | 98.0 | 50.0
Weighted Words | - | 96.8 | 98.4 | 50.0
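
The "corrected %" figures appear consistent with the share of OCR errors that overProof removed. The sketch below checks that reading against the numbers above. It assumes the first accuracy figure is for raw OCR and the second for overProof output, and that error counts can be recovered by rounding; both function names are hypothetical.

```python
def corrected_pct(word_count, ocr_accuracy_pct, overproof_accuracy_pct):
    """Share of OCR errors removed, using whole-word error counts."""
    before = round(word_count * (1 - ocr_accuracy_pct / 100))
    after = round(word_count * (1 - overproof_accuracy_pct / 100))
    if before == 0:
        return 0.0  # nothing to correct
    return 100 * (before - after) / before

def corrected_pct_from_rates(ocr_accuracy_pct, overproof_accuracy_pct):
    """Same ratio computed directly from the (rounded) percentages."""
    before = 100 - ocr_accuracy_pct
    after = 100 - overproof_accuracy_pct
    if before == 0:
        return 0.0
    return 100 * (before - after) / before

all_words = corrected_pct(73, 97.3, 97.3)          # 2 errors before, 2 after
unique_words = corrected_pct(51, 96.1, 98.0)       # 2 errors before, 1 after
weighted = corrected_pct_from_rates(96.8, 98.4)    # no word count available
unique_from_rates = corrected_pct_from_rates(96.1, 98.0)
```

Under these assumptions the count-based version gives 0.0 for All Words and 50.0 for unique words, and the rate-based version gives about 50.0 for Weighted Words. Applied to the unique-word percentages, the rate-based version gives about 48.7, which coincides with the accumulated figure for this year; whether the accumulated table is actually derived that way is a guess.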

Accumulated stats for 1 article from year 1989

Measure | Count | OCR accuracy % | overProof accuracy % | corrected %
All Words | 73 | 97.3 | 97.3 | 0.0
Searchability of unique words | 51 | 96.1 | 98.0 | 48.7
Weighted Words | - | 96.8 | 98.4 | 50.0