NLA Trial Articles from 1961

  1. Accuracy of OCR and overProof is measured against the human corrections. We know the human corrections in this sample are incomplete and themselves contain errors, but they are the best we could select automatically from the NLA newspapers corpus: articles tagged as completely corrected, then further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and/or overProof
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process and show those words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process and show those words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human correction; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
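
Notes 6 and 7 can be illustrated with a minimal Python sketch. It is not overProof's implementation: the function names, the toy corpus frequencies and the default frequency for unseen words are assumptions made for illustration, and the notes do not state the log base or any normalisation that the real calculation may apply.

```python
import math

def word_weight(word, corpus_freq, unseen_freq=1e-7):
    """Weight a word by the log of the inverse of its relative corpus frequency.

    unseen_freq is a hypothetical fallback for words missing from the corpus.
    """
    freq = corpus_freq.get(word, unseen_freq)
    return math.log(1.0 / freq)

def searchability(reference_words, candidate_words):
    """Percentage of distinct reference words that also occur in the candidate text."""
    reference = set(reference_words)
    if not reference:
        return 0.0
    found = reference & set(candidate_words)
    return 100.0 * len(found) / len(reference)

def weighted_accuracy(reference_words, candidate_words, corpus_freq):
    """Like searchability, but each distinct word counts by its weight."""
    reference = set(reference_words)
    candidate = set(candidate_words)
    total = sum(word_weight(w, corpus_freq) for w in reference)
    if total == 0:
        return 0.0
    found = sum(word_weight(w, corpus_freq) for w in reference if w in candidate)
    return 100.0 * found / total

# Toy relative frequencies (invented for this example, not taken from the NLA corpus).
freq = {"result": 1e-4, "of": 3e-2, "an": 5e-3, "inspector's": 1e-6, "inquiry": 5e-5}

human = "result of an inspector's inquiry".split()
ocr = "result of an inspectoi'i inquiry".split()

print(searchability(human, ocr))            # 4 of 5 distinct words found: 80.0
print(weighted_accuracy(human, ocr, freq))  # lower, because the missed word is rare
```

Under these assumptions a single OCR error in a rare word costs more weighted accuracy than an error in a common word such as "of", which is the behaviour note 7 describes.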

Article ID 105895070, Article, Society Meeting Called, page 26 1961-09-27, The Canberra Times (ACT : 1926 - 1995), 136 words, 3 corrections

Raw OCR    Human Corrected    overProof Corrected
Society? Society Society?
Meetiml Meeting Meeting
Called i Called Called A
A special general A special general special general
| meeting of the Canberra meeting of the Canberra meeting of the Canberra
l Co - operative Society Co-operative Society l Co - operative Society
will be held on October will be held on October will be held on October
9. 9. 9.
The Registrar of Co-opera, The Registrar of Co-opera- The Registrar of Co-operative
tive Sovieties, Mr. J. D. But* tive Societies, Mr. J. D. But- Societies, Mr. J. D. Butt
ton, has called the meeting, ton, has called the meeting. ton, has called the meeting,
The meeting, for share The meeting, for share- The meeting, for shareholders
holders only, will be closed holders only, will be closed only, will be closed
to the Press and the public, to the Press and the public. to the Press and the public,
Shareholders will hear Hit Shareholders will hear the Shareholders will hear the
result of an inspectoi'i result of an inspector's result of an inspector's
inquiry into the society's inquiry into the society's inquiry into the society's
affairs. affairs. affairs.
The society's board ot The society's board of The society's board of
directors requested : tit directors requested the directors requested : the
inquiry on May 23. inquiry on May 23. inquiry on May 23.
Mr. Button appointed Mr. Mr. Button appointed Mr. Mr. Button appointed Mr.
Keith Bennetts to examine Keith Bennetts to examine Keith Bennetts to examine
and report on the Co-Opera and report on the Co-Opera- and report on the Co-Operative
tive's minutes, books, docu tive's minutes, books, docu- minutes, books, documents,
ments, stock, securities and ments, stock, securities and stock, securities and
affairs. affairs. affairs.
. Shareholders also will bt Shareholders also will be . Shareholders also will be
told what action the board ot' told what action the board of told what action the board of'
directors has taken or is con-' directors has taken or is con- directors has taken or is considering.
sidering. 1 sidering. 1
They will be required to They will be required to They will be required to
produce their share books at produce their share books at produce their share books at
the door to prove member-' the door to prove member- the door to prove membership
ship. ship.
                               Words  OCR accuracy %  overProof accuracy %  corrected %
All Words                        118            91.5                  95.8         50.0
Searchability of unique words     71            94.4                  94.4          0.0
Weighted Words                                  95.6                  95.6          0.0
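
The "corrected %" figure for All Words appears consistent with the share of OCR word errors that no longer remain after overProof. The sketch below is an assumption about that calculation, not a documented formula: it recovers whole-number error counts by rounding from the word count and the two accuracy percentages, and the function names are hypothetical.

```python
def errors_from_accuracy(word_count, accuracy_pct):
    """Estimate the number of wrong words from a word count and an accuracy percentage."""
    return round(word_count * (100.0 - accuracy_pct) / 100.0)

def corrected_pct(word_count, before_pct, after_pct):
    """Percentage of the original errors that are gone after correction."""
    before = errors_from_accuracy(word_count, before_pct)
    after = errors_from_accuracy(word_count, after_pct)
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before

# All Words: 118 words, 91.5% before, 95.8% after (about 10 errors reduced to 5).
print(corrected_pct(118, 91.5, 95.8))  # 50.0
# Unique words: 71 words, 94.4% before and after (about 4 errors, none fixed).
print(corrected_pct(71, 94.4, 94.4))   # 0.0
```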

Accumulated stats for 1 article from year 1961

                               Words  OCR accuracy %  overProof accuracy %  corrected %
All Words                        118            91.5                  95.8         50.6
Searchability of unique words     71            94.4                  94.4          0.0
Weighted Words                                  95.6                  95.6          0.0