NLA Trial Articles from 1956

Notes
  1. OCR and overProof accuracy are measured against the human corrections. We know the human corrections in this sample are incomplete and contain errors of their own, but they are the best we could find automatically in the NLA newspaper corpus: articles tagged as completely corrected, further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a process separate from the one used to colour words in this output; the colouring is heuristic and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and overProof corrections
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process and show the words changed by overProof that ALSO match the human corrections. Because the human corrections are often wrong or incomplete, so is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process and show the words in the overProof output that DO NOT MATCH the human corrections. Because the human corrections are often wrong or incomplete, so is this list. Words marked [**VANDALISED] are those changed by overProof but not by the human corrector; as above, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article and how many of them are present before and after correction. It is a measure of how many of the words in an article could be used to find that article with a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.

Article ID 79264355, Article, OLD BOTTLE MESSAGE, page 14 1956-05-31, The Central Queensland Herald (Rockhampton, Qld. : 1930 - 1956), 83 words, 4 corrections

Raw OCR | Human Corrected | overProof Corrected
OLD BOTTLE | OLD BOTTLE | OLD BOTTLE
MESSAGE | MESSAGE | MESSAGE
PERTH. May 27.-Probablv | PERTH. May 27.-- Probably | PERTH. May 27. Probably
throw overboard from tone of | thrown overboard from one of | throw overboard from tone of
Augtmllafe fim troopamps tn | Australia's first troopships in | Augtmllafe from troopships in
Wort<f Wir I, a bottle eog | World War I, a bottle con- | World War I, a bottle dog
t*ln!n<- three faded sheet* of | tainer three faded sheets of | trials- three faded sheets of
notepi^per has been found at | notepaper has been found at | notepaper has been found at
Doubtful Island Bay. on the | Doubtful Island Bay, on the | Doubtful Island Bay on the
sooth coast of Western Aus | south coast of Western Aus- | south coast of Western Australia.
tralia. | tralia. |
One note read: "Off to the | One note read: "Off to the | One note read: "Off to the
Dardanelles. 30/11/1*--half | Dardanelles. 30/11/15--half- | Dardanelles. 30/11/1*--half
way tfr Western Australia" | way to Western Australia" | way to Western Australia"
Then «ame the namea, H | Then came the names, H | Then came the names, H
Notices. Q. Holeenberger tods | Noakes, G. Holzenberger and | Notices. Q. Holeenberger tons
Captain Bromlden, with mn« | Captain Bromiden, with some | Captain Bromlden, with many
others Impossible to decipher | others impossible to decipher. | others impossible to decipher
The bottle was found un | The bottle was found un- | The bottle was found in
corked <on a lonely south | corked on a lonely south | worked on a lonely south
coaet besich. | coast beach. | coast beach.
Identified overProof corrections: BEACH NOTEPAPER WORLD IN NAMES WAR SHEETS TROOPSHIPS CAME PROBABLY
Identified overProof non-corrections: NOAKES AND THROWN BROMIDEN SOME AUSTRALIAS UNCORKED [**VANDALISED] FIRST HOLZENBERGER CONTAINER
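The classification rules in notes 4 and 5 can be sketched as a word-by-word comparison. This minimal version assumes the raw OCR, human and overProof texts have already been aligned one token per position, which is the hard part in practice and is not shown. The function name and the simplified tokens (taken loosely from the article above, with "un corked" reduced to the single token "corked") are hypothetical.

```python
def classify(ocr_words, human_words, overproof_words):
    """Split aligned positions into overProof corrections and non-corrections.

    Assumes the three lists are already aligned one-to-one (hypothetical).
    Words are reported in upper case using the human-corrected form.
    """
    if not len(ocr_words) == len(human_words) == len(overproof_words):
        raise ValueError("word lists must be aligned to the same length")
    corrections, non_corrections = [], []
    for ocr, human, fixed in zip(ocr_words, human_words, overproof_words):
        if fixed == human:
            # Changed by overProof AND matching the human correction.
            if fixed != ocr:
                corrections.append(human.upper())
        else:
            label = human.upper()
            # overProof changed a word that the human left alone.
            if fixed != ocr and human == ocr:
                label += " [**VANDALISED]"
            non_corrections.append(label)
    return corrections, non_corrections


ocr = ["throw", "sheet*", "besich", "corked"]
human = ["thrown", "sheets", "beach", "corked"]
overproof = ["throw", "sheets", "beach", "worked"]
print(classify(ocr, human, overproof))
# (['SHEETS', 'BEACH'], ['THROWN', 'CORKED [**VANDALISED]'])
```

As note 5 warns, a position the human forgot to correct looks identical to "changed by overProof but not by the human", so this rule inherits the same false vandalisation flags.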
Measure | Word count | OCR accuracy % | overProof accuracy % | Errors corrected %
All Words | 74 | 67.6 | 85.1 | 54.2
Searchability of unique words | 58 | 67.2 | 82.8 | 47.4
Weighted Words |  | 70.8 | 84.5 | 47.0
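The percentages in this table are consistent with a simple word-count calculation. A minimal sketch, assuming accuracy is correct words divided by the word count and that Errors corrected % is (OCR errors - remaining errors) / OCR errors. The correct-word counts used here (50 and 63 of 74; 39 and 48 of 58) are back-calculated from the reported percentages, not taken from the report, and the function names are hypothetical.

```python
def accuracy(correct: int, total: int) -> float:
    """Percentage of words that match the human-corrected text."""
    return 100.0 * correct / total


def errors_corrected(ocr_correct: int, fixed_correct: int, total: int) -> float:
    """Percentage of OCR errors no longer present after correction.

    Assumed formula: (OCR errors - remaining errors) / OCR errors.
    This is a net figure: errors introduced by the corrector reduce it.
    """
    ocr_errors = total - ocr_correct
    remaining_errors = total - fixed_correct
    if ocr_errors == 0:
        # How the report handles error-free OCR is not stated; refuse to guess.
        raise ValueError("OCR text has no errors to correct")
    return 100.0 * (ocr_errors - remaining_errors) / ocr_errors


# Counts back-calculated from article 79264355's reported percentages.
for label, total, ocr_ok, op_ok in [
    ("All Words", 74, 50, 63),
    ("Searchability of unique words", 58, 39, 48),
]:
    print(
        label,
        round(accuracy(ocr_ok, total), 1),
        round(accuracy(op_ok, total), 1),
        round(errors_corrected(ocr_ok, op_ok, total), 1),
    )
```

For All Words this gives 24 OCR errors and 11 remaining errors, so 13 of 24 (54.2%) were removed, matching the table.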

Article ID 84389063, Article, One snake got away, page 7 1956-09-29, The Argus (Melbourne, Vic. : 1848 - 1957), 61 words, 4 corrections

Raw OCR | Human Corrected | overProof Corrected
One snake i | One snake | One snake i
got away j | got away | got away j
; NATHALIA, Friday: | NATHALIA, Friday: | ; NATHALIA, Friday:
I Jack and Walter Broom | Jack and Walter Broom | I Jack and Walter Broom
\ were mustering stock at | were mustering stock at | were mustering stock at
Skeleton Creek when | Skeleton Creek when | Skeleton Creek when
the.v saw two brown | they saw two brown | they saw two brown
snakes. | snakes. | snakes.
They killed one and | They killed one and | They killed one and
the other got away. | the other got away. | the other got away.
; The dead snake was | The dead snake was | The dead snake was
7ft. OJin. long, with 6Ain. | 7ft. 0½in. long, with 6½in. | 7ft. 6in. long, with 54in.
girth, with black spots | girth, with black spots | girth, with black spots
along its back. | along its back. | along its back.
It was the largest | It was the largest | It was the largest
snake seen in this area. | snake seen in this area. | snake seen in this area.
Identified overProof corrections: (none)
Identified overProof non-corrections: (none)
Measure | Word count | OCR accuracy % | overProof accuracy % | Errors corrected %
All Words | 52 | 98.1 | 100.0 | 100.0
Searchability of unique words | 41 | 100.0 | 100.0 | 100.0
Weighted Words |  | 100.0 | 100.0 | 0.0

Accumulated statistics for 2 articles from 1956

Measure | Word count | OCR accuracy % | overProof accuracy % | Errors corrected %
All Words | 126 | 80.2 | 91.2 | 55.8
Searchability of unique words | 99 | 80.8 | 89.9 | 47.6
Weighted Words |  | 82.9 | 90.9 | 46.9