NLA Trial Articles from 1957

  1. Accuracy of the OCR and of overProof is measured against the human corrections. The human corrections in this sample are known to be incomplete and to contain errors of their own, but they are the best that could be selected automatically from the NLA newspapers corpus: articles tagged as completely corrected, then filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and overProof corrections
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process, and show those words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process, and show those words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human correction; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
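Notes 6 and 7 can be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not overProof's implementation: the function names, the invented corpus frequencies, the natural-log base, and the choice to weight distinct words are all hypothetical. Each word is weighted by log(1 / relative frequency), so common words count for little and rare words count for more.

```python
import math

def word_weight(word, rel_freq, default_freq=1e-6):
    """Weight a word by log(1 / relative corpus frequency), as in note 7.

    rel_freq maps a word to its relative frequency in the corpus;
    default_freq is a hypothetical floor for words not in the map.
    """
    return math.log(1.0 / rel_freq.get(word, default_freq))

def unique_word_searchability(text_words, reference_words):
    """Percentage of distinct reference words present in the text (note 6)."""
    reference = set(reference_words)
    found = reference & set(text_words)
    return 100.0 * len(found) / len(reference)

def weighted_searchability(text_words, reference_words, rel_freq):
    """As above, but each distinct word counts in proportion to its weight."""
    reference = set(reference_words)
    found = reference & set(text_words)
    total = sum(word_weight(w, rel_freq) for w in reference)
    hit = sum(word_weight(w, rel_freq) for w in found)
    return 100.0 * hit / total

# Toy data: these relative frequencies are invented for illustration.
rel_freq = {"the": 0.05, "boat": 0.001, "wheelhouse": 0.00001, "winds": 0.0005}
human = ["the", "boat", "wheelhouse", "winds"]
raw_ocr = ["the", "boat", "wheelhouse", "tvinds"]

print(unique_word_searchability(raw_ocr, human))
print(weighted_searchability(raw_ocr, human, rel_freq))
```

In this toy case the missing word is rarer than average, so the weighted figure comes out lower than the unweighted one.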

Article ID 91240726, Article, Three Fishermen Fight For Lines In High Seas, page 1 1957-10-24, The Canberra Times (ACT : 1926 - 1995), 216 words, 3 corrections

Raw OCR | Human Corrected | overProof Corrected
Three Fisherttteii Wight | Three Fishermen Fight | Three Fisherttteii Fight
Fan J A res In High Seas | For Lives In High Seas | Fan J A. resIn High Seas
M^LBOIJBlNE, Wednesday,-—-Three young men in a 40 ft* 5 | MELBOURNE, Wednesday,-- Three young men in a 40 ft. | MELBOURNE, Wednesday,-—-Three young men in a 40 ft 5
shark boat are fighting for their lives against gale force tvinds> | shark boat are fighting for their lives against gale force winds | shark boat are fighting for their lives against gale force winds
mid raf*imr spms off the East Victorian coastline* ; | and raging seas off the East Victorian coastline. | and rather seas off the East Victorian coastline The
The shark boat "Heather | The shark boat "Heather | shark boat "Heather
Belle" has a smashed | Belle" has a smashed | Belle has a smashed
wheelhouse. and her engine | wheelhouse and her engine | wheelhouse. and her engine
is useless, v | is useless. | is useless, v
It left Lakes Entrance | It left Lakes Entrance | It left Lakes Entrance
two days ago on a normal | two days ago on a normal | two days ago on a normal
fishingf expedition. | fishing expedition. | fishing expedition.
: Late this afternoon, the | Late this afternoon the | : Late this afternoon, the
boat. was approaching the | boat was approaching the | boat. was approaching the
bar at Lakes Entrance, but | bar at Lakes Entrance, but | bar at Lakes Entrance, but
could not enter because of | could not enter because of | could not enter because of
the rough sea. | the rough sea. | the rough sea.
. : The occupants of t h e | The occupants of the | The occupants of t h e
boat, Veinon Newman* 25,1 | boat, Vernon Newman, 25, | boat, Vernon Newman 25,1
Lance Newman, -22, and | Lance Newman, 22, and | Lance Newman, -22, and
John Theodore, 22, decided | John Theodore, 22, decided | John Theodore, 22, decided
to ride the gale out until | to ride the gale out until | to ride the gale out until
to-morrow morning. | to-morrow morning. | to-morrow morning.
Then a massive wave | Then a massive wave | Then a massive wave
broke over the boat, | broke over the boat, | broke over the boat,
smashing the wheelhouse, | smashing the wheelhouse, | smashing the wheelhouse,
stopping: the engine arid | stopping the engine and | stopping: the engine and
wrecking their radio re | wrecking their radio re- | wrecking their radio re-
ceiver. : .. | ceiver. | ceiver. : ..
They, could not restart | They could not restart | They, could not restart
the engine, but the radio | the engine, but the radio | the engine, but the radio
: transmitters could" still | transmitters could still | : transmitters could" still
be operated. | be operated. | be operated.
They sent out, con, | They sent out con- | They sent out, continuous
tinuous S.O.S. signals. | tinuous S.O.S. signals. | S.O.S. signals.
The only other boat out. | The only other boat out | The only other boat out.
from Lakes Entrance, th« | from Lakes Entrance, the | from Lakes Entrance, the
"Harvey Star," receive® | "Harvey Star," received | Harvey Star," received
one of the signals and left | one of the signals and left | one of the signals and left
a sheltered spot to go . | a sheltered spot to go to | a sheltered spot to go .
the stricken vessel's aifl, | the stricken vessel's aid, | the stricken vessel's side,
but had to turn back'. . | but had to turn back. | but had to turn back'. .
A Lincoln bomber from | A Lincoln bomber from 6 | A Lincoln bomber from
East Sale \yill resume. a. L | East Sale will resume a | East Sale will resume. a. L
I search for the vessel m thft . | search for the vessel in the | I search for the vessel in the .
Imovnin^." . | morning. | moving." .
Identified overProof non-corrections AID RAGING FISHERMEN
Measure | Words | OCR accuracy % | overProof accuracy % | Errors corrected %
All Words | 183 | 89.1 | 95.6 | 60.0
Searchability of unique words | 119 | 91.6 | 97.5 | 70.0
Weighted Words | - | 93.0 | 97.7 | 67.1
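The "corrected %" figure appears to be the share of OCR word errors that overProof removed. The sketch below is an inference that happens to reproduce the published figures, not a documented formula; the helper name `errors_corrected_pct` and the step of rounding accuracy percentages back to whole-word counts are assumptions.

```python
def errors_corrected_pct(words, ocr_accuracy_pct, overproof_accuracy_pct):
    """Share of OCR word errors removed by correction, as a percentage."""
    # Recover whole-word counts from the rounded accuracy percentages.
    ocr_correct = round(words * ocr_accuracy_pct / 100)
    overproof_correct = round(words * overproof_accuracy_pct / 100)
    ocr_errors = words - ocr_correct
    overproof_errors = words - overproof_correct
    if ocr_errors == 0:
        return 0.0  # nothing to correct
    return 100.0 * (ocr_errors - overproof_errors) / ocr_errors

# "All Words" row: 183 words, 20 OCR errors reduced to 8.
print(errors_corrected_pct(183, 89.1, 95.6))  # 60.0
# "Searchability of unique words" row: 119 words, 10 errors reduced to 3.
print(errors_corrected_pct(119, 91.6, 97.5))  # 70.0
```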

Article ID 104020592, Article, POSTAGE STAMPS DEPICTING AUSTRALIAN WAR MEMORIAL, page 8 1957-11-29, Western Herald (Bourke, NSW : 1887 - 1970), 238 words, 3 corrections

Raw OCR | Human Corrected | overProof Corrected
Two postage stamps to be | Two postage stamps to be | Two postage stamps to be
issued on Monday, 10th Febru | issued on Monday, 10th Febru- | issued on Monday, 10th February,
ary, 1958, will depict the Aus | ary, 1958, will depict the Aus- | 1953, will depict the Australian
tralian War Memorial, Can | tralian War Memorial, Can- | War Memorial, Canberra.
berra. In stating this, the | berra. In stating this, the | In stating this, the
Postmaster - General (Mr. | Postmaster-General (Mr. | Postmaster - General (Mr.
Davidson) said that both | Davidson) said that both | Davidson) said that both
stamps will be of 5id. deno | stamps will be of 5½d. deno- | stamps will be of 5d. denomination,
mination, with the courtyard of | mination, with the courtyard of | with the courtyard of
the Memorial as the. central | the Memorial as the central | the Memorial as the. central
theme. The symbolic support | theme. The symbolic support- | theme. The symbolic support-
ing figures in one instance will | ing figures in one instance will | ing figures in one instance will
be those of a sailor and an air | be those of a sailor and an air- | be those of a sailor and an air
man and, in the other, of a | man and, in the other, of a | man and, in the other, of a
soldier and a servlcewoman. | soldier and a servicewoman. | soldier and a servicewomen.
The figures are representations | The figures are representations | The figures are representations
of mosaics prepared by Mel | of mosaics prepared by Mel- | of mosaics prepared by MelBourne
Bourne artist, Mr. Napier | bourne artist, Mr. Napier | artist, Mr. Napier
Waller, which are being install | Waller, which are being install- | Waller, which are being installed
ed in the Hall of Memory. The | ed in the Hall of Memory. The | in the Hall of Memory. The
stamps will be one of the same | stamps will be one of the same | stamps will be one of the same
size and format as the recent | size and format as the recent | size and format as the recent
7d. Royal Flying Doctor Ser. | 7d. Royal Flying Doctor Ser- | 7d. Royal Flying Doctor Service
vlce Stamp and will appear | vice Stamp and will appear | Stamp and will appear
alternatively in the same sheet. | alternatively in the same sheet. | alternatively in the same sheet.
They will be printed in brown | They will be printed in brown- | They will be printed in brown
ish red colour. The design "is | ish red colour. The design is | ish red colour. The design is
the work of artist-engravers | the work of artist-engravers | the work of artist engravers
'of the Note Printing Branch, | of the Note Printing Branch, | of the Note Printing Branch,
Commonwealth Bank of Aus | Commonwealth Bank of Aus- | Commonwealth Bank of Australia,
tralia, Melbourne, where the | tralia, Melbourne, where the | Melbourne, where the
stamps will be printed. Con | stamps will be printed. Con- | stamps will be printed. Continuing,
tinuing, Mr. Davidson said that | tinuing, Mr. Davidson said that | Mr. Davidson said that
the issue of the stamps during | the issue of the stamps during | the issue of the stamps during
February, 1958, was particularly | February, 1958, was particularly | February, 1953, was particularly
appropriate as a large number | appropriate as a large number | appropriate as a large number
of returned servicemen from | of returned servicemen from | of returned servicemen from
British Commonwealth and | British Commonwealth and | British Commonwealth and
Empire countries will be in | Empire countries will be in | Empire countries will be in
Canberra in connection with | Canberra in connection with | Canberra in connection with
the 14th Conference of the | the 14th Conference of the | the 14th Conference of the
British" Empire Service League. | British Empire Service League. | British Empire Service League.
The Conference will be opened i | The Conference will be opened | The Conference will be opened i
by Her Majesty Queen Eliza- : | by Her Majesty Queen Eliza- | by Her Majesty Queen Eliza- :
beth, the Queen Mother, on : | beth, the Queen Mother, on | beth, the Queen Mother, on
17|th February. : | 17th February. | 17th February. :
Identified overProof corrections
Identified overProof non-corrections SERVICEWOMAN ELIZABETH
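Notes 4 and 5 describe how words are sorted into corrections, non-corrections, and [**VANDALISED] words. A minimal sketch of that sorting follows, assuming the three versions have already been aligned word by word. The alignment step, the `normalise` helper, and the label strings are hypothetical, and the actual statistical calculation process may differ.

```python
import string

def normalise(word):
    """Uppercase and strip surrounding punctuation (an assumed comparison rule)."""
    return word.strip(string.punctuation + " ").upper()

def classify(raw, human, overproof):
    """Classify one aligned word triple following notes 4 and 5."""
    raw_n, human_n, over_n = normalise(raw), normalise(human), normalise(overproof)
    if over_n == human_n:
        # overProof agrees with the human text.
        return "correction" if over_n != raw_n else "unchanged"
    if raw_n == human_n:
        # overProof changed a word the human left alone; note 5 treats
        # this as a marked subset of the non-corrections.
        return "vandalised"
    return "non-correction"

# Word triples taken from the two articles in this report.
triples = [
    ("tvinds>", "winds", "winds"),
    ("aifl,", "aid,", "side,"),
    ("servlcewoman.", "servicewoman.", "servicewomen."),
]
for raw, human, over in triples:
    print(normalise(human), classify(raw, human, over))
```

Under these assumptions AID and SERVICEWOMAN come out as non-corrections, matching the lists in this report.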
Measure | Words | OCR accuracy % | overProof accuracy % | Errors corrected %
All Words | 208 | 98.1 | 99.0 | 50.0
Searchability of unique words | 112 | 98.2 | 98.2 | 0.0
Weighted Words | - | 98.4 | 98.4 | 0.0

Accumulated stats for 2 articles from year 1957

Measure | Words | OCR accuracy % | overProof accuracy % | Errors corrected %
All Words | 391 | 93.9 | 97.4 | 57.6
Searchability of unique words | 231 | 94.8 | 97.8 | 58.4
Weighted Words | - | 95.9 | 98.1 | 53.3
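The accumulated accuracy figures are consistent with pooling word counts across articles rather than averaging the per-article percentages. The sketch below checks this for the "All Words" accuracies of the two articles. It is an inference from the published numbers, `pooled_accuracy_pct` is a hypothetical helper, and it does not attempt to reproduce the accumulated "corrected %" column.

```python
def pooled_accuracy_pct(articles):
    """Pool (words, accuracy %) pairs into one word-weighted accuracy."""
    total_words = sum(words for words, _ in articles)
    # Recover whole-word correct counts from the rounded percentages.
    total_correct = sum(round(words * pct / 100) for words, pct in articles)
    return round(100.0 * total_correct / total_words, 1)

# (words, accuracy %) for the "All Words" rows of the two 1957 articles.
ocr = [(183, 89.1), (208, 98.1)]
overproof = [(183, 95.6), (208, 99.0)]

print(pooled_accuracy_pct(ocr), pooled_accuracy_pct(overproof))  # 93.9 97.4
```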