NLA Trial Articles from 1968

  1. Accuracy of OCR and overProof is measured against the human corrections. We know the human corrections in this sample are incomplete and themselves contain errors, but they are the best we could select automatically from the NLA newspapers corpus: articles tagged as completely corrected, then further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and overProof corrections
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process, and show those words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process, and show those words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human correction; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
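
Notes 6 and 7 can be illustrated with a minimal Python sketch. This is not the code used to produce the figures in this report. It assumes the human-corrected text is the reference, that words are compared in lower case, and that a word's weight is log(1 / frequency) with no further scaling. The function names, the toy corpus counts and the corpus total are all invented for illustration.

```python
import math

def word_weight(word, corpus_counts, corpus_total):
    """Weight a word by log(1 / frequency): rare words count for more.

    Assumption: a word missing from the corpus counts is treated as seen once.
    """
    count = corpus_counts.get(word, 1)
    return math.log(corpus_total / count)

def unique_word_searchability(reference_words, candidate_words):
    """Percentage of distinct reference words that also occur in the candidate."""
    reference = set(reference_words)
    if not reference:
        raise ValueError("reference text has no words")
    found = reference & set(candidate_words)
    return 100.0 * len(found) / len(reference)

def weighted_searchability(reference_words, candidate_words,
                           corpus_counts, corpus_total):
    """Same measure, but each distinct word contributes its weight, not 1."""
    candidate = set(candidate_words)
    weights = {w: word_weight(w, corpus_counts, corpus_total)
               for w in set(reference_words)}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all reference words have zero weight")
    found = sum(weight for w, weight in weights.items() if w in candidate)
    return 100.0 * found / total

# Hypothetical corpus statistics, invented for this example.
corpus_counts = {"additional": 900, "freight": 300, "in": 50000, "belconnen": 20}
corpus_total = 1_000_000

human = ["additional", "freight", "in", "belconnen"]
raw_ocr = ["additional", "freight", "in", "bclconncn"]  # OCR misread of "belconnen"

print(unique_word_searchability(human, raw_ocr))  # 75.0 (3 of 4 distinct words)
print(weighted_searchability(human, raw_ocr, corpus_counts, corpus_total))
```

In this toy case the weighted figure comes out lower than the unweighted 75%, because the one word lost to the OCR error ("belconnen") is the rarest and therefore the most heavily weighted.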

Article ID 131678717, Article, Another rail terminal planned, page 3 1968-10-24, The Canberra Times (ACT : 1926 - 1995), 314 words, 6 corrections

Raw OCR                                         Human Corrected                                 overProof Corrected
Another                                         Another                                         Another
rail                                            rail                                            rail
terminal                                        terminal                                        terminal
planned                                         planned                                         planned
A railway passenger terminal, west of the       A railway passenger terminal, west of the       A railway passenger terminal, west of the
Canberra airport, is envisaged in planning for  Canberra airport, is envisaged in planning for  Canberra airport, is envisaged in planning for
rail facilities in the ACT.                     rail facilities in the ACT.                     rail facilities in the ACT.
I Envisaged also arc addi                       Envisaged also are addi-                        I Envisaged also are additional
tional freight handling faci                    tional freight handling faci-                   freight handling facilities
lities in Bclconncn.                            lities in Belconnen.                            in Belconnen.
| The plans were revealed                       The plans were revealed                         | The plans were revealed
yesterday in a statement is                     yesterday in a statement is-                    yesterday in a statement is
sued by the Commonwealth                        sued by the Commonwealth                        sued by the Commonwealth
Railways which said that if                     Railways which said that if                     Railways which said that if
they eventuated the station                     they eventuated the station                     they eventuated the station
building at Kingston would                      building at Kingston would                      building at Kingston would
become the centre of freight                    become the centre of freight                    become the centre of freight
operations.                                     operations.                                     operations.
A spokesman for the Nat                         A spokesman for the Nat-                        A spokesman for the National
ional Capital Development '                     ional Capital Development                       Capital Development '
Commission said last night,                     Commission said last night,                     Commission said last night,
however, that he knew of no                     however, that he knew of no                     however, that he knew of no
firm plans such as those                        firm plans such as those                        firm plans such as those
given in the statement.                         given in the statement.                         given in the statement.
But it was believed that                        But it was believed that                        But it was believed that
such developments could                         such developments could                         such developments could
be included in a report on                      be included in a report on                      be included in a report on
the proposed Canberra                           the proposed Canberra-                          the proposed Canberra
Yass railway line which was t                   Yass railway line which was                     Yass railway line which was t
likely to reach the Minister r                  likely to reach the Minister                    likely to reach the Minister r
for Shipping and Transport. -                   for Shipping and Transport,                     for Shipping and Transport. -
Mr Sinclair, this month. r                      Mr Sinclair, this month.                        Mr Sinclair, this month.
The report, the result of f                     The report, the result of                       The report, the result of f
a three-year investigation by t                 a three-year investigation by                   a three-year investigation by the
the Commonwealth Rail- e                        the Commonwealth Rail-                          Commonwealth Rail- e
ways and the NSW Rail- I                        ways and the NSW Rail-                          ways and the NSW Rail- I
ways Department, was to                         ways Department, was to                         ways Department, was to
contain details of a pro- (                     contain details of a pro-                       contain details of a pro- (
posed route for the line to- (                  posed route for the line to-                    posed route for the line to- (
gether with an estimate of                      gether with an estimate of                      gether with an estimate of
its cost, probable traffic (                    its cost, probable traffic                      its cost, probable traffic (
and likely revenue.                             and likely revenue.                             and likely revenue.
Additional I                                    Additional                                      Additional I
sidings i                                       sidings                                         sidings The
The statement by the                            The statement by the                            statement by the
Commonwealth Railways,                          Commonwealth Railways,                          Commonwealth Railways,
said freight traffic in the.                    said freight traffic in the                     said freight traffic in the.
ACT continued to increase, \                    ACT continued to increase,                      ACT continued to increase, A
and later this year work \                      and later this year work                        and later this year work would
would begin on additional ]                     would begin on additional                       begin on additional ]
sidings for handling goods r                    sidings for handling goods                      sidings for handling goods r
direct from rail to road                        direct from rail to road                        direct from rail to road
truck. t                                        truck.                                          truck. t
"An end-loading ramp for t                      An end-loading ramp for                         "An end-loading ramp for a
piggy-back traffic will be in- a                piggy-back traffic will be in-                  piggy back traffic will be in- a
eluded, and a gantry crane t                    cluded, and a gantry crane                      cluded, and a gantry crane t
for handling heavy goods s                      for handling heavy goods                        for handling heavy goods s
and containers will also be a                   and containers will also be                     and containers will also be a
provided", it said. f                           provided, it said.                              provided, it said. of
It was proposed to begin c                      It was proposed to begin                        It was proposed to begin c
work during the 1969-70.                        work during the 1969-70.                        work during the 1969-70.
financial year on the provi- |                  financial year on the provi-                    financial year on the provision
sion of facilities for for- f                   sion of facilities for for-                     of facilities for for- f
warding agents which would                      warding agents which would                      warding agents which would
incorporate warehouses and                      incorporate warehouses and                      incorporate warehouses and
rail/road access. j                             rail/road access.                               rail/road access. These
These facilities would be ,                     These facilities would be                       facilities would be
si(ed on landfill between the ^                 sited on landfill between the                   sited on landfill between the I
eastern edge of the Cause                       eastern edge of the Cause-                      eastern edge of the Cause
way and Jerrabomberra,                          way and Jerrabomberra                           way and Jerrabomberra,
Creek. ,                                        Creek.                                          Creek.
Identified overProof corrections: ARE SITED PROVISION BELCONNEN
Identified overProof non-corrections: TOGETHER BYTHE FORWARDING
                               Words  OCR accuracy %  overProof accuracy %  corrected %
All Words                        258            95.7                  97.3         36.4
Searchability of unique words    145            95.2                  97.9         57.1
Weighted Words                                  96.3                  98.4         57.1

Accumulated stats for 1 article from year 1968

                               Words  OCR accuracy %  overProof accuracy %  corrected %
All Words                        258            95.7                  97.3         37.2
Searchability of unique words    145            95.2                  97.9         56.3
Weighted Words                                  96.3                  98.4         56.8
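
The report does not state how the "corrected %" figure is derived. One reading that reproduces the accumulated figures above (37.2, 56.3, 56.8) is the share of the remaining error that correction removes: (accuracy after - accuracy before) / (100 - accuracy before). The sketch below works under that assumption. The function name and the half-up rounding are our own choices, not taken from the report.

```python
from decimal import Decimal, ROUND_HALF_UP

def errors_corrected_pct(accuracy_before, accuracy_after):
    """Share of the pre-correction error removed, as a percentage.

    Assumed formula (not confirmed by the report):
        (after - before) / (100 - before) * 100
    Inputs are decimal strings such as "95.7" to avoid binary float rounding.
    """
    before = Decimal(accuracy_before)
    after = Decimal(accuracy_after)
    remaining_error = Decimal(100) - before
    if remaining_error == 0:
        raise ValueError("nothing left to correct at 100% accuracy")
    pct = (after - before) / remaining_error * 100
    # Rounding to one decimal place, half up, is an assumption.
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(errors_corrected_pct("95.7", "97.3"))  # All Words
print(errors_corrected_pct("95.2", "97.9"))  # Searchability of unique words
print(errors_corrected_pct("96.3", "98.4"))  # Weighted Words
```

The per-article figures (36.4, 57.1, 57.1) differ slightly from these values. A plausible explanation, which we have not verified, is that they are computed from unrounded word counts while the accumulated figures are derived from the rounded accuracy percentages.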