NLA Trial Articles from 1840

Notes
  1. Accuracy of OCR and overProof is measured against the human corrections. We know the human corrections in this sample are incomplete and themselves contain errors, but they are the best we could find automatically in the NLA newspapers corpus: articles tagged as completely corrected, then further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and overProof corrections
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process, and show the words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process, and show the words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human correction; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
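
Notes 6 and 7 can be illustrated with a minimal sketch. It is not the process used to produce the figures in this report: the function names, the sample frequency table, the fallback frequency for unseen words, and the normalisation to a percentage are all assumptions made for illustration. The only detail taken from the notes is that a word's weight is the log of the inverse of its corpus frequency.

```python
import math

def word_weight(word, corpus_freq, default_freq=1e-7):
    """Weight of a word: log of the inverse of its relative corpus frequency.

    corpus_freq maps lowercase words to relative frequencies (hypothetical data).
    default_freq is an assumed fallback for words missing from the table.
    """
    freq = corpus_freq.get(word.lower(), default_freq)
    return math.log(1.0 / freq)

def unique_word_searchability(reference_words, candidate_words):
    """Percentage of distinct reference words that also occur in the candidate text."""
    reference = {w.lower() for w in reference_words}
    candidate = {w.lower() for w in candidate_words}
    if not reference:
        return 0.0
    return 100.0 * len(reference & candidate) / len(reference)

def weighted_word_accuracy(reference_words, candidate_words, corpus_freq):
    """Share of the total reference word weight that is present in the candidate text."""
    candidate = {w.lower() for w in candidate_words}
    total = sum(word_weight(w, corpus_freq) for w in reference_words)
    found = sum(word_weight(w, corpus_freq)
                for w in reference_words if w.lower() in candidate)
    return 100.0 * found / total if total else 0.0

# Hypothetical relative frequencies: "for" is common, "tessier" is rare.
corpus_freq = {"for": 0.01, "master": 0.001, "tessier": 0.000001}
human = ["tessier", "master", "for"]   # human-corrected words
ocr = ["xessicr", "master", "for"]     # raw OCR words

unweighted = unique_word_searchability(human, ocr)
weighted = weighted_word_accuracy(human, ocr, corpus_freq)
```

In this toy case the OCR text contains two of the three distinct words, but the missing word is the rare one, so the weighted score is lower than the unweighted one. That is the behaviour note 7 describes: losing a distinctive word costs more than losing a common one.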

Article ID 12858755, Article, DEPARTURES., page 2 1840-04-13, The Sydney Herald (NSW : 1831 - 1842), 101 words, 4 corrections

Raw OCR | Human Corrected | overProof Corrected
DEPARTURF.S. DEPARTURES. DEPARTURES.
April -I-MONTREAL 'Xessicr, master, for April 4—MONTREAL Tessier, master, for April -I-MONTREAL 'Tessier, master, for
Guam. Guam. Guam.
. April 5-HELVETIA, Gardner, master, for South April 5—HELVETIA, Gardner, master, for South . April 5 HELVETIA, Gardner, master, for South
Sea Fi«herr. Sea Fishery. Sea Fishery.
April 5-^H. M. S. BUFFALO, Wood, master, for April 5—H. M. S. BUFFALO, Wood, master, for April 5-29. M. S. BUFFALO, Wood, master, for
New Zealand. New Zealand. New Zealand.
April 7-JUSTINE, Lucas, muster, for New Zca April 7—JUSTINE, Lucas, master, for New Zea- April 7 JUSTINE, Lucas, master, for New Zealand.
lund. - land. -
April 8-SUSANNAH ANN, Anderson, master, April 8 -- SUSANNAH ANN, Anderson, master, April 8 SUSANNAH ANN, Anderson, master,
for New Zealand. for New Zealand. for New Zealand.
April 8-ARGYLE, Gatenby, master, for London. April 8 -- ARGYLE, Gatenby, master, for London. April 8 ARGYLE, Gatenby, master, for London.
April 8-BEE, Macfarlane, master, for New Zea- April 8 -- BEE, Macfarlane, master, for New Zea- April SABER, Macfarlane, master, for New Zealand.
land. land.
April 8-BENGAL, Calfon, master, for Calcutta. April 8 -- BENGAL, Carson, master, for Calcutta. April 8 BENGAL, Calfon, master, for Calcutta.
April 8-HIND, Jones, master, for Hobart Town, April 8 -- HIND, Jones, master, for Hobart Town. April 8 HIND, Jones, master, for Hobart Town,
'April 8-KINNEAR, Mailor, master, for landon. April 8 -- KINNEAR, Mallar, master, for London. April 3 KINNEAR, Mailor, master, for London.
April 8-LORD ELDON, Worsell, master, for April 8 -- LORD ELDON, Worsell, master, for April LORD ELDON, Worsell, master, for
India. India. India.
April 8-SOPHIA, Johns, master, for Java. April 8 -- SOPHIA, Johns, master, for Java. April 6 SOPHIA, Johns, master, for Java.
April 8-WILMOT, Miller, master, for Guam. April 8 -- WILMOT, Miller, master, for Guam. April 8 WILMOT, Miller, master, for Guam.
April 10-AV1LLIAAI, Hugh, master, for India. April 10 -- WILLIAM, Hugh, master, for India. April 10 WILLIAM, Hugh, master, for India.
Identified overProof corrections: WILLIAM DEPARTURES TESSIER FISHERY
Identified overProof non-corrections: MALLAR MONTREAL BEE [**VANDALISED] CARSON
                               Word count   OCR accuracy %   overProof accuracy %   Errors corrected %
All Words                              94             89.4                   95.7                 60.0
Searchability of unique words          46             84.8                   91.3                 42.9
Weighted Words                                        86.6                   91.6                 37.2
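
The relationship between the three percentage columns can be sketched as follows. This is an assumed reading of the columns, not a description of the statistical calculation process itself: "Errors corrected %" is taken to be the share of the original errors that no longer remain after correction. The error counts used below (10 before, 4 after, out of 94 words) are not stated in the report; they are inferred from the rounded percentages in the All Words row above, and the function names are hypothetical.

```python
def accuracy_pct(word_count, errors):
    """Percentage of words that agree with the human-corrected text."""
    return 100.0 * (word_count - errors) / word_count

def errors_corrected_pct(errors_before, errors_after):
    """Percentage of the original errors that were removed by correction."""
    if errors_before == 0:
        return 0.0  # nothing to correct; assumed convention
    return 100.0 * (errors_before - errors_after) / errors_before

# Inferred (not reported) counts for the All Words row of this article.
word_count, errors_before, errors_after = 94, 10, 4

ocr_accuracy = round(accuracy_pct(word_count, errors_before), 1)
overproof_accuracy = round(accuracy_pct(word_count, errors_after), 1)
corrected = round(errors_corrected_pct(errors_before, errors_after), 1)
```

Under these assumptions the three values round to 89.4, 95.7 and 60.0, matching the All Words row. Note that this measure can be 0.0 even when accuracy is high, since it depends only on how many of the remaining errors were fixed.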

Article ID 28653973, Article, ARRIVALS., page 2 1840-03-30, The Sydney Herald (NSW : 1831 - 1842), 106 words, 4 corrections

Raw OCR | Human Corrected | overProof Corrected
ARRIVALS. ARRIVALS. ARRIVALS.
March 21-SUSANNAH ANN. Anderson, March 21—SUSANNAH ANN, Anderson, March 21 SUSANNAH ANN. Anderson,
master, from Nerv Zealand, 27th February. master, from New Zealand, 27th February. master, from New Zealand, 27th February.
March 22-AUSTRALASIAN PACKET, March 22—AUSTRALASIAN PACKET, March 22 AUSTRALASIAN PACKET,
Mcpherson, Master, from Hobart Town, 14(h Mcpherson, Master, from Hobart Town, 14th Mcpherson, Master, from Hobart Town, 11th
instaut. . instant. instant. .
March 22-ACASTA, Ry le, master, from March 22—ACASTA, Ryle, master, from March 22 ACASTA, Ryle, master, from
London, 13th November. London, 13th November. London, 13th November.
March 22-CHRISTINA, Birkenshaw, master, March 22-- CHRISTINA, Birkenshaw, master, March 22 CHRISTINA, Birkenshaw, master,
from Port Phillip, 14th instant. from Port Phillip, 14th instant. from Port Phillip, 14th instant.
March 22-ROBERT HENDERSON, March 22-- ROBERT HENDERSON, -------- March 22 ROBERT HENDERSON,
returned to port. , returned to port. returned to port.
March 22-BRITANNIA, Leith, master, from March 22-- BRITANNIA, Leith, master, from March 22-BRITANNIA, Leith, master, from
London. 25th October. London, 25th October. London. 25th October.
March 24- PISCATOR, Silk, roaster, from March 24-- PISCATOR, Silk, master, from March 24- PISCATOR, Silk, master, from
London, 9th Si ptetnher. London, 9th September. London, 9th Si preacher.
March 24-HELVETIA, Gardiner, master, March 24-- HELVETIA, Gardiner, master, March 24 HELVETIA, Gardiner, master,
from South Sea Fishery. from South Sea Fishery. from South Sea Fishery.
March 26-H M S. HERALD, Nias, mattel March 26-- H. M. S. HERALD, Nias, master, March 26 H M S. HERALD, Nias, master
fri m New Zealand, 12th instant. from New Zealand, 12th instant. from New Zealand, 12th instant.
March 26 -LA VILLE DE BORDEAU, March 26-- LA VILLE DE BORDEAU, March 26 -LA VILLE DE BORDEAUX,
Pierre Largetean, master,, from France. Pierre Largetean, master, from France. Pierre Largetean, master, from France.
March 26-JOHN BARRY, Robson, master, March 26-- JOHN BARRY, Robson, master, March 26 JOHN BARRY, Robson, master,
from Valparaiso, 26th December. from Valparaiso, 26th December. from Valparaiso, 26th December.
Identified overProof corrections: RYLE
Identified overProof non-corrections: SEPTEMBER BORDEAU [**VANDALISED]
                               Word count   OCR accuracy %   overProof accuracy %   Errors corrected %
All Words                              88             92.0                   97.7                 71.4
Searchability of unique words          53             96.2                   96.2                  0.0
Weighted Words                                        96.3                   96.3                  0.0

Accumulated stats for 2 articles from year 1840

                               Word count   OCR accuracy %   overProof accuracy %   Errors corrected %
All Words                             182             90.7                   96.7                 64.3
Searchability of unique words          99             90.9                   93.9                 33.2
Weighted Words                                        91.3                   93.9                 29.6