NLA Trial Articles from 1827

Notes
  1. Accuracy of OCR and overProof is measured against the human corrections. We know the human corrections in this sample are incomplete and themselves contain errors, but they are the best we could find automatically from the NLA newspapers corpus: articles tagged as completely corrected, then further filtered to those with at least 3 corrections, at least 40% of lines corrected, and a percentage of non-dictionary words in the lowest third.
  2. Accuracy is measured by a separate process from that used to colour words in this output: the colouring process is heuristic, and not completely accurate.
  3. Colour legend:
    Text - OCR text corrected by human and/or overProof
    Text - human and/or overProof corrections
    Text - discrepancies between human and/or overProof
    Text - human corrections not applied by overProof
  4. Identified overProof corrections are calculated by the statistical calculation process, and show those words changed by overProof which ALSO match human corrections. As human corrections are often wrong and incomplete, so too is this list.
  5. Identified overProof non-corrections are calculated by the statistical calculation process, and show those words in the overProof output which DO NOT MATCH human corrections. As human corrections are often wrong and incomplete, so too is this list. Words marked as [**VANDALISED] are those which have been changed by overProof but not by the human correction; as before, a missed human correction will be (incorrectly) classified as vandalisation by overProof.
  6. Searchability of unique words refers to the distinct words in an article, and how many are present before and after correction. It is a measure of how many of the words within an article could be used to find the article using a search engine.
  7. Weighted Words refers to a calculation in which common words count for little (a fraction of a word) and unusual words count for more, in proportion to the log of the inverse of their frequency in the corpus. It may be an indicator of how well distinctive words in an article can be searched before and after correction.
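
Notes 6 and 7 can be illustrated with a minimal Python sketch. It is not the overProof implementation: the function names, the toy corpus counts, the treatment of unseen words as having a count of 1, and the choice to weight distinct words rather than every occurrence are all assumptions made for illustration. The notes specify only that the weight is proportional to the log of the inverse of a word's corpus frequency.

```python
import math


def word_weight(word, corpus_counts, corpus_total):
    """Weight proportional to log(1 / relative frequency) (assumed form).

    Words missing from the corpus are treated as count 1 (an assumption).
    """
    relative_frequency = corpus_counts.get(word, 1) / corpus_total
    return math.log(1.0 / relative_frequency)


def unique_searchability(reference_words, candidate_words):
    """Percentage of distinct reference words also present in the candidate."""
    reference = set(reference_words)
    if not reference:
        return 0.0
    found = reference & set(candidate_words)
    return 100.0 * len(found) / len(reference)


def weighted_searchability(reference_words, candidate_words,
                           corpus_counts, corpus_total):
    """As above, but each distinct word counts by its log-inverse-frequency weight."""
    reference = set(reference_words)
    found = reference & set(candidate_words)
    total_weight = sum(word_weight(w, corpus_counts, corpus_total)
                       for w in reference)
    if total_weight == 0:
        return 0.0
    found_weight = sum(word_weight(w, corpus_counts, corpus_total)
                       for w in found)
    return 100.0 * found_weight / total_weight


# Hypothetical corpus counts, chosen only to make the arithmetic easy to follow.
corpus_counts = {"the": 1000, "sister": 10, "bower": 1}
corpus_total = 10000

human = ["the", "sister", "bower"]   # stands in for the human-corrected text
ocr = ["the", "sitter", "bower"]     # OCR misreads "sister"

plain = unique_searchability(human, ocr)
weighted = weighted_searchability(human, ocr, corpus_counts, corpus_total)
```

In this toy example the misread word is rarer than "the", so losing it costs more under the weighted measure than under the plain count of distinct words.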

Article ID 2189149, Article, Poetry. TO MY SISTER., page 4 1827-10-12, The Sydney Gazette and New South Wales Advertiser (NSW : 1803 - 1842), 253 words, 8 corrections

Raw OCR    Human Corrected    overProof Corrected
Voflv»; Poetry. Votive;
TO MySlKl'liH TO MY SISTER. TO MySlKl'liH
.y My Sitter thou urt all to ma " My Sister thou art all to me by My Sister thou art all to me
" That hope could form or wish demand, " That hope could form or wish demand, " That hope could form or wish demand,
"*' Or fancy fetterlcii and free j " Or fancy fetterless and free ; "*' Or fancy fettered and free j
,M Call from the bowen of fairyland, " Call from the bowers of fairyland, M Call from the bower of fairyland,
" TJuîÇrlfjhf creation« <¡f lur «j-and " The bright creations of her wand " TJuîÇrlfjhf creations of her grand
'* Fly at the tiyht of narra«?* tear*', " Fly at the sight of sorrow's tears, '* Fly at the sight of narrates tears',
" tint, dcarvtt, thou art »till at hand, " But, dearest, thou art still at hand, " that, dearest, thou art still at hand,
"? Tlu? friend* desert mc, thou art near." " Tho' friends desert me, thou art near." "? The friends desert mc, thou art near.
Tile world, oh ! my Sister, believe not, The world, oh ! my Sister, believe not, The world, oh ! my Sister, believe not
' Tho' smiling and bright it appears ; Tho' smiling and bright it appears ; The' smiling and bright it appears ;
Thy young heart, -oh ! let it deceive not, Thy young heart, oh ! let it deceive not, Thy young heart, -oh ! let it deceive not,
Then leave thee to sorrow and teais. Then leave thee to sorrow and tears. Then leave thee to sorrow and tears.
It it false, it it false, for not one It it false, it it false, for not one it is false, it is false, for not one
' Of the many around thee that }k>w, Of the many around thee that bow, Of the many around thee that yew,
. But would vanish with pleasure'! bright ino, But would vanish with pleasure's bright sun, But would vanish with pleasure'! bright ino,
Aiid forget or forsake their fond vow. And forget or forsake their fond vow. And forget or forsake their fond vow.
Some Isle, oh ! my Sister, I'll find thee, Some Isle, oh ! my Sister, I'll find thee, Some Isle, oh ! my Sister, I'll find thee,
Ixme and lovely beyond the deep tea ; Lone and lovely beyond the deep sea ; Ixme and lovely beyond the deep sea ;
Where, leaving the false world In-hind thee, Where, leaving the false world behind thee, Where, leaving the false world In-hind thee,
Ever happy und pure shall thou be ; Ever happy and pure shall thou be ; Ever happy and pure shall thou be
And the aun,-and the aun shall be bright, And the sun,— and the sun shall be bright, And the hear-and the sun shall be bright,
And the air shall be heavenly crear ; And the air shall be heavenly clear ; And the air shall be heavenly clear ;
And tho moons that-cmbellish the night, And the moons that embellish the night, And the moon's that embellish the night,
All cloudlets and bright shall appear. All cloudlets and bright shall appear. All cloudless and bright shall appear.
And there, oh ! my Sister, a bower And there, oh ! my Sister, a bower And there, oh ! my Sister, a bower
All blooming with roses I'll twine ; All blooming with roses I'll twine ; All blooming with roses I'll twine ;
And no cloud of the world «h|ill'e'cr lower, And no cloud of the world shall e'er lower, And no cloud of the world «h|ill'e'cr lower,
But in peace thou shall softly recline ; But in peace thou shall softly recline ; But in peace thou shall softly recline ;
And the queen ! and (he queen of that isle, And the queen ! and the queen of that isle, And the queen ! and the queen of that isle,
In thy purity there «halt thou dwell ; In thy purity there shalt thou dwell ; In thy purity there shalt thou dwell ;
And fred from the world and its guile, And freed from the world and its guile, and free from the world and its guile,
Shall bid all ita sorrows farewell ! STUFF. Shall bid all its sorrows farewell ! STUFF. Shall bid all its sorrows farewell ! STUFF.
Identified overProof corrections: DEAREST SUN SEA STILL EMBELLISH FRIENDS CREATIONS ME SHALT SIGHT TEARS HER CLEAR
Identified overProof non-corrections: WAND POETRY FREED LONE THO [**VANDALISED] FETTERLESS BOWERS PLEASURES BOW EER BEHIND CLOUDLETS [**VANDALISED]
                               Word count  OCR accuracy %  overProof accuracy %  Errors corrected %
All Words                             231            81.8                  89.6                42.9
Searchability of unique words         122            81.1                  90.2                47.8
Weighted Words                          -            84.6                  92.3                50.0
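
The "Errors corrected %" figures appear consistent with the share of the original OCR errors that overProof removed. The formula in the Python sketch below is an assumption inferred from the reported numbers, not a documented overProof calculation, and the function name is hypothetical.

```python
def errors_corrected_pct(ocr_accuracy, overproof_accuracy):
    """Percentage of the original errors removed (assumed formula).

    Both arguments are accuracies in percent, measured against the
    human-corrected text.
    """
    errors_before = 100.0 - ocr_accuracy
    errors_after = 100.0 - overproof_accuracy
    if errors_before == 0:
        return 0.0  # nothing to correct
    return 100.0 * (errors_before - errors_after) / errors_before


all_words = round(errors_corrected_pct(81.8, 89.6), 1)
weighted_words = round(errors_corrected_pct(84.6, 92.3), 1)
unique_words = round(errors_corrected_pct(81.1, 90.2), 1)
```

Under this assumption the All Words and Weighted Words rows come out at 42.9 and 50.0, matching the reported values. Applied to the rounded unique-word percentages it gives 48.1, so a difference in the last digit may simply reflect rounding of the inputs.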

Accumulated stats for 1 article from year 1827

                               Word count  OCR accuracy %  overProof accuracy %  Errors corrected %
All Words                             231            81.8                  89.6                42.9
Searchability of unique words         122            81.1                  90.2                48.1
Weighted Words                          -            84.6                  92.3                50.0