When testing a model, pay attention to the Character Error Rate (CER) metric. The CER measures the percentage of incorrectly recognized characters in relation to the total number of characters in the reference text (the ground truth). A low CER indicates that the model recognizes the handwriting accurately. For a more fine-grained analysis, for example to determine where in the ATR output the errors occur, you can use tools such as CERberus.
A further metric is the Word Error Rate (WER). It measures the percentage of incorrectly recognized words in relation to the total number of words in the reference text. A low WER indicates that the model correctly transcribes handwritten words. However, the WER has two drawbacks: there is no exact, fixed definition of what counts as a word, and the WER does not show how wrong a word is. A word may still be readable if only a single character in it has been recognized incorrectly, yet the WER treats it the same as a word that is completely illegible. It is therefore usually more informative to evaluate the success of a model with the CER.
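To make the difference concrete, here is a minimal sketch in Python. It is not tied to any particular transcription platform, and the example sentence is invented; real evaluation tools may normalize whitespace or punctuation differently. Note how a single wrong character produces a small CER but a comparatively large WER:

```python
def edit_distance(ref, hyp):
    """Count the substitutions, deletions and insertions needed to turn ref into hyp."""
    # Classic dynamic-programming (Wagner-Fischer) algorithm; works on strings
    # (character level) and on lists of words (word level) alike.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by the number of reference words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)

reference  = "the quick brown fox"
hypothesis = "the quick brawn fox"   # one wrong character

print(f"CER: {cer(reference, hypothesis):.2%}")   # about 5% (1 of 19 characters)
print(f"WER: {wer(reference, hypothesis):.2%}")   # 25% (1 of 4 words)
```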
These metrics help you decide whether a model works well on your documents. Keep in mind that, depending on your own reading skills, you may also be able to estimate for yourself how well a model works.
Many current ATR models work best on documents in the same language as their training data, because they rely on predicting the next character, that is, estimating the probability of the next character in a sequence with a language model built into the ATR model. It is therefore most promising when the model's training data and the sources to be transcribed are in the same language. There are also multilingual ATR models, trained on large amounts of text in several languages, that work quite well. However, these models introduce new problems, such as overcorrection and/or a worsening of results when their language models do not fit the language of your corpus. This problem arises most often with pre-modern languages, whose spelling varies widely.
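As a purely illustrative toy example, and not how any real ATR engine is implemented, the following Python sketch estimates next-character probabilities from a tiny, made-up sample text. It hints at why a language model tuned to one language pushes an ambiguous glyph towards the character that is likely in that language, which helps when the languages match and can overcorrect when they do not:

```python
# Toy character bigram model: count how often each character follows another
# in a (made-up) training text, then use the counts as probabilities.
from collections import Counter, defaultdict

training_text = "the merchant sold the cloth in the market"

follower_counts = defaultdict(Counter)
for prev_char, next_char in zip(training_text, training_text[1:]):
    follower_counts[prev_char][next_char] += 1

def next_char_probability(prev_char, candidate):
    """Estimated probability that `candidate` follows `prev_char`."""
    counts = follower_counts[prev_char]
    total = sum(counts.values())
    return counts[candidate] / total if total else 0.0

# After 't', this model strongly expects 'h', so an ambiguous glyph following
# 't' would tend to be read as 'h' - useful for English, misleading for a
# language with different character statistics.
print(next_char_probability("t", "h"))   # 0.8 in this toy text
print(next_char_probability("t", "z"))   # 0.0
```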
ATR models generally struggle with the names of people and places, as these vary the most. The more standardized the spelling in your documents and in the model's training data is, the fewer errors will be made during text recognition. The reverse also holds, which means that many pre-modern (especially non-Latin) documents will have higher CERs.
The design and scope of your project determine what constitutes a good (enough) model. Even 'imperfect' transcriptions often allow relevant text passages to be identified reliably in large text corpora, at least to a certain extent. If your goal is to find specific concepts in a big corpus, the acceptable CER can therefore be considerably higher than in a project that examines only a few documents, where you want a transcription that is accurate in both character and content. To understand this better, recall the four dimensions introduced earlier, with their methodological contrast between a broad and a narrow research question and between distant and close reading. More often than not, finding an available transcription in which 90% of the text has been correctly recognized already constitutes a great find.
Various tools have been developed to deal with the fact that text recognition is rarely entirely error-free. In contrast to a conventional, literal 'text search', such search aids are based on calculated probabilities that indicate how likely it is that a term in the document corresponds to the term you searched for. A well-known example of such a search aid is "fuzzy search". A fuzzy search is based on the so-called Levenshtein distance, which counts how many characters must be replaced, removed or added to turn one word into another. If you search, for example, for the term "big" with a Levenshtein distance of 1, the results will also include the term "bag". If you then look at the original document, you can check whether it actually reads "bag" or "big". The distance of 1 indicates that one character had to be changed to get from "big" to "bag". A similar German example is "Kind" and "Rind", whereas the change between the English "pear" and "peel" corresponds to a Levenshtein distance of 2.
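The following Python sketch, using a made-up word list, shows how the Levenshtein distance can be computed and how it can drive a simple fuzzy search. A production search engine implements this far more efficiently, but the principle is the same:

```python
def levenshtein(a, b):
    """Number of character replacements, removals and additions between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def fuzzy_search(term, words, max_distance=1):
    """Return every word whose Levenshtein distance to `term` is within the limit."""
    return [w for w in words if levenshtein(term, w) <= max_distance]

print(levenshtein("big", "bag"))    # 1
print(levenshtein("Kind", "Rind"))  # 1
print(levenshtein("pear", "peel"))  # 2

# Searching a (made-up) list of recognized words with a distance of 1 also
# surfaces possible misrecognitions of "big".
recognized_words = ["bag", "bog", "bridge", "pig", "big"]
print(fuzzy_search("big", recognized_words, max_distance=1))
# ['bag', 'bog', 'pig', 'big']
```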
To summarize, when deciding if a model works for your documents, consider the following key factors:
- Character Error Rate (CER)
- your preexisting reading and comprehension skills
- the quantity of documents to be transcribed
- your goals for the transcription
Keep time efficiency in mind and ask yourself: what is good enough for your project?