What You Will Learn
By the end of this introduction and the subsequent pages, you will be able to:
- Understand how different project corpora can be placed on the scales of the four key dimensions of working with ATR
- Explain why and how the values on these scales influence possible approaches
This chapter shows four examples with different types of corpora.
In the previous chapter, we established four dimensions of working with ATR. Here, we use them to show how different types of corpora can be tackled with Automated Text Recognition (ATR), depending on the heterogeneity of hands in the corpus, the amount of text, the breadth of your research question, and whether you want to use close reading or distant reading as your method.
In each section, we present one example with the following information:
- On the right, you will see a description of a corpus and a related project, with some information on what we would (hypothetically) like to do with this corpus after text recognition.
- On the left, you will see a visual representation of the four dimensions established in the previous chapter, with red dots marking the value on each scale assigned to this specific example.
- Underneath the visual representation, you will find a short summary of the project description.
In this chapter, we sometimes talk about ATR models you can use for text recognition. You will find more information about them in the last chapter. Here, it is only important to understand that a model is either trained with a very narrow focus on a specific script or even a specific hand, or trained on a larger set of training data covering various scripts, script types or even languages.
Small models with a narrow focus offer better recognition if your script matches their training data well, but they are often weak with scripts that differ even slightly. Bigger models aim at broader corpora and therefore recognize more scripts; however, they produce worse results for any individual hand. The lower the heterogeneity in your corpus, the better a specific small model will work, provided it is selected or trained well. The higher the heterogeneity, the bigger your model needs to be to cover all scripts in your corpus.
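The relationship between heterogeneity and model choice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `CorpusProfile`, the function `suggest_model_scope`, the numeric scale from 0.0 to 1.0 and the threshold of 0.5 are all hypothetical and do not come from this course or from any ATR tool. In practice, the decision is a judgment call and not a fixed cut-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusProfile:
    """Position of a corpus on the four dimensions.

    Assumption: every dimension is expressed as a number from 0.0 to 1.0.
    """
    hand_heterogeneity: float  # 0.0 = a single hand, 1.0 = many different hands
    text_amount: float         # 0.0 = very little text, 1.0 = a very large corpus
    question_breadth: float    # 0.0 = narrow research question, 1.0 = broad
    distant_reading: float     # 0.0 = close reading, 1.0 = distant reading

    def __post_init__(self):
        # Reject values that fall outside the assumed scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")


def suggest_model_scope(profile: CorpusProfile, threshold: float = 0.5) -> str:
    """Return "narrow" or "broad" as a rough rule of thumb.

    Assumption: only the heterogeneity of hands is considered, and the
    threshold is an arbitrary illustrative value.
    """
    if profile.hand_heterogeneity < threshold:
        # Few, similar hands: a small, specific model can match them well.
        return "narrow"
    # Many different hands: a bigger model is needed to cover all scripts.
    return "broad"


# Hypothetical example corpora.
single_scribe_diary = CorpusProfile(0.1, 0.3, 0.2, 0.0)
mixed_letter_archive = CorpusProfile(0.9, 0.8, 0.7, 1.0)

print(suggest_model_scope(single_scribe_diary))
print(suggest_model_scope(mixed_letter_archive))
```

The sketch deliberately looks at the heterogeneity of hands alone, because that is the dimension the paragraph above ties to model size. The other three dimensions are stored in the profile but do not influence this simple rule.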
This chapter focuses on four examples. In the next chapter, you will find a tool that lets you explore all possible combinations of the four dimensions of working with ATR.