Summary

In this module you have learned to:

  • Recognize the role of ATR projects in the digital preservation of endangered historical documents.
  • Explain why new approaches have dramatically improved the accuracy and feasibility of transcribing historical documents, and how ATR accelerates historical discovery by making transcription and indexing more efficient.
  • Define OCR, HTR, and ATR, and trace their evolution from punch cards to neural networks.
  • Define the four key dimensions of working with ATR and explain how they shape your approach, especially with regard to time efficiency.
  • Understand the difference between small and large models and how it influences the quality of an ATR transcription.
  • Decide when to use off-the-shelf ATR models versus custom-trained ones for your specific collection.
  • Evaluate the practical value of imperfect transcriptions and describe how tools like "fuzzy search" can be used to analyze them.

As we have shown in this module, Automated Text Recognition (ATR) can be applied to both small and large corpora, although the most suitable approach differs from project to project.

We have established time efficiency as a critical factor when planning to work with ATR. However, what counts as a time-efficient workflow depends strongly on the aim of your project. Not every project, for example, needs a perfect transcription of the corpus. Often, the goals can be reached despite some minor mistakes, which saves a lot of time in pre-processing and post-processing. To help you find the workflow that suits your project, we have proposed four key dimensions (heterogeneity of hands, amount of text, research question, method) that should be considered.
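To make the point about imperfect transcriptions more concrete, the following is a minimal sketch of a fuzzy search using Python's standard `difflib` module. The function name `fuzzy_find`, the threshold of 0.8, and the sample sentence are assumptions made up for illustration (the misspelling "parifh" imitates a long s misread as an f); dedicated ATR platforms may implement fuzzy search quite differently.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, text, threshold=0.8):
    """Return (word, score) pairs for words in `text` that resemble `query`.

    Hypothetical helper for illustration: the score is difflib's similarity
    ratio between 0 and 1, and `threshold` is an assumed cut-off.
    """
    hits = []
    for word in text.split():
        # Drop surrounding punctuation and ignore case before comparing.
        cleaned = word.strip(".,;:!?()[]\"'").lower()
        if not cleaned:
            continue
        score = SequenceMatcher(None, query.lower(), cleaned).ratio()
        if score >= threshold:
            hits.append((cleaned, round(score, 2)))
    return hits


# Invented sample line containing one typical recognition error ("parifh").
transcription = "The parifh regifter lists baptisms in the parish of St. Mary."
print(fuzzy_find("parish", transcription))
```

In this sketch, a search for "parish" returns both the correctly recognized word and the misread "parifh", which an exact search would have missed. Lowering the threshold finds more variants but also more false hits, so a suitable value depends on your corpus and research question.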

Finally, we would like to introduce you to a site offering digital transcription tools that could be useful for your project.