DH Certificate Dossier

An in-progress record of the coursework, work experience, events, and trainings completed toward the Digital Humanities Graduate Certificate Program at the University of Virginia (UVA). These records are being prepared to meet the final requirements for receiving the certificate.

Last modified: April 05, 2026 9:38am


Coursework

Extracurricular Technical Learning

DH Events and Conferences Attended

01: Coffee and Code, Evaluating OCR

April 1, 2026, 11:00am–12:00pm EST
Shannon Library 308

Event Summary

A casual but in-depth workshop in which Shane Lin, Senior Developer at the Scholars' Lab, shared a progress update on his research into document transcription software and techniques. This time, he walked us through a quality comparison of several LLMs for handwritten text recognition (HTR) and the parameters used to evaluate and rate their outputs. As a case study, Shane had us determine whether ChatGPT or Gemini was better at transcribing a page segment of a document from the Salem Witch Trials Project. The exercise sparked an open discussion about what kinds of information we value when transcribing a historical document. What is our "ground truth" in this context? In the second part of the talk, he went deeper into the linguistic nuances that affect how LLMs process natural language and introduced the Levenshtein distance, Character Error Rate (CER), and Word Error Rate (WER) metrics. This research project developed from Shane's role as Technical Lead for the Salem project.
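
As a rough illustration of the three metrics named above, here is a minimal Python sketch. It is not the code shown at the workshop: the function names (`levenshtein`, `cer`, `wer`) and the sample strings are assumptions made up for this note. It uses the common definitions, where the Levenshtein distance counts the fewest insertions, deletions, and substitutions needed to turn one sequence into another, CER divides that count by the number of characters in the ground-truth transcription, and WER does the same at the word level.

```python
def levenshtein(reference, hypothesis):
    """Fewest insertions, deletions, and substitutions that turn
    `reference` into `hypothesis`. Works on strings (characters)
    and on lists (e.g. words)."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j items of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        curr = [i]  # deleting i items matches an empty hypothesis prefix
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words.
    Assumes simple whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Invented sample strings, not taken from any real document.
    ground_truth = "the quick brown fox"
    model_output = "the quick brown dog"
    print("CER:", cer(ground_truth, model_output))
    print("WER:", wer(ground_truth, model_output))
```

Both rates can exceed 1.0 when a model's output is much longer than the ground truth, and the scores depend on choices such as whether spelling, capitalization, and punctuation are normalized before comparing. That connects back to the workshop's question of what counts as "ground truth" for a historical document.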

Resources

DH Research Assistant Experiences