Using AI to Automate Library Captioning

Captions play a key role in making audio and video content accessible. They benefit not only deaf and hard-of-hearing users, but also second-language learners, researchers scanning interviews, and anyone viewing content in noisy environments.

At the Northeastern University Library, we manage a growing archive of media, from lectures and events to oral histories. Manually creating captions for all of this content is not scalable, and outsourcing the task to third-party services can be expensive, time-consuming, and inconsistent. Motivated by this, we have been exploring AI-powered speech-to-text tools that could generate high-quality captions more efficiently and cost-effectively.

Screenshot of a video with a person speaking and a caption reading "There is an enormous need for an expansion of imagination and"
Figure 1: Example of an ideal transcription output

We started by testing Autosub, an open-source tool for generating subtitles. Even using a maintained fork (a copy of the original project that adds features, fixes bugs, or adapts the code for different use cases), Autosub did not offer significant time savings, and it was eventually dropped.

In summer 2023, the team began using OpenAI’s Whisper, which immediately cut captioning time in half. However, it lacked key features like speaker diarization (the process of segmenting a speech signal based on the identity of the speakers), and it often stumbled on long stretches of music or background noise, which required extra cleanup and made the output harder to use at scale.

As the AI for Digital Collections co-op on the Digital Production Services (DPS) team, I was responsible for researching and testing Whisper forks that could be realistically adopted by our team. I tested model performance, wrote scripts to automate captioning, debugged issues, and prepared tools for long-term use within our infrastructure.

Phase 1: Evaluating Whisper Forks

We looked for a model that could:

  • Handle speaker diarization
  • Distinguish between speech and non-speech (music, applause, etc.)
  • Output standard subtitle formats (like VTT/SRT)
  • Be scriptable and actively maintained
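To make the subtitle-format requirement concrete, here is a minimal Python sketch of turning timed segments into WebVTT text. The segment dictionaries (`start`, `end`, `text`, optional `speaker`) and the function names are illustrative assumptions, not the exact output of any tool we tested.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: list[dict]) -> str:
    """Render segments as WebVTT text. The segment shape is hypothetical."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        # A WebVTT voice span (<v Name>) carries the speaker label when known.
        lines.append(f"<v {speaker}>{text}" if speaker else text)
        lines.append("")
    return "\n".join(lines)


# Made-up segments for illustration only.
example = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to the lecture.", "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 3661.25, "text": "Thank you."},
]
print(segments_to_vtt(example))
```

SRT differs mainly in using a comma before the milliseconds and numbering each cue, so the same structure would carry over with small changes.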

We tested several forks, including WhisperX, Faster Whisper, Insanely Fast Whisper, and more. Many were either too fragile, missing key features, or poorly maintained. WhisperX stood out as the most well-rounded: it offered word-level timestamps, basic diarization, reasonable speed, and ongoing development support.

Phase 2: Performance Testing

Once we chose WhisperX, we compared its various models to OpenAI’s original Whisper models, including large-v1, large-v2, large-v3, large-v3-turbo, and turbo. We tested six videos of different lengths and levels of background noise, and compared the models on two measures: Word Error Rate (WER), or how often the transcription differed from a “gold standard” (a human-created or human-edited transcript), and processing time, or how long it took each model to generate captions.
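For readers unfamiliar with WER, the sketch below shows one common way to compute it: the word-level edit distance between the reference and the model output, divided by the number of reference words. This is a simplified illustration under assumed normalization (lowercasing and whitespace splitting), not the exact scoring script used in our tests.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences: one word dropped, one word added.
print(word_error_rate("there is an enormous need", "there is enormous need for"))
```

Real evaluations usually also strip punctuation and normalize numbers before scoring, since those choices can change the result noticeably.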

WhisperX’s large-v3 model consistently performed well, balancing speed and accuracy even on noisy or complex audio. OpenAI’s turbo and large-v3-turbo delivered strong performance but lacked diarization features.

Phase 3: Timestamp Accuracy Evaluation

Next, we assessed how precisely each model aligned subtitles to the actual audio — crucial for usability. We compared outputs from the WhisperX large-v3 model and the OpenAI turbo and large-v3-turbo models.

We used a gold standard transcript with human-reviewed subtitles as our benchmark. For each model’s output, we measured:

  • Start Mean Absolute Error (MAE): average timing difference between predicted and actual subtitle start times
  • End MAE: same as Start MAE, but for subtitle end times
  • Start % < 0.5s: percentage of subtitles with start times less than 0.5 seconds off
  • End % < 0.5s: same as Start % < 0.5s, but for end times
  • Alignment rate: overall percentage of words correctly aligned in time
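The start and end metrics above can be sketched in a few lines of Python. The version below assumes the reference and predicted subtitles have already been paired one-to-one as `(start, end)` tuples in seconds, which is a simplification, and it leaves out the word-level alignment rate. The function name and sample timings are made up for illustration.

```python
from statistics import mean


def timing_metrics(reference, predicted, tolerance=0.5):
    """Compare paired (start, end) cue times, in seconds.

    Assumes reference[i] and predicted[i] describe the same subtitle.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("need two non-empty cue lists of equal length")

    start_errors = [abs(p[0] - r[0]) for r, p in zip(reference, predicted)]
    end_errors = [abs(p[1] - r[1]) for r, p in zip(reference, predicted)]

    def pct_within(errors):
        # Share of cues whose error is strictly below the tolerance.
        return 100.0 * sum(e < tolerance for e in errors) / len(errors)

    return {
        "start_mae": mean(start_errors),
        "end_mae": mean(end_errors),
        "start_pct_within": pct_within(start_errors),
        "end_pct_within": pct_within(end_errors),
    }


# Made-up timings for four subtitles.
gold = [(0.0, 2.0), (2.0, 5.0), (5.0, 8.0), (8.0, 10.0)]
model = [(0.25, 2.0), (2.0, 5.75), (5.0, 8.0), (9.0, 10.25)]
print(timing_metrics(gold, model))
```

In practice the hard part is the pairing step, because a model may split or merge subtitles differently from the human transcript.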

WhisperX’s large-v3 model outperformed all other models significantly. In most of our test videos, it showed:

  • Much lower MAE scores for both start and end timestamps
  • Higher percentages of accurately timed subtitles (within the 0.5-second range)
  • Better overall word alignment rates

In fact, in several test cases, WhisperX was nearly three times more accurate than the best-performing OpenAI Whisper models in terms of timing precision.

Coded two-page caption
Figure 2: WhisperX output vs. gold-standard transcript in a high-WER case

In one particular case, the WER for WhisperX large-v3 was a surprisingly poor 94%. When I checked the difference log to investigate, I found that the model had transcribed background speech that was not present in the gold standard transcript. So, while it was technically penalized, WhisperX was actually picking up audio that the gold standard did not include. This highlighted both the model’s sensitivity and the limitations of relying solely on WER for evaluating accuracy.

Figure 2 shows exactly that. On the left, WhisperX (denoted “HYP”) transcribed everything it heard, while the gold standard transcript (denoted “REF”) cut off early and labeled the rest as background noise (shown on the right).
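A toy example can show why extra transcribed speech inflates WER. The Python sketch below breaks the word-level edit distance into substitutions, deletions, and insertions. The sentences are invented and the helper is illustrative, not the script behind Figure 2.

```python
def error_counts(reference: list[str], hypothesis: list[str]) -> dict[str, int]:
    """Split the word-level edit distance into substitutions, deletions, insertions."""
    # Each cell holds (total_edits, substitutions, deletions, insertions).
    prev = [(j, 0, 0, j) for j in range(len(hypothesis) + 1)]
    for i, ref_word in enumerate(reference, start=1):
        curr = [(i, 0, i, 0)]
        for j, hyp_word in enumerate(hypothesis, start=1):
            if ref_word == hyp_word:
                best = prev[j - 1]  # words match: no new error
            else:
                t, s, d, n = prev[j - 1]
                substitution = (t + 1, s + 1, d, n)
                t, s, d, n = prev[j]
                deletion = (t + 1, s, d + 1, n)
                t, s, d, n = curr[j - 1]
                insertion = (t + 1, s, d, n + 1)
                best = min(substitution, deletion, insertion)
            curr.append(best)
        prev = curr
    total, subs, dels, ins = prev[-1]
    return {"substitutions": subs, "deletions": dels, "insertions": ins, "total": total}


# REF stops early; HYP keeps transcribing speech the reference left out.
ref = "thank you all for coming".split()
hyp = ref + "can you hear me in the back okay".split()
counts = error_counts(ref, hyp)
print(counts, "WER:", counts["total"] / len(ref))
```

Every reference word here is transcribed correctly, yet the score is dominated by insertions. Because WER divides by the length of the reference, it is not capped at 100%, so a short reference paired with a thorough transcript can look far worse than it is.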

What’s Next: Integrating WhisperX

We have now deployed WhisperX’s large-v3 model to the library’s internal server. It’s actively being used to generate captions for incoming audio and video materials. This allows:

  • A significant reduction in manual labor for our DPS team
  • The potential for faster turnaround on caption requests
  • A scalable solution for future projects involving large media archives
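As a rough idea of what scripted captioning can look like, the Python sketch below builds WhisperX command lines for media files that do not yet have a caption file. It is not our production script. The folder names, the extension list, and the assumption that output is named after the input file's stem are illustrative, and the command-line flags should be checked against the installed WhisperX version (diarization also typically needs a Hugging Face access token).

```python
from pathlib import Path

# Hypothetical set of media types to caption.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".wav", ".m4a", ".mov"}


def build_whisperx_command(media_path: Path, output_dir: Path) -> list[str]:
    """Build a WhisperX CLI call; verify flags against your installed version."""
    return [
        "whisperx", str(media_path),
        "--model", "large-v3",
        "--diarize",                 # speaker labels; usually requires a token
        "--output_format", "vtt",
        "--output_dir", str(output_dir),
    ]


def pending_media(input_dir: Path, output_dir: Path) -> list[Path]:
    """List media files without a matching .vtt file in output_dir.

    Assumes captions are written as <input stem>.vtt.
    """
    return sorted(
        p for p in input_dir.iterdir()
        if p.suffix.lower() in MEDIA_EXTENSIONS
        and not (output_dir / f"{p.stem}.vtt").exists()
    )


# Dry run: print the command for one hypothetical file instead of executing it.
command = build_whisperx_command(Path("media/lecture01.mp4"), Path("captions"))
print(" ".join(command))
```

To actually run a batch, each command could be passed to `subprocess.run(command, check=True)` in a loop over `pending_media(...)`, ideally with logging so that failed files can be retried.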

Conclusion

As libraries continue to manage growing volumes of audio and video content, scalable and accurate captioning has become essential, not only for accessibility, but also for discoverability and long-term usability. Through this project, we identified WhisperX as a practical open-source solution that significantly improves transcription speed, speaker diarization, and timestamp precision. While no tool is perfect, WhisperX offers a strong foundation for building more efficient and inclusive media workflows in the library setting.

Reflections and Acknowledgements

This project helped me understand just how much thought and precision go into building effective captioning systems. Tools like WhisperX offer powerful capabilities, but they still require careful evaluation, thoughtful tuning, and human oversight. I am incredibly grateful to have contributed to a project that could drastically reduce the time and effort required to caption large volumes of media, enabling broader access and creating long-term impact across the library’s AV collections.

Finally, I would like to thank the Digital Production Services team for the opportunity and for their guidance and support throughout this project, especially Sarah Sweeney, Kimberly Kennedy, Drew Facklam, and Rob Chavez, whose insights and feedback were invaluable.

Northeastern University Library strikes new agreements to support open access publishing

In partnership with the Office of the Provost, Northeastern University Library is taking steps to support open access publishing upon completion of agreements with two top publishers: Springer Nature and Wiley. The new agreements cover article processing charges (APCs) across each publisher’s portfolio of open access journals, eliminating the cost to Northeastern researchers who choose to publish open access or are mandated by funders to publish or otherwise disseminate research via open publications/platforms without barriers to access. These agreements build on Northeastern University Library’s existing subscriptions providing access to Springer Nature and Wiley content spanning ebooks, journals, and more.

Springer Nature
Northeastern University Library, along with MIT and Carnegie Mellon, is among a leading group of research libraries to explore options and strike new, cost-effective transformative agreements. The agreement covers APCs in all hybrid Springer Nature publications/imprints, including Springer, Adis, and Palgrave. Springer’s Guide for Authors offers detailed information.

Wiley
Authors affiliated with Northeastern may publish open access at no charge in Wiley or Hindawi fully open access journals or in a hybrid journal. Wiley offers detailed information for authors on the publication process.

The new agreements run through 2025 and follow recent progress with other publishers, including Cambridge University Press. A complete list of open access agreements and related publishing options is available on the library’s Open Access Publishing page.

Register for upcoming webinars to learn more about the agreements and related publication workflows for authors and potential authors. Two webinars with Springer Nature and two with Wiley are scheduled for late March, at times chosen so that colleagues across the global network’s time zones can participate.

For more information, contact Evan Simpson, Associate Dean for Experiential Learning and Academic Engagement at Northeastern University Library.

Affordable course materials: reducing costs and promoting student success

We all remember textbooks. Memories of those big chunky books organized into chapters and sections, with tons of figures and charts explaining everything there is to know about a discipline. We stayed glued to them throughout each semester for the assigned activities and exercises they included. We studied them front-to-back for midterms and final exams.

From Anthropology to Zoology, textbooks are still used heavily. They are written by experts, reviewed by experts, and published by reputable academic publishers and other media companies, so they are reliable. The problem is that prices have risen sharply. Students in turn are paying more, and must often turn to alternatives or, if none can be found, choose different paths in the curriculum.

Multiple studies have broken down the rise in the price of textbooks. A study concluded early in the last decade showed that between 2002 and 2012 the price of textbooks increased 82%. Another looked at 2006-2016 and found an 88% increase. More studies are underway. As the price of textbooks rises, students are spending more; in the 2018-2019 academic year, students spent over $1,200 a year on average on course materials, mostly textbooks.

When students can’t afford new textbooks, their only alternatives are to pool funds and share books, rent or purchase used copies, or use a copy on reserve at the library. Sometimes even these are ruled out: when a required textbook comes with accompanying online content in the form of activities, quizzes, or other coursework, a used or shared copy is of no use, and the only option is to purchase a new copy. Given these factors, students have reported in various surveys that they decide which courses to enroll in based on what the required textbook(s) will cost.

It is no wonder there is a growing movement to use free and open educational content, and Northeastern University Library is on the front lines. Working with partners across the institution, librarians are helping faculty discover, evaluate, and integrate freely available textbooks and other Open Educational Resources (OERs), many of which are authored and reviewed by experts. In the case of Biology, multiple faculty members discontinued use of costly textbooks in favor of freely accessible, open texts: students enrolled in various Biology courses have saved over $100,000 since the summer of 2018. In related work, librarians are working to ensure faculty know how to maximize use of library-subscribed content such as online journal articles and e-books through dynamic reading-list creation tools and other services.

The library is actively presenting, creating partnerships, and raising awareness about the issues students face, and the options faculty have for finding and integrating alternatives and utilizing existing library content. Savings will continue to grow as the library works with more departments. The library is proud to be a part of this important movement.

For more information, visit the Affordable Course Materials guide.

Data Fest is coming in February

Since Love Data Week and Endangered Data Week both happen in February, we thought we’d use this month to showcase some of the great data-related services and resources we have to offer here at Snell. We’re calling it Data Fest, and you’re invited!

Here’s a taste of what we have planned:

  • Stop by and lend a hand at our Citizen Science: Health Hackathon
  • Make friends with your command line at our Intro to the Unix Shell workshop
  • Learn how to create impressive charts & data visualizations at our workshops on Tableau and free web-based tools

And more! Check out the full lineup and register here: http://bit.ly/snelldatafest18

Open Access Week is 10 Years Old!

The theme of this year’s International Open Access Week, “Open in order to…”, highlights the multitude of reasons why Open Access is important to researchers, students, funders, patients, and everyone else who benefits from increased sharing of knowledge. This year marks the 10th celebration of International Open Access Week, held during the last full week of October to advocate for fewer barriers between people and the information they need.

At Snell Library, we support Open Access in lots of ways. In 2016, our staff adopted an open access policy for our published research and presentations – you can find them in our Digital Repository Service. These materials have been viewed almost 2,000 times and have been downloaded by readers more than 1,000 times! If you’re a researcher at Northeastern and would like to get started using the DRS to make your work more accessible to readers around the world, it’s easy.

Also of interest to researchers: we’ve recently updated the page on our website about Open Access, and it now includes a list of publishers that offer Northeastern-affiliated authors a discount on the article processing charges for publishing open-access with them.

Snell Library also supports Open Access journal publishing on campus through Open Journal Systems (OJS). We currently work with four journals being published at Northeastern – including NU Writing, which recently moved over to our OJS system from the platform it was previously using. NU Writing just released their first issue using OJS! And, we support Open Access publishing and sharing through our memberships in initiatives such as the Digital Commonwealth, the Digital Public Library of America, HathiTrust, Knowledge Unlatched, and SCOAP³.

In October 2008, we celebrated the first international Open Access Day at Snell Library. Since then, as the Open Access movement has grown, we’ve expanded our programming as well – first, with Open Access Week, and then in the past two years with Open Access Month in October. This year, we’re expanding the concept even more – we want to highlight openness in research, teaching, scholarship, and creativity throughout the academic year. After all, at this point, open access is something that we should be acknowledging as an established facet of the scholarly ecosystem, rather than a special topic that only gets attention once a year. So, stay tuned for open access–related news and events to come.

Banner image and poster openly licensed by SPARC, CC BY 4.0