The installations will be on display until September 1.
“Home Less Home” creates the shape of a house with many red and black ropes hung from the ceiling. Suspended within the ropes are records of immigration, such as passports and immigration papers. ICA Boston’s iteration of the installation also draws specifically on Boston history, featuring archival records from institutions across the city that speak to the theme of home and the actions around home: finding a home, leaving home, protecting home, and creating a new home.
Northeastern’s archives brought a unique organizational activism component to the exhibit through our Special Collections’ focus on neighborhood social justice movements. Reference staff worked with ICA Boston curators to find records addressing housing activism and advocacy in Boston’s neighborhoods. The exhibit features records from the following collections:
Captions play a key role in making audio and video content accessible. They benefit not only deaf and hard-of-hearing users, but also second-language learners, researchers scanning interviews, and anyone viewing content in noisy environments.
At the Northeastern University Library, we manage a growing archive of media, from lectures and events to oral histories. Manually creating captions for all of this content is not a scalable solution, and outsourcing the task to third-party services can be expensive, time-consuming, and inconsistent. Motivated by this, we have been exploring AI-powered speech-to-text tools that could generate high-quality captions more efficiently and cost-effectively.
Figure 1: Example of an ideal transcription output
We started by testing Autosub, an open-source tool for generating subtitles. Even using a maintained fork (a copy of the original project that adds features, fixes bugs, or adapts the code for different use cases), Autosub did not offer significant time savings, and it was eventually dropped.
In summer 2023, the team began using OpenAI’s Whisper, which immediately cut captioning time in half. However, it lacked key features like speaker diarization (the process of segmenting a speech signal based on the identity of the speakers), and it often stumbled on long stretches of music or background noise which would require extra cleanup and made the output harder to use at scale.
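For readers unfamiliar with the tool, a basic Whisper transcription pass looks roughly like the sketch below; the model size and file path are illustrative rather than our exact configuration.

```python
# Minimal sketch of a Whisper transcription pass.
# Requires the open-source "openai-whisper" package; the path is a placeholder.
import whisper

model = whisper.load_model("large-v3")        # smaller models trade accuracy for speed
result = model.transcribe("lecture_recording.mp4")

# Whisper returns timed segments, but no speaker labels (no diarization).
for segment in result["segments"]:
    print(f"[{segment['start']:7.2f} --> {segment['end']:7.2f}] {segment['text'].strip()}")
```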
As the AI for Digital Collections co-op on the Digital Production Services (DPS) team, I was responsible for researching and testing Whisper forks that could be realistically adopted by our team. I tested model performance, wrote scripts to automate captioning, debugged issues, and prepared tools for long-term use within our infrastructure.
Phase 1: Evaluating Whisper Forks
We looked for a model that could:
Handle speaker diarization
Distinguish between speech and non-speech (music, applause, etc.)
Output standard subtitle formats (like VTT/SRT)
Be scriptable and actively maintained
We tested several forks, including WhisperX, Faster Whisper, Insanely Fast Whisper, and more. Many were either too fragile, missing key features, or poorly maintained. WhisperX stood out as the most well-rounded: it offered word-level timestamps, basic diarization, reasonable speed, and ongoing development support.
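To give a sense of why WhisperX felt well-rounded, the sketch below shows the general shape of its pipeline, with transcription, word-level alignment, and speaker diarization as separate steps. Function names follow the WhisperX README at the time of writing and may differ between releases; the file path and Hugging Face token are placeholders.

```python
# Rough shape of a WhisperX pipeline: transcribe, align, then diarize.
# Names follow the WhisperX README and may vary between releases.
import whisperx

device = "cuda"                                       # or "cpu" without a GPU
audio = whisperx.load_audio("oral_history.wav")       # placeholder path

# 1. Transcribe
model = whisperx.load_model("large-v3", device)
result = model.transcribe(audio, batch_size=16)

# 2. Align the output to get word-level timestamps
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Diarize and attach speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

for segment in result["segments"]:
    print(segment.get("speaker", "UNKNOWN"), segment["text"].strip())
```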
Phase 2: Performance Testing
Once we chose WhisperX, we compared its various models to OpenAI’s original Whisper models, including large-v1, v2, v3, large-v3-turbo, and turbo. We tested six videos of different lengths and with different levels of background noise, and compared the models based on Word Error Rate (WER), which measures how often the transcription differed from a “gold standard” (a human-created or -edited transcript), and processing time (how long it took each model to generate captions).
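As a rough illustration of the evaluation loop (not our exact scripts), WER and processing time can be measured with a few lines of Python; the sketch below assumes the jiwer package and treats the model under test as an interchangeable function.

```python
# Sketch of the evaluation loop: Word Error Rate plus wall-clock processing time.
# Assumes the "jiwer" package; "run_model" stands in for whichever model is being tested.
import time
import jiwer

def evaluate(run_model, video_path: str, reference_text: str) -> None:
    start = time.perf_counter()
    hypothesis = run_model(video_path)            # the model's transcript as plain text
    elapsed = time.perf_counter() - start

    wer = jiwer.wer(reference_text, hypothesis)   # 0.0 = perfect, higher = more word errors
    print(f"WER: {wer:.2%} | processing time: {elapsed:.1f}s")
```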
WhisperX’s large-v3 model consistently performed well, balancing speed and accuracy even on noisy or complex audio. OpenAI’s turbo and large-v3-turbo delivered strong performance but lacked diarization features.
Phase 3: Timestamp Accuracy Evaluation
Next, we assessed how precisely each model aligned subtitles to the actual audio — crucial for usability. We compared outputs from the WhisperX large-v3 model and the OpenAI turbo and large-v3-turbo models.
We used a gold standard transcript with human-reviewed subtitles as our benchmark. For each model’s output, we measured the following (a sketch of how these metrics can be computed appears after the list):
Start Mean Absolute Error (MAE) — average timing difference between predicted and actual subtitle start times
End MAE — same as Start MAE, but for subtitle end times
Start % < 0.5s — percentage of subtitles with start times less than 0.5 seconds off
End % < 0.5s — same as Start % < 0.5s, but for end times
Alignment rate — overall percentage of words correctly aligned in time
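A minimal sketch of these calculations, assuming the predicted and gold-standard subtitle cues have already been paired one-to-one as start and end times in seconds:

```python
# Sketch of the timestamp metrics; cues are assumed to be paired one-to-one
# between model output and gold standard as (start_seconds, end_seconds) tuples.
def timing_metrics(predicted, gold, threshold=0.5):
    start_errs = [abs(p[0] - g[0]) for p, g in zip(predicted, gold)]
    end_errs = [abs(p[1] - g[1]) for p, g in zip(predicted, gold)]

    n = len(start_errs)
    return {
        "start_mae": sum(start_errs) / n,
        "end_mae": sum(end_errs) / n,
        "start_pct_under_0.5s": 100 * sum(e < threshold for e in start_errs) / n,
        "end_pct_under_0.5s": 100 * sum(e < threshold for e in end_errs) / n,
    }
```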
WhisperX’s large-v3 model outperformed all other models significantly. In most of our test videos, it showed:
Much lower MAE scores for both start and end timestamps
Higher percentages of accurately timed subtitles (within the 0.5-second range)
Better overall word alignment rates
In fact, in several test cases, WhisperX was nearly three times more accurate than the best-performing OpenAI Whisper models in terms of timing precision.
Figure 2: WhisperX output vs. gold-standard transcript in a high-WER case
In one case, the WER result for WhisperX large-v3 was a surprisingly poor 94% errors. When I checked the difference log to investigate, I found that the model had transcribed background speech that was not present in the gold standard transcript. So, while it was technically penalized, WhisperX was actually picking up audio that the gold standard did not include. This highlighted both the model’s sensitivity and the limitations of relying solely on WER to evaluate accuracy.
Figure 2 shows exactly that. On the left, WhisperX (denoted “HYP”) transcribed everything it heard, while the gold standard transcript (denoted “REF”) cut off early and labeled the rest as background noise (shown on the right).
What’s Next: Integrating WhisperX
We have now deployed WhisperX’s large-v3 model to the library’s internal server, where it is actively being used to generate captions for incoming audio and video materials (a sketch of the caption-writing step follows the list below). This allows:
A significant reduction in manual labor for our DPS team
The potential for faster turnaround on caption requests
A scalable solution for future projects involving large media archives
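The caption-writing step itself is straightforward; the sketch below shows one way to turn the timed segments produced by a WhisperX-style pipeline into a WebVTT file. Folder names are illustrative, not our actual server layout.

```python
# Sketch of writing WebVTT captions from timed segments. The segments are assumed
# to come from a WhisperX-style pipeline; paths are illustrative placeholders.
from pathlib import Path

def to_vtt_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def write_vtt(segments, out_path: Path) -> None:
    """Write a list of {'start', 'end', 'text'} segments as a WebVTT file."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    out_path.write_text("\n".join(lines), encoding="utf-8")

# Example: caption every new file in a drop folder (illustrative path).
# for media_file in Path("incoming").glob("*.mp4"):
#     segments = ...  # run the transcription pipeline sketched earlier
#     write_vtt(segments, media_file.with_suffix(".vtt"))
```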
Conclusion
As libraries continue to manage growing volumes of audio and video content, scalable and accurate captioning has become essential, not only for accessibility, but also for discoverability and long-term usability. Through this project, we identified WhisperX as a practical open-source solution that significantly improves transcription speed, speaker diarization, and timestamp precision. While no tool is perfect, WhisperX offers a strong foundation for building more efficient and inclusive media workflows in the library setting.
Reflections and Acknowledgements
This project helped me understand just how much thought and precision goes into building effective captioning systems. Tools like WhisperX offer powerful capabilities, but they still require careful evaluation, thoughtful tuning, and human oversight. I am incredibly grateful to have contributed to a project that could drastically reduce the time and effort required to caption large volumes of media, thereby enabling broader access and creating long-term impact across the library’s AV collections.
Finally, I would like to thank the Digital Production Services team for the opportunity and their guidance and support throughout this project — especially Sarah Sweeney, Kimberly Kennedy, Drew Facklam, and Rob Chavez, whose insights and feedback were invaluable.
Aries Peralta works in the Archives and Special Collections. Photo courtesy of Molly Brown
The initial planning of the Big Dig, officially named the Central Artery/Tunnel Project, began in 1982, and actual construction occurred from 1991 to 2006. The donated records contained articles by both Palmer and Murphy, as well as by a third reporter, Charles Sennott.
I find this collection interesting because it is not just a compilation of articles published in the Boston Globe; it consists of the research and reference materials amassed for use in reporting on the issues surrounding the Big Dig. The records reveal the vast context and information a journalist would need to know in order to write cohesive articles, including contracts, technical reports, financial statements, photographs, maps, articles from other news sources, and more.
Below are some selected items to highlight the extent of the collection.
Big Dig Contract Map
The contracts in progress map is a snapshot of the various contracts happening at one time in downtown Boston and serves as a great visualization of the contracts’ physical locations. It also helps associate the technical contract number with the more publicly known name of any given section of the project, such as the Ted Williams Tunnel identified as contract number C07A1.
The Big Dig Blame Game
As Massachusetts Governor from 1997-2001, Paul Cellucci was the subject of countless voiced opinions about his tenure and leadership during the Big Dig project. This image of Cellucci as a snake accompanied an article published in a 2000 issue of Boston Magazine that suggested cost overruns were caused by a collective failure of key players, including Cellucci, for not properly managing the project.
A Fifth-Grader’s Opinion on the Big Dig
Often stuck in traffic with their parents or simply by living in nearby neighborhoods, local students were also affected by the Big Dig project. The Student Newsline section in the Boston Globe presented an opportunity for students to send in their own opinions about the project. Many students offered their own ideas to quickly finish and reduce the costs of the project.
2006 Ceiling Collapse
Reporting on the construction of the Big Dig included documenting tragedies. In 2006, a ceiling panel fell on a car in the Fort Point Channel Tunnel, killing a passenger and injuring the driver. Their family and the public wanted answers as to how the incident could have occurred. As a result, the Boston Globe undertook an in-depth investigation to report and provide answers. These photographs may have been taken to document the other ceiling panels in the rest of the tunnel after the accident occurred.
To learn more about accessing the Boston Globe Big Dig records, email the Northeastern University Archives and Special Collections at archives@northeastern.edu.
Aries Peralta (he/him) recently graduated from Simmons University with an MS in Library and Information Science with a concentration in archives management. He received his BA in history from the University of Connecticut.
Two radio program collections available in the Digital Repository Service (DRS) — Issue and Inquiry and Urban Confrontation — document social progress and unrelenting difficulties within American cities in 1970-71. Airing on Northeastern University’s radio station WRBB, the programs were produced by the university’s now-defunct Division of Instructional Communication. (Urban Confrontation noted that it ended in 1971 for financial reasons.)
Episodes, primarily hosted by Joseph R. Baylor, feature interviewees from across the United States discussing wide-ranging topics. From the threat of nuclear warfare to the farm labor rights movement, from the “longhair” youth subculture to de facto school segregation, these episodes present a sweeping view of both common anxieties and optimistic ideas about the future of city life.
As a metadata assistant in Digital Production Services, I performed a survey of the episodes and their associated metadata records. This helped me understand how descriptive information should appear in the DRS. For example, I investigated how titles, creators, subjects, and abstracts should be recorded for each episode. Next, I created an editing plan, performed batch edits, and carefully listened to each episode. As I listened, I recorded accurate information about the episodes so it could be updated in the DRS.
I selected two interesting episodes to highlight here, but be sure to check out the full collection for more episodes.
In this episode from 1970, Al Weingand, Bob Solan, and Dick Smith discuss a Union Oil offshore drilling well explosion that occurred on January 28, 1969, expelling two million gallons of uncontrolled oil into the Santa Barbara Channel off the coast of California. Topics include the oil’s effect on tourism, the local economy, wildlife, fishing, and environmental safety concerns.
Weingand, a Santa Barbara resident and former California legislative member, explains that no other disasters can compare to the devastation of the oil pollution. Smith, a reporter for the Santa Barbara News Press, calls for greater investment in the tourist value of beaches, saying that offshore oil well spills are dangerous both environmentally and economically. Solan, another reporter for the News Press, covers the psychological benefits of beautiful surroundings for Santa Barbara residents.
This episode was produced in a time of evolving standards for environmental safety and presents an intimate view of lives affected by oil pollution.
“The business that I am about is resurrecting that dormant conscious pride that Black people have had and should have.” — Elma Lewis (4:57)
In this episode, airing in 1970, arts educator and activist Elma Lewis discusses the intertwined histories of Black labor and Black cultural impact in America. She speaks critically of modern art because she says it lacks a basis in life experience. This, Lewis explains, is why Black contributions to American culture transcend art and extend to labor and life experience, which has formed the basis of American society. Throughout the program, Baylor asks Lewis to respond to common racist comments about Black culture. Despite Baylor’s insistence that Lewis speak to his white audience, she intentionally denies this request. Laughing, she replies, “I don’t answer nonsense. I’m not in the business of answering nonsense.”
I wanted to highlight these two episodes because they made me think deeply about both everyday problems and large socio-political injustices which continue to affect us today. “Oil in Santa Barbara” presents opinions from concerned community members in California. It focuses on their reaction to environmental pollution, showing common anxieties about business success, health, and the beauty of their local natural environment. By contrast, “Afro-American Culture” features distinguished Black arts educator Elma Lewis. She discusses fine arts movements, while also celebrating Black joy and artistry in the face of wide-scale systemic racism.
I greatly enjoyed the opportunity to help make these shows available in the DRS. Both Issue and Inquiry and Urban Confrontation hold potential research value for those interested in viewing snapshots of American life in the early 1970s.
Chelsea McNeil served as a part-time metadata assistant in Digital Production Services.
The Digital Repository Service (DRS) is an institutional repository that was designed by the Northeastern University Library to help members of the Northeastern community organize, store, and share the digital materials that are important to their role or responsibilities at the university. This can include scholarly works created by faculty and students; supporting materials used in research; photographs and documents that represent the history of the community; or materials that support the day-to-day operations of the university.
While the DRS itself is a technical system that stores digital files and associated information to help users find what they need, we also consider the DRS to be a service for the university community: library staff are here to help you organize, store, share, and manage the digital materials that have long-lasting value for the university community and beyond.
Published research from the Northeastern community available in the DRS.
Northeastern is not alone in this endeavor. Repository services are now standard practice for most academic institutions, including Harvard University Library (which also uses the name “Digital Repository Service”), Stanford University Library (a leader in technical development for repository systems), Tufts Libraries, and other institutions around the world.
Who uses the DRS?
The DRS has been used by faculty, staff, students, and researchers from all corners of the university community for 10 years. There are too many use cases to mention in one brief blog post, but here are some trends we’ve seen in what users have chosen to deposit over the last few years:
Publications and data that supports published research
Event recordings, photographs, newspapers, and almost any kind of material you can think of to support the day-to-day operations and activity at the university
Student classwork and research projects, like oral histories. Students are also required to contribute the final version of their thesis or dissertation.
Digitized and born-digital records from the Archives and Special Collections, including photographs, documents, and audio and video recordings
These files, and all the other audio, video, document, and photograph files in the DRS, have been viewed or downloaded 11.2 million times since the DRS first launched in 2015. Nearly half of the files in the DRS are open to the public and are therefore available for the wider world to discover. Materials in the DRS have been cited in reporting by CNN, Pitchfork, WBUR, and Atlas Obscura, among others, and are regularly shared on social media or in Reddit threads. As a result, Northeastern continues to contribute the work produced here to the larger scholarly and cultural record, and to the larger world.
Who supports the DRS?
The day-to-day work of managing and maintaining the service and supporting its users comes from staff in Digital Production Services:
Kim Kennedy supervises the digitization of physical materials and processing of born-digital and digitized materials.
Drew Facklam and Emily Allen create and maintain the descriptive metadata that helps you find what you need.
And all of us in the department, including part-time staff, are responsible for general management of the system, including batch ingesting materials, holding consultations and training sessions, answering questions, and leading conversations about how to improve the system and the service.
Sarah Sweeney and David Cliff, DRS staff, posing in 2015 with the homepage of the recently launched DRS.
The DRS is also supported by a number of library staff members across the library:
David Cliff, Senior Digital Library Developer in Digital Infrastructures, is the DRS’ lead developer and system administrator.
Ernesto Valencia and Rob Chavez from the Library Technology Services and Infrastructure departments also provide development support and system administration.
Many librarians in the Research and Instruction department do outreach about the service and support faculty as they figure out how to use it in their work.
Jen Ferguson from Research Data Services also connects faculty and researchers to the DRS, while also providing data management support for those wishing to use the DRS to store their data.
Members of the library administration, including Dan Cohen, Evan Simpson, Tracey Harik, and the recently retired Patrick Yott, have contributed their unwavering support and advocacy for developing and maintaining the system and the service.
We are all here to help you figure out how the DRS may be used to make your work and academic life easier. To dive deeper into what the DRS is and how to use it, visit the DRS subject guide or contact me or my team.