Gathering the Red Record: A Two-Day Convening on Linking Racial Violence Archives

Last month, an interdisciplinary group of over 100 archivists, legal professionals, and historians gathered at Northeastern University’s Snell Library for Gathering the Red Record: Linking Racial Violence Archives. Presented by the Civil Rights and Restorative Justice Project (CRRJ) and the Northeastern University Library, the two-day convening served to highlight the Version 2.0 update of the Burnham-Nobles Digital Archive (BNDA), the launch of a new research project, and the development of its first white paper.

A smiling woman stands behind a podium holding a piece of paper
Gina Nortonsmith, the African American history archivist at Northeastern University. Photo courtesy of Michael Manning.

The Racial Violence Interoperability White Paper Project will serve as a roadmap exploring the possibility of a national project linking various collections of racial violence into a united, interoperable dataset.

Simultaneously a celebration, a launch, and a call to action, Gathering the Red Record highlighted the newest achievements of the BNDA and asked participants for their input and feedback to design future shared goals.

On the first day of the conference, panelists and attendees were introduced to the extensive expansion of the BNDA and the restorative justice milestones the CRRJ has achieved. Since its initial launch in 2022, the BNDA has established itself as one of the most comprehensive digital records of racially motivated homicides collected to date. The archive serves as an open-source repository and database dedicated to identifying, classifying, and providing documentation on anti-Black killings in the mid-20th century South. Version 2.0 introduces 290 new victims to the database, along with their corresponding case files, making over 5,000 new records publicly available. In addition to a massive expansion in records available, Version 2.0 expands the geographic scope of the archive, adding Maryland, Delaware, Washington, D.C., Missouri, West Virginia, Indiana, Kentucky, and Oklahoma to the original 11 formerly Confederate states.

Two women sit in front of a large screen. One is holding a microphone and speaking
Co-founder of the Burnham-Nobles Digital Archive Melissa Nobles and Monica Martinez, project lead for Mapping Violence, speak on The Road to Interoperability White Paper Project. Photo courtesy of Michael Manning.

Day two of the event was dedicated to introducing attendees to The Racial Violence Interoperability White Paper Project and asking for feedback, putting researchers, librarians, and archivists who document historical violence into conversation. Participants were given an early draft, which included instructions on how a national digital project might emerge. Developed in collaboration with eight similar ‘sister’ projects, the paper outlines strategies for aligning data dictionaries, establishing governance, securing funding, and ensuring ethical hosting. Participants then divided into working groups to address project planning and data collection, technology alignment, funding and resources, and federal initiatives on cold case records. The day concluded with conference attendees engaging in guided discussions that explored the feasibility of a national project as described in the White Paper.

As the conference finished, participants were left with possibilities for new collaborations, ideas for funding resources, project design suggestions, and digital publishing possibilities. The fruitful discussions also continue to contribute to the White Paper Project, which is scheduled to be finalized in September.

Special Collections Featured in ICA Boston Watershed Art Installation

A series of red and black threads hanging from the ceiling with folded papers suspended within. Two chairs also sit within the threads
Chiharu Shiota’s “Home Less Home” exhibit featuring reproductions of materials from the Archives & Special Collections. Photo courtesy of Molly Brown.

Reproductions from the Northeastern University Archives and Special Collections are featured as a part of artist Chiharu Shiota’s “Home Less Home” exhibit at the Institute of Contemporary Art (ICA) Boston’s Watershed.

The installation will be on display until September 1.

“Home Less Home” creates the shape of a house with many red and black ropes hung from the ceiling. Suspended within the ropes are records of immigration, such as passports and immigration papers. ICA Boston’s iteration of the installation also draws specifically on Boston history, featuring archival records from institutions across the city that speak to the theme of home and the actions around home: finding a home, leaving home, protecting home, and creating a new home.

Northeastern’s archives brought a unique organizational activism component to the exhibit through our Special Collections’ focus on neighborhood social justice movements. Reference staff worked with ICA Boston curators to find records addressing housing activism and advocacy in Boston’s neighborhoods. The exhibit features records from the following collections:

Paper suspended amid red threads reading "Servicios Humanos"
Records from the Inquilinos Boricuas en Acción. Photo courtesy of Molly Brown.
Papers suspended within hanging red threads
Records from the Phyllis Ryan papers. Photo courtesy of Molly Brown.

This installation is Shiota’s first in New England and is featured as part of the Boston Public Art Triennial 2025. Check it out before September 1!

Using AI to Automate Library Captioning

Captions play a key role in making audio and video content accessible. They benefit not only deaf and hard-of-hearing users, but also second-language learners, researchers scanning interviews, and anyone viewing content in noisy environments.

At the Northeastern University Library, we manage a growing archive of media, from lectures and events to oral histories. Manually creating captions for all of this content is not a scalable solution, and outsourcing the task to third-party services can be expensive, time-consuming, and inconsistent. Motivated by this, we have been exploring AI-powered speech-to-text tools that could generate high-quality captions more efficiently and cost-effectively.

Screenshot of a video with a person speaking and a caption reading "There is an enormous need for an expansion of imagination and"
Figure 1: Example of an ideal transcription output

We started by testing Autosub, an open-source tool for generating subtitles. Even using a maintained fork (a copy of the original project that adds features, fixes bugs, or adapts the code for different use cases), Autosub did not offer significant time savings, and it was eventually dropped.

In summer 2023, the team began using OpenAI’s Whisper, which immediately cut captioning time in half. However, it lacked key features like speaker diarization (the process of segmenting a speech signal based on the identity of the speakers), and it often stumbled on long stretches of music or background noise, which required extra cleanup and made the output harder to use at scale.

As the AI for Digital Collections co-op on the Digital Production Services (DPS) team, I was responsible for researching and testing Whisper forks that could be realistically adopted by our team. I tested model performance, wrote scripts to automate captioning, debugged issues, and prepared tools for long-term use within our infrastructure.

Phase 1: Evaluating Whisper Forks

We looked for a model that could:

  • Handle speaker diarization
  • Distinguish between speech and non-speech (music, applause, etc.)
  • Output standard subtitle formats (like VTT/SRT)
  • Be scriptable and actively maintained
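To make the subtitle-format requirement concrete, here is a minimal sketch of turning timed segments into WebVTT text. It is an illustration, not the team's actual script: the segment layout (dicts with `start`, `end`, `text`, and an optional `speaker` key) and the function names are assumptions chosen for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments) -> str:
    """Build a WebVTT document from a list of segment dicts.

    Each segment is assumed (hypothetically) to carry 'start' and 'end'
    in seconds, a 'text' string, and optionally a 'speaker' label.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        # WebVTT marks the speaker with a voice tag: <v Name>text
        lines.append(f"<v {speaker}>{text}" if speaker else text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

SRT output would follow the same pattern, with numbered cues and a comma instead of a period before the milliseconds.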

We tested several forks, including WhisperX, Faster Whisper, Insanely Fast Whisper, and more. Many were either too fragile, missing key features, or poorly maintained. WhisperX stood out as the most well-rounded: it offered word-level timestamps, basic diarization, reasonable speed, and ongoing development support.

Phase 2: Performance Testing

Once we chose WhisperX, we compared its various models to OpenAI’s original Whisper models, including large-v1, large-v2, large-v3, large-v3-turbo, and turbo. We tested six videos, each with a different length and level of background noise, and compared the models based on Word Error Rate (WER) (how often the transcription differed from a “gold standard,” or human-created or -edited, transcript) and processing time (how long it took each model to generate captions).
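For readers unfamiliar with WER, the sketch below shows one common way to compute it: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name and the simple lowercase-and-split normalization are assumptions for illustration; the evaluation described here may have used a different tool or normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, a hypothesis that contains extra words absent from the reference can score poorly even when every reference word is transcribed correctly. For example, `word_error_rate("the cat", "the cat sat down")` is 1.0.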

WhisperX’s large-v3 model consistently performed well, balancing speed and accuracy even on noisy or complex audio. OpenAI’s turbo and large-v3-turbo delivered strong performance but lacked diarization features.

Phase 3: Timestamp Accuracy Evaluation

Next, we assessed how precisely each model aligned subtitles to the actual audio — crucial for usability. We compared outputs from the WhisperX large-v3 model and the OpenAI turbo and large-v3-turbo models.

We used a gold standard transcript with human-reviewed subtitles as our benchmark. For each model’s output, we measured:

  • Start Mean Absolute Error (MAE): average timing difference between predicted and actual subtitle start times
  • End MAE: same as Start MAE, but for subtitle end times
  • Start % < 0.5s: percentage of subtitles with start times less than 0.5 seconds off
  • End % < 0.5s: same as Start % < 0.5s, but for end times
  • Alignment rate: overall percentage of words correctly aligned in time
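The first four metrics can be sketched as follows. This is a simplified illustration that assumes the predicted and gold-standard subtitles have already been paired one-to-one as `(start, end)` tuples in seconds; the function name, the dictionary keys, and the pairing step are assumptions, and the alignment rate is left out because its exact definition is not given here.

```python
def timing_metrics(predicted, gold, tolerance=0.5):
    """Compare paired (start, end) subtitle times, in seconds.

    Assumes predicted[i] and gold[i] describe the same subtitle.
    """
    if not gold or len(predicted) != len(gold):
        raise ValueError("need two non-empty, equally long cue lists")

    start_errors = [abs(p[0] - g[0]) for p, g in zip(predicted, gold)]
    end_errors = [abs(p[1] - g[1]) for p, g in zip(predicted, gold)]
    n = len(gold)

    return {
        # Mean absolute error of start and end times
        "start_mae": sum(start_errors) / n,
        "end_mae": sum(end_errors) / n,
        # Share of cues whose error is strictly under the tolerance
        "start_pct_within": 100 * sum(e < tolerance for e in start_errors) / n,
        "end_pct_within": 100 * sum(e < tolerance for e in end_errors) / n,
    }
```

With predicted cues `[(0.2, 2.0), (3.0, 5.5)]` and gold cues `[(0.0, 2.0), (4.0, 5.0)]`, the start errors are 0.2 and 1.0 seconds, so the Start MAE is 0.6 seconds and half of the start times fall within the 0.5-second tolerance.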

WhisperX’s large-v3 model outperformed all other models significantly. In most of our test videos, it showed:

  • Much lower MAE scores for both start and end timestamps
  • Higher percentages of accurately timed subtitles (within the 0.5-second range)
  • Better overall word alignment rates

In fact, in several test cases, WhisperX was nearly three times more accurate than the best-performing OpenAI Whisper models in terms of timing precision.

Coded two-page caption
Figure 2: WhisperX output vs. gold-standard transcript in a high-WER case

In one particular case, the WER result for WhisperX large-v3 was a surprisingly poor 94%. When I checked the difference log to investigate, I found that the model had transcribed background speech that was not present in the gold standard transcript. So, while it was technically penalized, WhisperX was actually picking up audio that the gold standard did not include. This highlighted both the model’s sensitivity and the limitations of relying solely on WER for evaluating accuracy.

Figure 2 shows exactly that. On the left, WhisperX (denoted “HYP”) transcribed everything it heard, while the gold standard transcript (denoted “REF”) cut off early and labeled the rest as background noise (shown on the right).

What’s Next: Integrating WhisperX

We have now deployed WhisperX’s large-v3 model to the library’s internal server. It’s actively being used to generate captions for incoming audio and video materials. This allows:

  • A significant reduction in manual labor for our DPS team
  • The potential for faster turnaround on caption requests
  • A scalable solution for future projects involving large media archives

Conclusion

As libraries continue to manage growing volumes of audio and video content, scalable and accurate captioning has become essential, not only for accessibility, but also for discoverability and long-term usability. Through this project, we identified WhisperX as a practical open-source solution that significantly improves transcription speed, speaker diarization, and timestamp precision. While no tool is perfect, WhisperX offers a strong foundation for building more efficient and inclusive media workflows in the library setting.

Reflections and Acknowledgements

This project helped me understand just how much thought and precision goes into building effective captioning systems. Tools like WhisperX offer powerful capabilities, but they still require careful evaluation, thoughtful tuning, and human oversight. I am incredibly grateful to have contributed to a project that could drastically reduce the time and effort required to caption large volumes of media, thereby enabling broader access and creating long-term impact across the library’s AV collections.

Finally, I would like to thank the Digital Production Services team for the opportunity and for their guidance and support throughout this project, especially Sarah Sweeney, Kimberly Kennedy, Drew Facklam, and Rob Chavez, whose insights and feedback were invaluable.

Reading Challenge Update: July Winner and August Preview

Congratulations to everyone who participated in the July Reading Challenge! Our July winner is Abhigna Police, who won a Paws READ poster. To be eligible for a prize drawing, make sure to read a book that fits the month’s theme and then tell us about it.

This month, we challenged you to read a book of short stories, essays, or poetry. Here are a few of the books you read! (Comments may have been edited for length or clarity.)

What You Read in July

Cover of The Hill We Climb

The Hill We Climb: An Inaugural Poem for the Country, Amanda Gorman
Find it at Snell Library | Read the e-book

“This poem reminds me that even in the darkest times, there’s always a spark of light. Gorman’s voice is powerful but gentle, commanding attention without shouting.” — Aanshi


Cover of Bad Feminist

Bad Feminist: Essays, Roxane Gay
Find it at Snell Library | Find it at F.W. Olin Library

“Some of the essays felt a little contradictory. I did like the book, as it gave me a new perspective, but I was hoping for more data than opinion.” — Brooke


Cover of The Serviceberry

The Serviceberry: Abundance and Reciprocity in the Natural World, Robin Wall Kimmerer
Find it at Snell Library | Find it at F.W. Olin Library | Read the e-book | Listen to the audiobook

“This is the newest of Dr. Kimmerer’s books, and I really needed the message of reciprocity, mutual flourishing, and abundance right now. Dr. Kimmerer’s writing is absolute artistry. She clearly notices and appreciates the connectedness of life. I read this right after reading Ezra Klein and Derek Thompson’s Abundance, and I really needed something that was less policy-minded and more dedicated to beauty. This was the perfect companion!” — Sam

Cover of I, Robot

I, Robot, Isaac Asimov
Find it at F.W. Olin Library

“I had a really fun time reading I, Robot. It explores technology and ethics through fun little machine stories. One of my favorites was about a robot named Cutie, who decides that humans couldn’t have built him, and instead creates his own belief system centered around a greater power. As someone in tech and engineering, these stories really made me think about robots and how they can reflect human nature.” — Shree

Cover of Unaccustomed Earth

Unaccustomed Earth, Jhumpa Lahiri
Find it at F.W. Olin Library

“I typically don’t enjoy short stories, but I will read anything by Jhumpa Lahiri, and this book made me consider reading more short stories. Lahiri is a master storyteller and keeps culture at the forefront of it all.” — Rachel


Suggested Reads for August

Your August challenge is to read a book translated from another language. Check out our recommended e-book and audiobook titles in Libby, or stop by the Snell Library lobby from 1 – 3 p.m. on Wednesday, August 20, and Thursday, August 21, to browse print books and pick up Reading Challenge swag!

Cover of Butter

Butter: A Novel of Food and Murder, Asako Yuzuki (translated by Polly Barton)
Find it at Snell Library | Read the e-book

Gourmet cook Manako Kajii sits in the Tokyo Detention House, convicted of the serial murders of lonely businessmen. The case has captured the nation’s imagination, but Kajii refuses to speak with the press, entertaining no visitors. That is, until journalist Rika Machida writes a letter asking for her recipe for beef stew, and Kajii can’t resist writing back. As visits unfold between Rika and the steely Kajii, Rika begins to wonder: do she and Kajii have more in common than she once thought?

Cover of The Café with No Name

The Café with No Name, Robert Seethaler (translated by Katy Derbyshire)
Listen to the audiobook

Summer 1966. Raised in a home for war orphans, Robert Simon has grown into a warm-hearted, hard-working, and determined man. When the former owners of the corner café in Vienna’s Carmelite market square shutter the business, Robert sees that the chance to realize his dream has arrived. The place, dark and dilapidated, is in a poor neighborhood of the Austrian capital, but the air is filled with a desire for renewal. Robert refurbishes the café and customers arrive. Some are in search of company, others long for love, and some seek a place where they can feel understood. As the city is transformed, Robert’s café becomes at once a place of refuge and one from which to observe, mourn, and rejoice.

Cover of The Empusium

The Empusium: A Health Resort Horror Story, Olga Tokarczuk (translated by Antonia Lloyd-Jones)
Find it at Snell Library | Find it at F.W. Olin Library | Listen to the audiobook

September 1913. A young Pole suffering from tuberculosis arrives at Wilhelm Opitz’s Guesthouse for Gentlemen in the village of Görbersdorf, a health resort in the Silesian mountains. Every evening, the residents gather to imbibe the hallucinogenic local liqueur and debate the great issues of the day. Meanwhile, disturbing things are happening in the guesthouse and the surrounding hills. Someone, or something, seems to be watching, attempting to infiltrate this cloistered world. Little does the newcomer realize, as he tries to unravel both the truths within himself and the mystery of the sinister forces beyond, that they have already chosen their next target.

Cover of My Name is Emilia del Valle

My Name is Emilia del Valle, Isabel Allende (translated by Frances Riddle)
Listen to the audiobook

In San Francisco in 1866, an Irish nun, abandoned following a torrid relationship with a Chilean aristocrat, gives birth to a daughter named Emilia del Valle. Raised by a loving stepfather, Emilia grows into an independent thinker and a self-sufficient young woman. As an adult, Emilia becomes a journalist, convincing an editor at The Daily Examiner to hire her. When an opportunity arises to cover a brewing civil war in Chile, she seizes it, and as the war escalates, Emilia finds herself in extreme danger, fearing for her life and questioning her identity and her destiny.

Cover of The Morning Star

The Morning Star, Karl Ove Knausgaard (translated by Martin Aitken)
Read the e-book

One long night in August, Arne and Tove are staying with their children in their summer house in southern Norway. Their friend Egil has his own place nearby. Kathrine, a priest, is flying home from a Bible seminar, questioning her marriage. Journalist Jostein is out drinking for the night, while his wife Turid, a nurse at a psychiatric care unit, is on a night shift when one of her patients escapes. Above them all, a huge star suddenly appears blazing in the sky. It brings with it a mysterious sense of foreboding. Strange things start to happen as nine lives come together under the star.

Whatever you read, make sure to tell us about it to enter the August prize drawing. Good luck, and happy reading!

Box By Box: Inventorying the Boston Globe Big Dig Records

The Big Dig, a major infrastructure project that aimed to improve traffic flow, dominated the Boston area throughout its 15 years of construction and led to countless articles and columns in the Boston Globe. Former Globe reporters and editors Tom Palmer and Sean Murphy, who both worked at the newspaper for over 30 years, donated their extensive records to the Northeastern University Archives and Special Collections (NUASC), providing a glimpse into the planning and construction of the Big Dig project. (NUASC also holds multiple other collections relating to the Big Dig.)

Aries Peralta, wearing a black jacket, gray baseball cap, and glasses, pulls a box off a shelf in the archives
Aries Peralta works in the Archives and Special Collections. Photo courtesy of Molly Brown.

The initial planning of the Big Dig, officially named the Central Artery/Tunnel Project, began in 1982, and actual construction ran from 1991 to 2006. The donated records contain articles by both Palmer and Murphy, as well as a third reporter, Charles Sennott.

I find this collection interesting because it is not just a compilation of articles published in the Boston Globe; it consists of the research and reference materials amassed for use in reporting on the issues surrounding the Big Dig. The records reveal the vast context and information a journalist would need to know in order to write cohesive articles, including contracts, technical reports, financial statements, photographs, maps, articles from other news sources, and more.

Below are some selected items to highlight the extent of the collection.

Big Dig Contract Map

Map of Boston with different colored lines representing streets and highways. Various areas are labeled with contract numbers


The contracts in progress map is a snapshot of the various contracts happening at one time in downtown Boston and serves as a great visualization of the contracts’ physical locations. It also helps associate the technical contract number with the more publicly known name of any given section of the project, such as the Ted Williams Tunnel identified as contract number C07A1.

The Big Dig Blame Game

Illustrated graphic of a man standing behind a podium, with the neck and tongue of a snake. He is holding a megaphone and is surrounded by a red curtain and creepy clowns driving bumper cars, one of which is holding a shovel. A yellow banner at the top reads "The BIG Dig"


As Massachusetts governor from 1997 to 2001, Paul Cellucci drew countless opinions about his tenure and leadership during the Big Dig project. This image of Cellucci as a snake accompanied an article published in a 2000 issue of Boston Magazine that suggested cost overruns were caused by a collective failure of key players, including Cellucci, to properly manage the project.

A Fifth-Grader’s Opinion on the Big Dig

A piece of notebook paper with a letter written in a child's handwriting: "9/15/97 Dear Globe, I've never seen the big dig but I think it should help Boston. It is horrible traffic in Boston. If the big dig dosn't help it will seem like a wast of 10 billion dollars. Joey LeBlanc Medfield Ma. Dale St. School Grade 5"


Whether stuck in traffic with their parents or simply living in nearby neighborhoods, local students were also affected by the Big Dig project. The Student Newsline section in the Boston Globe gave students an opportunity to send in their own opinions about the project. Many offered their own ideas for finishing the project quickly and reducing its costs.

2006 Ceiling Collapse

A gloved hand holds a tape measure to a concrete ceiling, measuring the length of screws sticking down.
A worker stands in a crawlspace above the ceiling of a tunnel, surrounded by concrete and bars.

Reporting on the construction of the Big Dig included documenting tragedies. In 2006, a ceiling panel fell on a car in the Fort Point Channel Tunnel, killing a passenger and injuring the driver. Their family and the public wanted answers as to how the incident could have occurred. As a result, the Boston Globe undertook an in-depth investigation to report and provide answers. These photographs may have been taken to document the other ceiling panels in the rest of the tunnel after the accident occurred.

To learn more about accessing the Boston Globe Big Dig records, email the Northeastern University Archives and Special Collections at archives@northeastern.edu.

Aries Peralta (he/him) recently graduated from Simmons University with an MS in Library and Information Science with a concentration in archives management. He received his BA in history from the University of Connecticut.