Library News

The HistoryMakers Digital Archive: An Essential Resource for African-American History

Looking for a primary source for an essay or digital project? Do you want to know more about, say, the Montgomery Bus Boycott from someone who lived through it? Or are you just bored and looking for something educational to watch? Well, dear reader, have I got the archive for you.

I’d like to present to you the HistoryMakers Digital Archive, a video collection of oral history interviews that is available to all Northeastern students, faculty, and staff. With a focus on African-American history, the Digital Archive is a resource that can be both useful and fascinating to everyone in academia, even if they’re not studying history.

A collage of notable African Americans surrounding the HistoryMakers logo

So, what is oral history? It’s certainly not the history of public speaking or how humans dealt with cavities, nor is it simply anecdotes passed by word of mouth. The Oral History Association defines it as “a field of study and a method of gathering, preserving and interpreting the voices and memories of people, communities, and participants in past events.” Besides being the oldest form of history-gathering, oral history holds special significance to African Americans and other groups of the African diaspora. Many African peoples have long, storied traditions (perhaps most famously, that of the West African griot) that venerate the keepers of oral history as professionals who are just as vital to the community as the soldier or the healer. In addition, because laws once made it illegal or difficult for African Americans to be taught how to read and write, oral history has been one of the crucial ways that we can learn about certain events and periods. For example, during the Great Depression, the U.S. government commissioned a collection of oral history interviews from formerly enslaved people across 17 states. The collection of transcribed interviews, which is available online, is an incredibly valuable resource for broadening your understanding of the experiences of Black people during slavery.

The HistoryMakers Digital Archive follows in this honorable tradition. It compiles oral history interviews with nearly 2,700 historically significant Americans of African descent, designated as “HistoryMakers.” They’re significant for a variety of reasons, but all have made some notable contribution to the fields of medicine, art, music, politics, technology, science, literature, journalism, and more.

The archive includes interviews that provide insight into the lives and deeds of some of the most well-known people in the world (John Lewis, Whoopi Goldberg, Angela Davis, Harry Belafonte, Barack Obama) as well as many other fascinating folks you might not have heard of. For instance, there’s Elma Lewis, a Roxbury native who founded her own art school here in Boston. There’s Ed Bullins, a noted playwright and former professor at Northeastern. And then there’s Sylvester Monroe, a journalist who recounts the perils he faced while covering the desegregation of schools in Boston. Heck, I even found an interview with William Ward, the former mayor of my hometown of Chesapeake, Virginia. And that’s just scratching the surface. You can watch interviews with literally thousands of HistoryMakers, each of whom offers their own take on their fields, their lives, and the historical events that shaped them.

Part of the beauty of the Digital Archive is how simple it is to use: after spending just a handful of minutes on the website, you’ll more than likely get the hang of it. But if you’d like a step-by-step guide on how to access, navigate, and utilize it, I’ve created a LibGuide that will hopefully be helpful.

In addition, HistoryMakers is hosting a contest in honor of Black History Month. Learn more and sign up here.

Have any further questions about the Digital Archive? You can contact me directly at moyler.h@northeastern.edu or send a note attached to a carrier pigeon to [redacted] Street in Mission Hill.

Sourcery partnership receives $805,000 Andrew W. Mellon Grant

A partnership of various libraries and archives, led by Greenhouse Studios at the University of Connecticut and including the Northeastern University Library, has recently been awarded an $805,000 grant from the Andrew W. Mellon Foundation.

The two-year grant will support the continued development and outreach of Sourcery, “a mobile application that streamlines the scanning of remote of archival materials, provides better connections between researchers and archivists, and offers new and more equitable pathways for archival research.”

According to Greenhouse Studios: “Sourcery is an open-source web application that expands access to non-digitized archival sources. The app, developed by Greenhouse Studios and supported by the non-profit Corporation for Digital Scholarship (CDS), is accessible on any device connected to the Internet. Sourcery provides archivists with a streamlined reference scanning workflow, payment processing services, and analytics on document requests. It provides researchers with a single interface for placing document requests across multiple remote repositories–a practice that has taken on new urgency during this time of limited in-person access to collections.”

Northeastern University Library is one of three partner repositories from which researchers can request documents. The others are Hartford Public Library and the University of Connecticut Archives and Special Collections. A fourth repository—Folger Shakespeare Library—will join the partnership upon completion of a renovation in 2023.

The grant is the second awarded to the group for the Sourcery project, after an initial Andrew W. Mellon Foundation grant in 2020.

The Faces Behind History: Working for the CRRJ

Students of history become familiar with the vast array of human accomplishments. With that knowledge also comes an understanding of human cruelty and racial violence: a perspective humanity shies away from. Perhaps one of the greatest examples of social depravity was in the Jim Crow-era South, a topic I only knew from textbooks and lectures. Working for the Civil Rights and Restorative Justice Project completely changed my awareness of the subject and opened my eyes to the importance of restorative justice.

The CRRJ (part of the Northeastern University School of Law) spearheads a variety of projects meant to bring justice to the victims of racial crimes. Examples of restorative justice include public apologies, memorials, and reconciliation through education. The efforts of the CRRJ not only provide closure and honorable memory to the families of victims, but also valuable opportunities for law students to advance in their field.

Scanned black-and-white photo of George Stinney, a young boy wearing a dark jacket and hat
George Stinney. Source: The Civil Rights and Restorative Justice Project, “George Stinney,” Year End Report 2014, 12.

An essential part of the CRRJ’s efforts is the Burnham-Nobles Archive. With an abundance of records (such as police records and death certificates, among many others), the archive serves as the CRRJ’s central hub of information. The latest archive project (set to be unveiled in 2022) is to transform this data into an interactive and accessible platform that is open to students, researchers, and families. Blending academia, restorative justice, and technology isn’t an easy feat, but it is a relevant and necessary undertaking in today’s society.

I was hired in April of 2021 to work part time assisting the CRRJ’s Burnham-Nobles Archive. I was interested in the position because I had recently entered an MA program in Public History and want to work in the archival field. Before working with the project, I was simply passionate about doing archive-related tasks: I didn’t quite realize the breadth of the CRRJ’s project.

It was not until I started doing actual work that I realized the depths of the horror that was the Jim Crow South. It’s one thing to learn about racial violence, but it’s entirely different to work “face to face” with it. One of my first assigned projects was to code cases from Alabama according to the CRRJ’s v1 data dictionary. This seemed straightforward until I began learning about each victim’s story, their age, and their manner of death. Suddenly, the task had taken on a new level of importance: these weren’t faceless victims of race crimes. They were children, parents, siblings, soldiers, students, and workers—human beings senselessly cut down and unprotected by the law. A tragic example is 14-year-old George Stinney (above), a young boy sent to the electric chair on an unfounded accusation of the murder of two white girls.

Today, my outlook on the project is entirely different, and I have learned so much about the history of racial violence in the South, as well as the important connection between archives, history, and social justice. I have worked on a variety of assignments for the CRRJ, including coding work, GeoNames verifications, case abstract extraction/organization, and work on AirTable.

Working for the CRRJ has been essential for my Public History studies because it has given me the “human” element so often missing from the academic world. While I have learned about racial injustice and violence in the past, working for the CRRJ has allowed me to see each incident on an individual level. Additionally, I feel as if I am actually doing something with my work. Rather than just learning about what happened in the Jim Crow era, I feel that my work is helping the CRRJ accomplish its restorative goals to bring justice to the victims.

The CRRJ and the Burnham-Nobles Archives are leaders in the restorative justice movement, and they have given me valuable experience on both a technical level and a deeply human level.

Library Digital Scholarship Group and NULab receive $500,000 NEH grant

The Northeastern University Library’s Digital Scholarship Group and the NULab for Texts, Maps, and Networks received a $500,000 grant from the National Endowment for the Humanities as part of the NEH’s American Rescue Plan program.

The American Rescue Plan aims to provide funding to organizations conducting humanities projects that were adversely affected by the coronavirus pandemic. The grant awarded to the DSG and NULab is specifically focused on supporting humanities organizations.

This grant will help fund a series of digital projects that are underway through the DSG and NULab but were delayed or postponed due to the COVID-19 pandemic. It will support efforts to conduct collaborative research, digitize and process archival materials, create metadata, increase web accessibility, and more, while creating many graduate and undergraduate student research positions to conduct this work.

The projects that will benefit from this grant all involve collaborative engagement with communities outside of Northeastern, with many of them focused on resources related to underrepresented groups and social justice efforts. These include:

The grant also includes funding for additional projects organized through the NULab.

Julia Flanders, the director of the Digital Scholarship Group, is excited to get started: “We are honored and energized by this award. It creates wonderful research opportunities for students and will help the entire digital humanities ecology at Northeastern.”

A brief overview of machine learning practices for digital collections

Northeastern University Library’s procedure for digitizing physical materials utilizes a few different workflows for processing print documents, photographs, and analog audio and video recordings. Each step in the digitization workflow, from collection review to scanning to metadata description, is performed with thorough attention to detail, and it can take years to completely process a collection. For example, the approximately 1.6 million photographs in The Boston Globe Library collection held by the Northeastern University Archives and Special Collections may take several decades to process!

What if some of these steps could be improved by using artificial intelligence technologies to complete portions of the work, freeing staff to focus more effort on the workflow elements that require human attention? Read on for a very brief overview of artificial intelligence and three potential options for processing The Boston Globe Library collection and other digital collections held by the Library.

A three-part cycle, with "Input" leading to "Model Learns and Predicts" leading to "Response" leading back to "Input"

What are artificial intelligence and machine learning?
Artificial intelligence (AI) is a broad term used for many different technologies that attempt to emulate human reasoning in some way. Machine learning (ML) is a subset of AI where a program is taught how to learn and reason on its own. The program learns by using an algorithm to process existing data and find patterns. Each prediction the program makes is scored for accuracy, and the process repeats until the predictions reach an acceptable level of accuracy.
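To make that learn, predict, and score cycle a little more concrete, here is a minimal sketch of one of the simplest learning algorithms, a perceptron. It is only an illustration and not something the Library runs: the example data, the two features, the learning rate, and the accuracy target are all made-up assumptions.

```python
# Toy illustration only: a perceptron learning a pattern from existing data.
# The data, feature meanings, learning rate, and accuracy target are invented.

def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0

def accuracy(weights, bias, examples):
    """Score the model: the fraction of examples it predicts correctly."""
    correct = sum(1 for features, label in examples
                  if predict(weights, bias, features) == label)
    return correct / len(examples)

def train(examples, target_accuracy=1.0, learning_rate=0.1, max_rounds=100):
    """Adjust the weights until predictions are accurate enough (or we give up)."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    rounds = 0
    while accuracy(weights, bias, examples) < target_accuracy and rounds < max_rounds:
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 when correct
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
        rounds += 1
    return weights, bias, rounds

# Hypothetical examples: (dark-pixel fraction, straight-edge fraction) -> 1 if a text page
EXAMPLES = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.2, 0.1), 0), ((0.1, 0.3), 0)]

weights, bias, rounds = train(EXAMPLES)
print("accuracy after training:", accuracy(weights, bias, EXAMPLES))
```

Real ML systems use far more data and far more sophisticated algorithms, but the loop has the same shape: predict, score, adjust, repeat.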

ML may be supervised or unsupervised, depending on the type of result needed. Supervised learning is when guidance, typically in the form of labeled examples, is provided to help the algorithm learn to identify patterns the researcher expects. Unsupervised learning is when the algorithm is fed data and discovers its own patterns, which may be unknown to the researcher.
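The difference between the two approaches can be sketched with a single measurement per item. In the supervised half below, the researcher supplies labels and the program learns an average for each one; in the unsupervised half, a simple two-group clustering (a small version of the k-means algorithm) finds groups with no labels at all. The measurements and label names are invented for illustration.

```python
# Toy contrast between supervised and unsupervised learning on one feature.
# All values and labels below are invented for illustration.

def fit_supervised(labeled):
    """Supervised: average the values seen for each researcher-provided label."""
    groups = {}
    for value, label in labeled:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def classify(centroids, value):
    """Assign the label whose learned average is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def cluster_unsupervised(values, rounds=10):
    """Unsupervised: split values into two groups; no labels are supplied."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical item heights in inches, with labels chosen by a researcher.
LABELED = [(8.0, "photograph"), (10.0, "photograph"),
           (22.0, "newspaper"), (24.0, "newspaper")]
centroids = fit_supervised(LABELED)
print(classify(centroids, 9.5))

# The same kind of data with no labels: the program finds two groups on its own.
print(cluster_unsupervised([8.0, 10.0, 22.0, 24.0, 9.0, 23.0]))
```

Note that the unsupervised version returns groups without names. It is still up to a person to decide what, if anything, each group means.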

Ethics
As we undertake this work, it is important to be aware that AI technologies are human-made and therefore human biases are embedded directly within the technology itself. Because AI technologies can be employed at such a large scale, the potential for negative impact caused by these biases is greater than with tools that require standard human effort. Although it is tempting to adopt and employ a useful technology as quickly as possible, this is an area of research where it is imperative that we make sure the work aligns with our institutional ethics and privacy practices before it is implemented.

What AI or ML techniques could be used to help process digital collections?
OCR: The most widely known and used form of AI in digital collections practices may be recognition of printed text using Optical Character Recognition, or OCR. OCR is the process of analyzing printed text and extracting the text objects, such as letters, words, and sentences. The results may be embedded directly in the file, like a PDF with OCR’d text, or stored separately, like in a METS-ALTO file, or both.
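For a sense of what separately stored OCR output looks like to a program, here is a minimal sketch that reads a tiny ALTO-style XML fragment using Python's standard library. In ALTO, recognized words are recorded as String elements with a CONTENT attribute and an optional word-confidence attribute, WC. The sample fragment, its words, and its confidence values are invented for illustration, and a real METS-ALTO file contains much more (page layout, coordinates, and so on).

```python
import xml.etree.ElementTree as ET

# Invented ALTO-style sample; real files are far larger and more detailed.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Winchester" WC="0.97"/><SP/>
      <String CONTENT="News" WC="0.95"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Local" WC="0.95"/><SP/>
      <String CONTENT="eIection" WC="0.41"/><SP/>
      <String CONTENT="results" WC="0.92"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

def local_name(element):
    """Strip the XML namespace so the code works across ALTO versions."""
    return element.tag.rsplit("}", 1)[-1]

def alto_lines(xml_text):
    """Return the recognized text, one string per TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element) == "TextLine":
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child) == "String"]
            lines.append(" ".join(words))
    return lines

def low_confidence_words(xml_text, threshold=0.8):
    """List words whose confidence (WC) falls below the threshold."""
    root = ET.fromstring(xml_text)
    return [element.get("CONTENT", "") for element in root.iter()
            if local_name(element) == "String"
            and float(element.get("WC", "1.0")) < threshold]

print(alto_lines(SAMPLE_ALTO))
print(low_confidence_words(SAMPLE_ALTO))
```

Flagging low-confidence words is one way staff could decide which pages deserve a human look, though the 0.8 threshold here is an arbitrary choice.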

Screenshot of the front page of the Winchester News
Image source: Screenshot of an OCR page of The Winchester News with METS-ALTO encoding opened in AltoViewer.

OCR works rather well for modern text documents, especially those in English, but a particular challenge for OCR is historical documents. For more about this challenge, I recommend A Research Agenda for Historical and Multilingual OCR, a fairly recent report published by NULab.

A screenshot of a search result that reveals the result was returned because the search term matched OCR'd text within the document.

We can already see the benefit of using OCR in the library’s Digital Repository Service. When a file has OCR text embedded in it, the full text is extracted and stored alongside the file. That text is then indexed, which improves discoverability: a search retrieves files whose metadata or full text matches the search terms.
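The general idea of indexing both metadata and extracted text can be sketched in a few lines. This is not how the Digital Repository Service is actually implemented; the record identifiers, field names, and text below are hypothetical, and the sketch simply builds a word-to-record lookup that remembers which field each match came from.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each word to the records, and the fields within them, where it appears."""
    index = defaultdict(lambda: defaultdict(set))
    for record_id, fields in records.items():
        for field_name, text in fields.items():
            for word in tokenize(text):
                index[word][record_id].add(field_name)
    return index

def search(index, term):
    """Return {record id: [fields that matched]} for a single search term."""
    hits = index.get(term.lower(), {})
    return {record_id: sorted(fields) for record_id, fields in hits.items()}

# Hypothetical records: a title from the metadata plus OCR'd full text.
RECORDS = {
    "doc-1": {"title": "Campus map",
              "full_text": "Map of the campus and surrounding streets"},
    "doc-2": {"title": "Annual report",
              "full_text": "Cleanup of the harbor continued this year"},
}

index = build_index(RECORDS)
print(search(index, "harbor"))
print(search(index, "map"))
```

In this sketch, a search for "harbor" finds doc-2 only because of its full text; without OCR, that file would not be retrieved at all.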


The back of a photograph from the Boston Globe Library Collection, featuring difficult-to-read handwritten descriptions.
Digitized back of a photograph from The Boston Globe Library collection.

HTR: Handwritten Text Recognition, or HTR, is like OCR, but for handwritten, not typewritten, text. Handwriting is unique to each individual, which makes it difficult to teach machines to interpret it. HTR relies heavily on having lots of data to train a model (in this case, lots of digitized images of handwriting), so even once a model is accurately trained on one set of handwriting, it may not be useful for accurately interpreting another set. Transkribus is a project attempting to navigate this challenge by creating training sets for batches of handwriting data. Researchers submit at least 100 transcribed images for a particular handwriting set to Transkribus, and Transkribus uses that set as training data to create an HTR model to process the remaining corpus of handwritten text. HTR is appealing for the Boston Globe collection, as the backs of the photographs contain handwritten text describing the image, including the photographer name, date the photograph was taken, classification information, and perhaps a description or an address.

Computer Vision: Computer vision refers to AI technologies that allow machines to work with images and video, essentially training a machine to “see”. This type of AI is particularly challenging because it requires the machine to learn how to observe and analyze a picture and understand the content. Algorithms for computer vision are trained to identify patterns of different objects or people and attempt to accurately sort and identify the patterns. In a picture of the Northeastern campus, for example, a computer vision algorithm may be able to identify building objects or people objects or tree objects.

A black and white photograph of a man being arrested by two police officers next to an analysis of the photo's contents: Footwear (98%); Shoe (96%); Gesture (85%); Style (84%); Military Person (84%); Black-and-white (84%); Military Uniform (80%); Cap (80%); Hat (78%); Street Fashion (75%); Overcoat (75%)
Result of Google Cloud’s Vision API analysis for a black and white photograph.

When used in digital collections workflows, the output produced by computer vision tools will need to be evaluated for its usefulness and accuracy. In the above example, the terms returned to describe the image are technically present in the photo (the subjects are wearing shoes and hats and overcoats), but the terms do not adequately capture the spirit of the image (a person being detained at a demonstration).

There are a lot of ethical concerns about using computer vision, especially for recognizing faces and assigning emotions. If we were to employ this particular technology, it may be able to generate keywords or other descriptive metadata for the Boston Globe collection that may not be present on the back of an image, but we would need to be careful to make sure that the process does not embed problematic assessments into the description, like describing an image of a protest as a riot.
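One way to picture that kind of care is a review step between the tool's output and the catalog record. The sketch below does not call Google Cloud's Vision API; it simply takes the labels and scores from the example analysis above and sorts them into labels to keep and labels a person should look at. The 0.80 cutoff and the list of terms that always trigger review are assumptions chosen for illustration.

```python
# Labels and scores copied from the example analysis above (as fractions).
LABELS = [
    ("Footwear", 0.98), ("Shoe", 0.96), ("Gesture", 0.85), ("Style", 0.84),
    ("Military Person", 0.84), ("Black-and-white", 0.84),
    ("Military Uniform", 0.80), ("Cap", 0.80), ("Hat", 0.78),
    ("Street Fashion", 0.75), ("Overcoat", 0.75),
]

def triage_labels(labels, min_score=0.80, review_terms=("riot",)):
    """Split labels into (accepted, needs_review).

    A label goes to human review if its score is below min_score or if it
    contains a term on the review list, no matter how confident the tool is.
    Both the threshold and the review list are illustrative assumptions.
    """
    accepted, needs_review = [], []
    for name, score in labels:
        flagged = any(term in name.lower() for term in review_terms)
        if flagged or score < min_score:
            needs_review.append(name)
        else:
            accepted.append(name)
    return accepted, needs_review

accepted, needs_review = triage_labels(LABELS)
print("accepted:", accepted)
print("needs review:", needs_review)
```

A filter like this only catches what it has been told to look for. It cannot notice that the accepted labels miss the point of the photograph, so human evaluation remains part of the workflow.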

Computer vision is already being employed in some digital collection workflows. Carnegie Mellon University Libraries has developed an internal tool called CAMPI to help archivists enhance metadata. An archivist uses the software to tag selected images, then the program returns other images it identifies as visually similar, regardless of their box and folder, allowing the archivist to easily apply the same tags to those visually similar images without having to manually seek them out.
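I don't know how CAMPI works internally, so the following is only a generic sketch of similarity search, not a description of that tool. It assumes each image has already been reduced to a short list of numbers (a feature vector) by some computer vision model, and then ranks the other images by cosine similarity to the one the archivist tagged. The image identifiers and vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: near 1.0 means alike, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query_id, vectors, top_n=2):
    """Return the ids of the top_n images most similar to the query image."""
    query = vectors[query_id]
    scored = [(cosine_similarity(query, vector), image_id)
              for image_id, vector in vectors.items() if image_id != query_id]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_n]]

# Hypothetical feature vectors for images stored in different boxes.
VECTORS = {
    "box1/img_a": [0.9, 0.1, 0.0],
    "box7/img_b": [0.8, 0.2, 0.1],
    "box3/img_c": [0.0, 0.1, 0.9],
    "box9/img_d": [0.85, 0.15, 0.05],
}

# Suppose the archivist has just tagged box1/img_a: which images look similar?
print(most_similar("box1/img_a", VECTORS))
```

Because the comparison ignores where an image is stored, visually similar photographs surface together even when they sit in different boxes and folders.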

Many other aspects of AI and ML technologies will need to be researched and evaluated before they can be integrated into our digital collections workflows. We will need to evaluate tools and identify the skills that are needed to train staff to perform the work. We will also continue to watch leaders in this space as they dive deep into the world of artificial intelligence for library work.

Recommended resources:
Machine Learning + Libraries: A Report on the State of the Field / Ryan Cordell : https://blogs.loc.gov/thesignal/2020/07/machine-learning-libraries-a-report-on-the-state-of-the-field/
Digital Libraries, Intelligent Data Analytics, and Augmented Description / University of Nebraska–Lincoln: https://digitalcommons.unl.edu/libraryscience/396/