The Northeastern University Archives and Special Collections is fortunate to have the records of the Boston Gay Men’s Chorus (BGMC), founded in 1982. BGMC is a 200-voice community ensemble that sings popular and classical music and works to “inspire change, build community, and celebrate difference.”
Some chorus recordings are already available in Northeastern’s Digital Repository Service (DRS), but there are many more in the Archives that haven’t been digitized yet. Recently, members of BGMC working on a documentary requested the digitization of recordings on 1/4″ reel-to-reel tape and Digital Audio Tape (DAT) from the 1980s and 1990s. These recordings included holiday performances, Pride concerts, and a collaboration with the Connecticut Gay Men’s Chorus.
We’re always happy to help make the collections accessible, but the digitization of older audiovisual formats presents challenges. DAT cassettes were released in 1987 and used throughout the 1990s. They encode digital information onto magnetic media and allow for high-quality recordings. However, Sony stopped producing DAT cassette decks in 2005 and few people know how to maintain the equipment needed to digitize them. In addition, use of a machine in poor repair might damage the tape. You can read more about the preservation issues with DAT in archival collections here and here. Luckily, we were able to work with National Boston to digitize these DATs with no issues.
The reel-to-reel or open reel format using magnetic tapes was popular from the 1940s through the 1980s. We also sent our reel-to-reel tapes to National Boston but due to the age and condition of the materials, an extra step was required. Many of the tapes had sticky shed syndrome. This preservation issue is common and affects magnetic media. The tape has three layers: the magnetic portion which contains the information; the base layer; and the binding agent. Sticky shed syndrome causes the binder to degrade, leading the tape to shed bits of itself while being played. Since this causes irreversible loss of information, tapes with sticky shed should be baked before playback. This involves putting them in an oven at a low heat to rebind the layers. You can read more about baking tapes at the Library of Congress here.
Luckily, these gorgeous vocal performances are now preserved in our repository and available here. Thanks to my colleagues in the Archives, especially Molly Brown, and to my colleagues in Digital Metadata, especially Anna Ryerson, for their work coordinating the request and cataloging the recordings.
Looking for a primary source for an essay or digital project? Do you want to know more about, say, the Montgomery Bus Boycott from someone who lived through it? Or are you just bored and looking for something educational to watch? Well, dear reader, have I got the archive for you.
I’d like to present to you the HistoryMakers Digital Archive, a video collection of oral history interviews that is available to all Northeastern students, faculty, and staff. With a focus on African-American history, the Digital Archive is a resource that can be both useful and fascinating to everyone in academia, even if they’re not studying history.
So, what is oral history? It’s certainly not the history of public speaking or how humans dealt with cavities, nor is it simply anecdotes passed by word of mouth. The Oral History Association defines it as “a field of study and a method of gathering, preserving and interpreting the voices and memories of people, communities, and participants in past events.” Besides being the oldest form of history-gathering, oral history holds special significance to African Americans and other groups of the African diaspora. Many African peoples have long, storied traditions (perhaps most famously, that of the West African griot) that venerate the keepers of oral history as professionals who are just as vital to the community as the soldier or the healer. Further, due to historical laws that made it either illegal or difficult for African Americans to be taught how to read and write, oral history has been one of the crucial ways that we can learn about certain events and periods. For example, during the Great Depression, the U.S. government commissioned a collection of oral history interviews from formerly enslaved people across 17 states. The collection of transcribed interviews, which is available online, is an incredibly valuable resource in broadening your understanding of the experiences of Black people during slavery.
The HistoryMakers Digital Archive follows in this honorable tradition. It compiles oral history interviews with nearly 2,700 historically significant Americans of African descent, designated as “HistoryMakers.” They’re significant for a variety of reasons, but all have made some notable contribution to the fields of medicine, art, music, politics, technology, science, literature, journalism, and more.
The archive includes interviews that provide insight into the lives and deeds of some of the most well-known people in the world (John Lewis, Whoopi Goldberg, Angela Davis, Harry Belafonte, Barack Obama) as well as many other fascinating folks worth learning about whom you might not have heard of. For instance, there’s Elma Lewis, a Roxbury native who founded her own art school here in Boston. There’s Ed Bullins, a noted playwright and former professor at Northeastern. And then there’s Sylvester Monroe, a journalist who recounts the perils he faced while covering the desegregation of schools in Boston. Heck, I even found an interview with William Ward, the former mayor of my hometown of Chesapeake, Virginia. And that’s just scratching the surface. You can watch interviews from literally thousands of HistoryMakers, each of whom offers their own take on their fields, their lives, and the historical events that shaped them.
Part of the beauty of the Digital Archive is how simple it is to use: after spending just a handful of minutes on the website, you’ll more than likely get the hang of it. But if you’d like a step-by-step guide on how to access, navigate, and utilize it, I’ve created a LibGuide that will hopefully be helpful.
Northeastern University Library’s procedure for digitizing physical materials utilizes a few different workflows for processing print documents, photographs, and analog audio and video recordings. Each step in the digitization workflow, from collection review to scanning to metadata description, is performed with thorough attention to detail, and it can take years to completely process a collection. For example, the approximately 1.6 million photographs in The Boston Globe Library collection held by the Northeastern University Archives and Special Collections may take several decades to process!
What if some of these steps could be improved by using artificial intelligence technologies to complete portions of the work, freeing staff to focus more effort on the workflow elements that require human attention? Read on for a very brief overview of artificial intelligence and three potential options for processing The Boston Globe Library collection and other digital collections held by the Library.
What are artificial intelligence and machine learning?
Artificial intelligence (AI) is a broad term used for many different technologies that attempt to emulate human reasoning in some way. Machine learning (ML) is a subset of AI where a program is taught how to learn and reason on its own. The program learns by using an algorithm to process existing data and find patterns. Every prediction is evaluated and scored for accuracy, and the process repeats until the predictions reach an acceptable level of accuracy.
ML may be supervised or unsupervised, depending on the type of result needed. In supervised learning, instructions (such as labeled examples) are provided to help the algorithm learn to identify patterns the researcher expects. In unsupervised learning, the algorithm is fed data and discovers its own patterns, which may be unknown to the researcher.
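As a rough illustration of the distinction above, here is a minimal Python sketch using made-up toy numbers (not data from any Library collection). The supervised function learns a cut-off from labeled examples and scores every candidate by accuracy; the unsupervised function groups unlabeled values into two clusters on its own using a simple one-dimensional k-means. All names and values are assumptions chosen for illustration.

```python
# Toy illustration of supervised vs. unsupervised learning.
# All data and function names here are hypothetical.

def train_threshold(examples):
    """Supervised: pick the cut-off that best separates labeled examples.

    `examples` is a list of (value, label) pairs with labels "small"/"large".
    Every candidate threshold is scored by accuracy, and the best one wins.
    """
    best_threshold, best_accuracy = None, -1.0
    for candidate, _ in examples:
        correct = sum(
            1 for value, label in examples
            if ("large" if value >= candidate else "small") == label
        )
        accuracy = correct / len(examples)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


def two_means(values, rounds=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means).

    Assumes `values` contains at least two distinct numbers.
    """
    low, high = min(values), max(values)
    for _ in range(rounds):
        # Assign each value to whichever cluster center is closer.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of its assigned values.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return sorted(near_low), sorted(near_high)


labeled = [(2, "small"), (3, "small"), (4, "small"), (9, "large"), (10, "large")]
threshold, accuracy = train_threshold(labeled)
print(threshold, accuracy)  # 9 1.0

unlabeled = [1, 2, 3, 10, 11, 12]
print(two_means(unlabeled))  # ([1, 2, 3], [10, 11, 12])
```

In the first case the researcher supplies the categories in advance; in the second, the grouping emerges from the data and still needs a person to decide what, if anything, the groups mean.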
Ethics
As we undertake this work, it is important to be aware that AI technologies are human-made and therefore human biases are embedded directly within the technology itself. Because AI technologies can be employed at such a large scale, the potential for negative impact caused by these biases is greater than with tools that require standard human effort. Although it is tempting to adopt and employ a useful technology as quickly as possible, this is an area of research where it is imperative that we make sure the work aligns with our institutional ethics and privacy practices before it is implemented.
What AI or ML techniques could be used to help process digital collections?
OCR: The most widely known and used form of AI in digital collections practices may be recognition of printed text using Optical Character Recognition, or OCR. OCR is the process of analyzing printed text and extracting the text objects, such as letters, words, and sentences. The results may be embedded directly in the file, like a PDF with OCR’d text, or stored separately, like in a METS-ALTO file, or both.
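To make the "stored separately" option more concrete, here is a small Python sketch that reads recognized text out of an ALTO-style XML fragment. In ALTO, each recognized word is a `String` element whose `CONTENT` attribute holds the text, and words are grouped into `TextLine` elements. The sample fragment below is invented and heavily simplified (real files also carry coordinates, confidence values, and other attributes), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ALTO fragment for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Boston"/><SP/><String CONTENT="Globe"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Library"/><SP/><String CONTENT="collection"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(xml_text):
    """Return the recognized text, one string per TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            # Collect the CONTENT of each word on this line, in order.
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_to_lines(SAMPLE_ALTO))  # ['Boston Globe', 'Library collection']
```

Matching on the local tag name rather than a fixed namespace is a convenience for this sketch, since different ALTO versions use different namespace URIs.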
OCR works rather well for modern text documents, especially those in English, but a particular challenge for OCR is historical documents. For more about this challenge, I recommend A Research Agenda for Historical and Multilingual OCR, a fairly recent report published by NULab.
We can already see the benefit of using OCR in the library’s Digital Repository Service, as files with OCR text embedded in the file have the full text extracted and stored alongside the text file. That text is indexed and improves discoverability of text files by retrieving files that match search terms in the file’s metadata or the full text.
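The effect of indexing extracted text can be sketched with a tiny inverted index in Python. This is not how the Digital Repository Service is implemented; it is only a minimal illustration, with invented records and hypothetical function names, of how a search can match either a file's metadata or its full text.

```python
import re
from collections import defaultdict

# Hypothetical records: the identifiers, titles, and text are invented.
RECORDS = [
    {"id": "rec-1", "title": "Annual report", "full_text": "Minutes of the chorus rehearsal"},
    {"id": "rec-2", "title": "Chorus concert program", "full_text": "Holiday performance schedule"},
    {"id": "rec-3", "title": "Campus photograph", "full_text": ""},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the ids of records whose title or full text contains it."""
    index = defaultdict(set)
    for record in records:
        for field in ("title", "full_text"):
            for word in tokenize(record[field]):
                index[word].add(record["id"])
    return index


def search(index, query):
    """Return the ids of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(RECORDS)
print(search(index, "chorus"))          # ['rec-1', 'rec-2']
print(search(index, "holiday chorus"))  # ['rec-2']
```

Here "rec-1" is found for "chorus" only because its full text was indexed; its title alone would not have matched.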
HTR: Handwritten Text Recognition, or HTR, is like OCR, but for handwritten, not typewritten, text. Handwriting is unique to each individual, which makes it difficult to teach machines to interpret it. HTR relies heavily on having lots of data to train a model (in this case, lots of digitized images of handwriting), and even once a model is accurately trained on one set of handwriting, it may not be useful for accurately interpreting another set. Transkribus is a project attempting to navigate this challenge by creating training sets for batches of handwriting data. Researchers submit at least 100 transcribed images for a particular handwriting set to Transkribus, and Transkribus uses that set as training data to create an HTR model to process the remaining corpus of handwritten text. HTR is appealing for the Boston Globe collection, as the backs of the photographs contain handwritten text describing the image, including the photographer name, date the photograph was taken, classification information, and perhaps a description or an address.
Computer Vision: Computer vision refers to AI technologies that allow machines to work with images and video, essentially training a machine to “see”. This type of AI is particularly challenging because it requires the machine to learn how to observe and analyze a picture and understand the content. Algorithms for computer vision are trained to identify patterns of different objects or people and attempt to accurately sort and identify the patterns. In a picture of the Northeastern campus, for example, a computer vision algorithm may be able to identify building objects or people objects or tree objects.
When used in digital collections workflows, the output produced by computer vision tools will need to be evaluated for its usefulness and accuracy. In the above example, the terms returned to describe the image are technically present in the photo (the subjects are wearing shoes and hats and overcoats), but the terms do not adequately capture the spirit of the image (a person being detained at a demonstration).
There are a lot of ethical concerns about using computer vision, especially for recognizing faces and assigning emotions. If we were to employ this particular technology, it may be able to generate keywords or other descriptive metadata for the Boston Globe collection that may not be present on the back of an image, but we would need to be careful to make sure that the process does not embed problematic assessments into the description, like describing an image of a protest as a riot.
Computer vision is already being employed in some digital collection workflows. Carnegie Mellon University Libraries has developed an internal tool called CAMPI to help archivists enhance metadata. An archivist uses the software to tag selected images, then the program returns other images it identifies as visually similar, regardless of their box and folder, allowing the archivist to easily apply the same tags to those visually similar images without having to manually seek them out.
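The general idea of "tag one image, find visually similar ones" can be sketched in Python as follows. This is not CAMPI's code or method; it is a minimal illustration that assumes each image has already been reduced to a list of numbers (a feature vector), which in practice would come from a trained vision model. The identifiers, vectors, tag, and similarity threshold below are all invented.

```python
import math

# Hypothetical feature vectors; a real workflow would compute these
# with a trained vision model rather than typing them in by hand.
FEATURES = {
    "box01_folder03_img1": [0.9, 0.1, 0.0],
    "box07_folder12_img4": [0.8, 0.2, 0.1],
    "box02_folder05_img2": [0.0, 0.1, 0.9],
    "box11_folder01_img9": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Similarity of two non-zero vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_images(tagged_id, features, threshold=0.9):
    """Return other image ids at or above the threshold, most similar first."""
    target = features[tagged_id]
    scored = [
        (cosine_similarity(target, vector), image_id)
        for image_id, vector in features.items()
        if image_id != tagged_id
    ]
    return [
        image_id
        for score, image_id in sorted(scored, reverse=True)
        if score >= threshold
    ]


tags = {"box01_folder03_img1": {"commencement"}}
for image_id in similar_images("box01_folder03_img1", FEATURES):
    # In practice an archivist would review each suggestion before tagging.
    tags.setdefault(image_id, set()).update(tags["box01_folder03_img1"])
print(tags)
```

Note that the similar image sits in a different box and folder from the tagged one, which is the point: physical arrangement does not limit which images are suggested.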
Many other aspects of AI and ML technologies will need to be researched and evaluated before they can be integrated into our digital collections workflows. We will need to evaluate tools and identify the skills that are needed to train staff to perform the work. We will also continue to watch leaders in this space as they dive deep into the world of artificial intelligence for library work.
Introduction
Part of the digitization process includes the creation of metadata for each record so that people can find an individual item within the sea of documents. Metadata is the identifying information of a record, such as its title, author, creation date, and other components.
Recently, archivists have placed greater emphasis on the subject heading aspect of cataloging records.1 Archivists now recognize that the creation of subjects and descriptions as access points to a record is an inherently biased activity that can influence how one approaches and perceives the record itself and the topics it contains. While these access points are extremely helpful in improving search results, these pathways are created by archivists, i.e. people. Since archivists create metadata, the data reflects our perspectives, thereby making it imperative that we be mindfully aware of our unconscious biases. We must do the necessary self-evaluative work about ourselves, the power dynamics in which we function, and the multifarious impacts of our decisions on various groups.
Records are created within certain settings for certain purposes—whether political or social—and an archivist inserts the meta-narrative layer of collecting and making accessible those records. There is power in that process and traditionally the process has privileged dominant social systems, which then reinforces social inequities. The myth of neutrality in subject cataloging has led to subject headings that can reinforce biases, stereotypes, and offensive representations, as well as misrepresent and alienate marginalized communities. For instance, a reclassification project at GBH recognized the negative false equivalence of police only interacting with criminals in their legacy subject term “Law Enforcement & Crimes,” which they have changed to “Legal System.”2
Recently, many archivists have risen to the challenge of acknowledging the persistence of power dynamics and are actively seeking to infuse their metadata creation with inclusion, diversity, and social justice practices. I myself have recently examined the ethical reasoning behind the use of certain subject headings, aiming for descriptions that not only increase searchability and accuracy but also are respectful and empowering to subjects previously ignored. It is my hope that by developing cultural competency, the records will be more accessible to the communities reflected in their content, which may be one small step towards actively dismantling oppressive systems.
The Collection and Daniela Saunders
As I digitized the Freedom House Inc. Records, I stumbled upon an eye-opening folder about the Police-Community Relations Committee. The records from this folder of items from 1960 to 1966 document a growing awareness in Roxbury of police-community relation issues. At the time, there were community memories of problems and instances a decade prior. Back in 1952, the murder of Rabbi Zuber sparked meetings calling for community action. However, the initial uproar dwindled and while close relations and neighbors continued to fight for change, it was a small endeavor.
Some larger efforts did persist, including a Police-Community Relations Institute Conference held in 1960 that connected with religious organizations to discuss the relations between mass media, social work agencies, the judicial court system, civil rights, legislation, and the police. However, the improvements called for in the decade of discussions did not become sweeping real-world improvements. As a result, over the course of a year between the summers of 1962 and 1963, there were a number of stranglings of women in the greater Boston area.3
On January 5, 1963, 16-year-old Daniela Saunders was murdered in an alleyway between Warren Street and Elm Hill Park, just a few blocks from her home. The next day, 500 members of her community met with Otto P. Snowden and Freedom House to discuss what underlying social problems led to the tragedy. Initiated by a small group of mothers voicing the need to prevent such violence, the meeting expanded to the 500-person turnout. Many individuals voiced their perspectives on the issue:
Dewey Duckett outlined the general disinterest of the Boston Police Department Division 9 towards the community it was supposed to protect. He talked about how “the local police had clearly evidenced an incapacity to understand or respect either the local citizens themselves or their simple desire for minimal adequate protection.”4
Attorney Benjamin Johnson called for the creation of a 100-person auxiliary police of community members.
Mrs. Leona Tynes cited the practical issue of poor lighting facilities.
Mrs. Oswald Jordan recalled the aftermath of Rabbi Zuber’s murder and described the emotional toll of these types of meetings over the last decade since they had not led to any actual change.
At the end of the meeting, the goal was set to create a committee to meet with city officials, namely Commissioner Edmund L. McNamara, Captain Paul Sullivan, and Sergeant Kelly of Division 9. The other four main suggestions were to add foot patrolmen; ensure that police answered complaints with courtesy instead of their current lack of sensitivity; increase the effort to improve problem areas; and fire police officers who demonstrated bias against the Black community.
Another meeting, held January 8, 1963, at the Jeremiah E. Burke School, further expanded on the four main issues. About 1,500 citizens gathered to demand change. Kenneth Guscott, representing the NAACP, called for a Villante Committee similar to what the Peace Corps created in Harlem. Police Commissioner McNamara personally attended this meeting, although he was met with objections when he attempted to downplay his former neglect by referring to his personal connection with a Black member of the police force.
The various efforts aimed to “promote a better understanding between the protected and the protector.”5 The end goal was a positive coordinated action program formulated and carried out by neighborhood associations in affiliation with the local police. Along with Mayor John F. Collins and Commissioner McNamara’s immediate pledges to increase training in criminal investigation and compulsory attendance of courses at Northwest University and the FBI National Academy, the events led to long-term communication between the Roxbury community, city officials, and the police. The Freedom House Inc. Records reflect and display these sustained efforts.
Daniela Saunders’ Impact
The events of Daniela Saunders’ murder and the aftermath from Roxbury’s community response are integral components of the larger historic narrative of the police-community relations documented in the Freedom House Inc. Records. Her story may be limited to a folder in this vast collection, but her impact extends through many boxes. So many activities were initiated by her tragic death.
However, most metadata elements do not provide space for Daniela. She wasn’t the author or creator of the records, she was not included in the title of the records, and her name was often eliminated in the documents themselves. Within the records of Folder 1015, Daniela was more of a ghost, a whisper, trickled throughout the newspaper articles, letters, meeting minutes, and reports. She may have been the impetus for change, but she didn’t have agency in these metadata components.
Additionally, in the larger historic narrative, Daniela has been forgotten. She is currently not listed as one of the Boston Strangler’s 13 victims despite the connection to the “Phantom Strangler” made in 1963.6
When making the metadata for items in Folder 1015, I wanted to allow Daniela to regain her own agency in being remembered. The power of remembering is enormous: it becomes public memory and informs current events. Archival records therefore provide an opportunity to bear witness to an event when it has been lost to time. I knew I needed a way to provide a pathway to Daniela and link her to these records. I did this by making Daniela a Name Subject Heading, a practice that we do not often use in the Freedom House Inc. digitization project. Due to the large scope of the collection and the logistical issues of maintaining authorized subject headings over 83 containers, Name Subject Headings for individuals are a rare occurrence.
However, with the addition of this metadata component, Daniela’s story becomes accessible to the public. She is no longer a passive victim, marginalized and obscured, but is now an active agent at the forefront of police-community relations in 1963 Roxbury. People can now find the records related to Daniela and they can situate her contribution within the larger Freedom House and Roxbury narratives.
Additionally, the records can give the public a resource for holding historical agents accountable. The 1960s were fraught with many issues between communities of color and the police nationwide. The events of 1963 in Roxbury become a part of that larger context.
Finally, by recognizing Daniela and the events of 1963, I hope that the records and their metadata have an enduring impact on our current society. Police brutality, racism, abuse, systematic oppression, and unnecessary force are all topics that we see in the news every day. Past calls for better training and systematic changes to the police force are similar to present-day news stories. We are constantly exposed to the reality of this violence and our nation collectively feels an emotional toll possibly similar to the one described by Mrs. Oswald Jordan in January 1963. Maybe these historic records can help inform our present discourse. By knowing what happened in the past, maybe we can make more informed decisions, and ultimately, be the change we strive to see.
1. A non-comprehensive list of recent literature includes Jillian Ewalt, “Toward Inclusive Description: Reparations through Community-Driven Metadata,” NEA Newsletter 46, no. 2 (April 2019): 4-7; Rosale de Mattos, “The Representation of Archival Information in Controlled Vocabularies: The Context of the Archival Institutions in Rio de Janeiro,” Knowledge Organization 47, no. 7 (2019): 548-557; Samuel J. Edge, “A Subject “Queer”-y: A Literature Review on Subject Access to LGBTIQ Materials,” Serials Librarian 75, no. 1-4 (Jul-Dec 2018): 81-90; Gracen Brilmyer, “Archival assemblages: applying disability studies’ political/relational model to archival description,” Archival Science 18, no. 2 (Jun 2018): 95-118.
2. Miranda Villesvik and Raananah Sarid-Segal, “Making Metadata Inclusive to Marginalized Voices” (presentation, Archives for a Changing World, NEA Spring Conference, Virtual, March 27, 2021).
3. The Boston Strangler continued to murder young women in the Boston area until 1964. For more information, see Ronald Lettieri, “Boston Strangler.” Salem Press Encyclopedia (2019); Jess Bidgood, “50 Years Later, a Break in a Boston Strangler Case,” New York Times, July 11, 2013; Paul Hoblin, Boston Strangler (Unsolved Mysteries). Abdo Publishing, 2012; Susan Kelly, The Boston Stranglers: The Public Conviction of Albert DeSalvo and the True Story of Eleven Shocking Murders. Secaucus, N.J.: Carol Pub. Group, 1995.
4. “Report from special community meeting about police issues, Daniela Saunders and Rabbi Zuber murders, and race relations held January 6, 1963.” January 6, 1963. Freedom House Inc. Records (M16). Northeastern University Library. Archives and Special Collections Department. Northeastern University, Boston, Massachusetts. Box 30, Folder 1015.
5. “Outline on various phases of police activities.” April 28, 1964. UASC identifier: M16_B030_F1015_005. Freedom House Inc. Series 3: Programs. Sub-Series B: Urban Renewal. Neighborhood Associations. Police-Community Relations Committee, 1960-1966.
6. Jack Thomas, “Victims of the Boston Strangler,” The Boston Globe, July 11, 2013. https://www.bostonglobe.com/metro/2013/07/11/victims-boston-strangler/CwbsZlSNcfwmhSetpqNlhL/story.html
Beginning on June 30, the Northeastern University Library is no longer subscribing to the database Nexis Uni, transitioning instead to a pair of databases – Access World News and Westlaw Campus Research – that together provide even more news and law resources through much easier user interfaces.
Why replace Nexis Uni?
Over the years, Nexis Uni has been removing much of its content while steadily increasing its prices. That combination, along with a difficult-to-use interface, has led many libraries and institutions to cancel their subscriptions and put money toward more cost-effective and user-friendly databases and resources.
What new databases should I be using instead?
For the cost of Nexis Uni, the Library was able to acquire access to two new databases that, together, provide much of the same content in a far easier-to-use format. Access World News Research Collection from Newsbank includes current and archived news content from more than 12,700 sources, spanning over 200 countries and territories and combining all formats (full-text articles, web-only content, and PDF image collections) in a single interface. You can browse Access’ full list of sources here.
For legal and business content, Westlaw Campus Research contains primary and secondary legal sources including statutes, codes, and case law, as well as the American Jurisprudence legal encyclopedia. On the business side, it contains tools like Hoover’s and the Company Investigator, which provides public and private company information and hard-to-find information on small businesses and partnerships. It can also be used to prepare company reports using visual graphics. This reference guide provides detailed information on how to use Westlaw.
Other databases also provide useful news resources, including Factiva (which includes access to business news, including the Wall Street Journal and Barron’s); Pressreader (which covers daily news in more than 100 countries); and ProQuest News and Newspapers (which includes current and archival access to newspapers like the New York Times, Wall Street Journal, Washington Post, Boston Globe, Chicago Tribune, Newsday, and Los Angeles Times, as well as more than 80 local and regional titles).
In addition, Northeastern University students, faculty, and staff can access the Wall Street Journal’s website with their NU credentials at wsj.com/northeastern.