Data Management

What is the DRS and who is it for?

What is the DRS?

The Digital Repository Service (DRS) is an institutional repository that was designed by the Northeastern University Library to help members of the Northeastern community organize, store, and share the digital materials that are important to their role or responsibilities at the university. This can include scholarly works created by faculty and students; supporting materials used in research; photographs and documents that represent the history of the community; or materials that support the day-to-day operations of the university.

While the DRS itself is a technical system that stores digital files and associated information to help users find what they need, we also consider the DRS to be a service for the university community: library staff are here to help you organize, store, share, and manage the digital materials that have long-lasting value for the university community and beyond.

Result listing in the DRS for a report titled "Exploring the Effectiveness of Bite-Sized Learning for Statistics via TikTok" and includes metadata and an image of the report
Published research from the Northeastern community available in the DRS.

Northeastern is not alone in this endeavor. Repository services are now standard practice for most academic institutions, including Harvard University Library (which also uses the name “Digital Repository Service”), Stanford University Library (a leader in technical development for repository systems), Tufts Libraries, and other institutions around the world.

Who uses the DRS?

The DRS has been used by faculty, staff, students, and researchers from all corners of the university community for 10 years. There are too many use cases to mention in one brief blog post, but here are some trends we’ve seen in what users have chosen to deposit over the last few years.

  • Open access copies of research publications, as well as working papers and technical reports
  • Publications and data that supports published research
  • Event recordings, photographs, newspapers, and almost any kind of material you can think of to support the day-to-day operations and activity at the university
  • Student work, such as oral histories, research projects, and other classwork. Students are also required to contribute the final version of their thesis or dissertation.
  • Digitized and born-digital records from the Archives and Special Collections, including photographs, documents, and audio and video recordings

These files, and all the other audio, video, document, and photograph files in the DRS, have been viewed or downloaded 11.2 million times since the DRS first launched in 2015. Nearly half of the files in the DRS are made available to the public and are therefore available for the wider world to discover. Materials in the DRS have been cited in reporting by CNN, Pitchfork, WBUR, and Atlas Obscura, among others, and are regularly shared on social media or in Reddit threads. As a result, Northeastern continues to contribute the work produced here to the larger scholarly and cultural record, and to the larger world.

Who supports the DRS?

The day-to-day work managing, maintaining, and supporting users of the service comes from staff in Digital Production Services:

  • Kim Kennedy supervises the digitization of physical materials and processing of born-digital and digitized materials.
  • Drew Facklam and Emily Allen create and maintain the descriptive metadata that helps you find what you need.
  • And all of us in the department, including part-time staff, are responsible for general management of the system, including batch ingesting materials, holding consultations and training sessions, answering questions, and leading conversations about how to improve the system and the service.
Two people stand in front of a presentation with a screenshot of the DRS behind them
Sarah Sweeney and David Cliff, DRS staff, posing in 2015 with the homepage of the recently launched DRS. 

The DRS is also supported by a number of staff members across the library:

  • David Cliff, Senior Digital Library Developer in Digital Infrastructures, is the DRS’ lead developer and system administrator.
  • Ernesto Valencia and Rob Chavez from the Library Technology Services and Infrastructure departments also provide development support and system administration.
  • Many librarians in the Research and Instruction department do outreach about the service and support faculty as they figure out how to use it in their work.
  • Jen Ferguson from Research Data Services also connects faculty and researchers to the DRS, while also providing data management support for those wishing to use the DRS to store their data.
  • Members of the library administration, including Dan Cohen, Evan Simpson, Tracey Harik, and the recently retired Patrick Yott, have contributed their unwavering support and advocacy for developing and maintaining the system and the service.

We are all here to help you figure out how the DRS may be used to make your work and academic life easier. To dive deeper into what the DRS is and how to use it, visit the DRS subject guide or contact me or my team.

The library is celebrating 10 years of the DRS! Check out A Decade of the Digital Repository Service to read more about the history of the DRS.

A Decade of the Digital Repository Service

Northeastern University Library’s institutional repository, the Digital Repository Service, is celebrating 10 years of caring for the university’s high-value scholarly, archival, and administrative materials. From day one, the mission of the DRS has been to provide a long-term, sustainable home for the born-digital and digitized content being produced by members of the Northeastern community.

More than just a technical system, the DRS is a service provided by the library to help solve a common problem for faculty, staff, students, researchers, and project teams: where can I store the digital output from my work? The DRS allows these projects developed at Northeastern to be maintained and shared with a wider audience. In addition to maintaining the DRS system, services provided by DRS staff include running training sessions, answering questions, consulting, and depositing files for users.

Originally developed as a prototype in 2011, the system was created by a library team — three developers, the repository manager, a Northeastern co-op, and a library administrator — with the goal of constructing a completely realized system ready for production. The first version was ready to be used fully by the Northeastern community in June 2015.

The DRS was launched with some rough edges, which were slowly smoothed into the system users are familiar with today. We have received a tremendous response from users about the usefulness of the system, as well as thoughtful and constructive feedback about how the system can be improved (e.g., faster page load times, better search functionality, and more control over files).

The DRS homepage displayed on a laptop screen with a hand typing on the computer's keyboard
The DRS, as it appeared in 2015.

We have done our best to grow with the university community as its needs shift by increasing support for datasets, loading large batches of files on behalf of users and project teams, and tripling our original storage capacity, but there is always more to be done to meet the needs of our users.

The shape of the content stored in the DRS has shifted over the years, as well. Initially just for theses and dissertations, university photographs, and archival material, the DRS now fully supports various types of project materials for digital humanities research, datasets for researchers in various disciplines, oral histories, and many others.

Since its launch, DRS content has been viewed, downloaded, or streamed more than 11 million times, and we’ve had more than 13,000 members of the Northeastern community sign into the system. The DRS averages approximately 2,000 unique visitors and 4,000 views, downloads, and streams a day.

Screenshot of a DRS display of a research poster titled "Investigating and addressing the needs of research support staff"
The DRS provides a home for and access to research and projects by members of the Northeastern community.

The success of the system can be attributed to the combined efforts of staff in many library departments, including development and system administration from Library Technology Services and Digital Infrastructures; outreach and faculty support from Research and Instruction; data management support from Research Data Services; issue triage and metadata collaboration with Resource and Discovery Services; and continual support and advocacy from library administration. And, of course, Digital Production Services, the department primarily responsible for maintaining the system and supporting the service through digital production, metadata maintenance, and user support.

The DRS is not the first system of its kind supported by the library. The library adopted its first repository system in the early 2000s, followed by IRis in 2007. The library’s commitment to maintaining the scholarly output of the university was formed during those early years, a commitment we have refined and strengthened over more than 20 years of dedicated support for faculty, staff, and students working to help fulfill the university’s mission. It’s been a great pleasure to support the Northeastern community in this way, and we look forward to the next 10 years and beyond.

Scan It Right: Starting Your Own Digitization Project

Whether you are digitizing old family photos or creating a paperless record-keeping system, reformatting analog materials can be a lot of work! Here are some suggestions for what to think about when starting a project.

Documents, Photographs, Flat Art, Slides, and Negatives

Two unidentified women and one man standing in front of a computer.
Computer training course sponsored by New England Telephone and Telegraph Company. https://repository.library.northeastern.edu/files/neu:126895

Choosing a Scanner
A paper-based document, such as a report, on normal paper can go through a sheetfed scanner.

Photographs, artwork, or material on old or delicate paper should go on a flatbed scanner.

Slides and negatives can go on specialized scanners or on multipurpose scanners.

While all of these material types can be scanned in a home or office, if you are dealing with many items, it can be more efficient to send them to a vendor.

File Type
TIFF is one of the standard file types for scanned images and an excellent choice for saving high-quality images long-term. If you would like to read more about standards for digitization, check out the FADGI Technical Guidelines for Digitizing Cultural Heritage Materials. If you need smaller files, you can use Photoshop to save TIFF images as JPEG files. The Northeastern community has free access to the Adobe Creative Cloud, which includes Photoshop and Acrobat.

PDF is a good file type for documents. Some scanners will let you save automatically to PDF. You could also save the document pages as TIFF files, then use Adobe Acrobat to combine the files into a PDF.

File Naming
Give your files unique and descriptive names and avoid spaces in the names — use underscores, dashes, or camel-case instead. Think about how the file names will sort in Finder or Windows Explorer. Some examples:

  • Faculty_Report_1970_01.pdf
  • ChemBuilding001.tif, ChemBuilding002.tif, etc.
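
If you are naming a large batch of scans, a short script can apply the advice above consistently. The following is a minimal sketch, not a tool the library provides: the function name `safe_name` and its parameters are hypothetical. It replaces spaces with underscores and zero-pads the sequence number so that files sort in the order they were scanned.

```python
import re


def safe_name(description: str, sequence: int, extension: str, digits: int = 3) -> str:
    """Build a file name with no spaces and a zero-padded sequence number."""
    # Replace each run of characters other than letters, digits, and dashes
    # with a single underscore, then trim underscores from the ends.
    base = re.sub(r"[^A-Za-z0-9-]+", "_", description.strip()).strip("_")
    # Zero-padding keeps 010 after 002 when names are sorted as text.
    return f"{base}_{sequence:0{digits}d}.{extension.lstrip('.').lower()}"


names = [safe_name("Chem Building", n, "tif") for n in (1, 2, 10)]
print(names)                   # Chem_Building_001.tif, _002.tif, _010.tif
print(sorted(names) == names)  # padded names keep their scan order when sorted

# Without padding, text sorting puts scan10 ahead of scan2.
print(sorted(["scan1.tif", "scan2.tif", "scan10.tif"]))
```

Three digits of padding covers up to 999 files per series; pass a larger `digits` value for bigger batches.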

Paul Mahan from the Boys’ Clubs of Boston using an enlarger at a photographic laboratory. https://repository.library.northeastern.edu/files/neu:212609

Resolution
Resolution is how many pixels the scanner captures per inch of the original material. This is usually expressed in ppi (pixels per inch) or dpi (dots per inch). A higher dpi will capture more detail but will result in a larger file size.

Based on the FADGI guidelines mentioned above, for text-based materials like journal articles or reports, 300 dpi is sufficient for most uses. For photographs and more image-heavy material, use 400 dpi. For slides or negatives, use around 3000 dpi.
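
To get a feel for the tradeoff between detail and file size, you can estimate a scan's pixel dimensions and uncompressed size before you start. This is a rough sketch under stated assumptions: the function name `scan_size` is made up for illustration, and it assumes 8 bits per channel (3 bytes per pixel for color, 1 for grayscale) with no compression, so saved files will often be smaller.

```python
def scan_size(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3):
    """Return (pixel width, pixel height, uncompressed bytes) for a scan."""
    # Pixels along each side = physical size in inches x resolution.
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Uncompressed size = total pixels x bytes stored for each pixel.
    return width_px, height_px, width_px * height_px * bytes_per_pixel


examples = [
    ("Letter-size page at 300 dpi", 8.5, 11, 300),
    ("4 x 6 inch photograph at 400 dpi", 6, 4, 400),
]
for label, width, height, dpi in examples:
    w_px, h_px, size = scan_size(width, height, dpi)
    print(f"{label}: {w_px} x {h_px} pixels, about {size / 1_000_000:.1f} MB in color")
```

A letter-size page at 300 dpi works out to 2550 x 3300 pixels, or roughly 25 MB as uncompressed color, which is why large projects fill storage quickly.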

Black and White, Grayscale, or Color
You can base this on the material you are scanning. If the material is entirely black and white or grayscale, then you can scan in black and white or grayscale. If the item has color that you want to capture, then scan in color.

Brightness, Contrast, and Cropping
Most scanners will allow you to adjust brightness and contrast settings. If you are scanning documents, adjust until the text appears solid (not choppy but not too dark or blown out). For images, adjust until the brightness and contrast look true to the original.

Text Searchability
If you are creating a PDF, in most cases it should be text searchable for accessibility. To do this, you need to run OCR (Optical Character Recognition) on the document. For members of the Northeastern community, this is available in Adobe Acrobat.

Audiovisual Material

If you want to reformat A/V material (like VHS and audiocassette tapes) yourself, the following webinars from Community Archiving Workshop provide some guidance on the type of equipment to purchase.

However, it is often easiest to work with a vendor for A/V transfers. These materials can suffer from degradation that makes them challenging to capture. The Association of Moving Image Archivists has a directory of vendors.

In addition, the following guides from the National Archives and Records Administration can help you identify formats in your possession before you talk with a vendor. The first focuses on audio formats, like cassette tapes, and the second focuses on video formats, like VHS tapes.

Storage

For the files you create, make sure you save multiple copies in different geographic locations. For example, you might save one copy on your computer; another in a cloud-based location, like Backblaze or Google Drive; and then the final copy on an external hard drive.
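
Multiple copies only help if they are identical to the original. One common way to check, which this post does not prescribe, is to compare checksums. The sketch below uses Python's standard `hashlib` module; the function names and the file paths in the comments are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MB chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, *copies: Path) -> bool:
    """Return True only if every copy has the same checksum as the original."""
    expected = sha256_of(original)
    return all(sha256_of(copy) == expected for copy in copies)


# Example call with hypothetical paths:
# copies_match(Path("scans/ChemBuilding001.tif"),
#              Path("/Volumes/Backup/scans/ChemBuilding001.tif"))
```

Running a check like this after each copy, and again every so often, can catch a damaged or incomplete file while you still have a good copy elsewhere.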

You can also share files with friends and family through shared folders on Google Drive. For A/V materials, you can post unlisted videos on YouTube, so folks can only view them if they have a link.

Have any questions? Feel free to contact the librarians in Digital Production Services at Library-DPS@northeastern.edu.

New policies will impact research data sharing and scholarly communication

We’re monitoring recent changes to policy and legislation that will likely impact the work of Northeastern University faculty, staff, and student researchers. Read on for a brief overview of three of these impending changes, in order of their expected implementation dates.

NIH (National Institutes of Health) Policy for Data Management and Sharing

What is it? The NIH’s new policy on data management and sharing aims to improve the reproducibility and reliability of NIH-funded work by broadening access to research results.

When will the changes take place? January 25, 2023

How might this impact researchers?

  • DMSPs: All NIH proposals will require the submission of a data management and sharing plan (DMSP). Previously, only NIH proposals above a certain funding threshold required a DMSP.
  • Data availability: Research data is expected to be made accessible “as soon as possible, and no later than the time of an associated publication, or the end of the award/support period, whichever comes first.” Further, the new policy strongly encourages the use of established repositories to share data.
  • Costs: Reasonable costs related to data management and sharing may be included in NIH budget requests.

Additional resources:

CHIPS and Science Act

What is it? The CHIPS and Science Act is primarily related to semiconductor manufacturing and the STEM workforce pipeline, but also includes some open science directives.

When will the changes take place? One year following enactment (circa August 2023)

How might this impact researchers? Once the act takes effect, applications for National Science Foundation awards will be required to include machine-readable data management plans (DMPs). We do not anticipate that this will significantly impact researchers, as most DMPs are already machine-readable unless they include tables or charts. This requirement will enable more seamless information sharing between systems used by institutions and funders, ultimately reducing the paperwork burden for researchers.

Additional resource:

White House Office of Science and Technology Policy (OSTP) Memorandum: Ensuring Free, Immediate, and Equitable Access to Federally Funded Research

What is it? OSTP’s new memorandum (aka the Nelson memo) builds upon OSTP’s 2013 Holdren memo. The new memo will make research funded by all U.S. government agencies immediately available to the public. This eliminates the current optional 1-year embargo period and applies to both publications and the data underlying peer-reviewed research. Under the new Nelson memo, the definition of publications is widened beyond articles to also apply to peer-reviewed book chapters and conference proceedings.

When will the changes take place? The Nelson memo will first impact funding agencies, which will be expected to fully implement their public access and data sharing plans by the end of 2025.

How might this impact researchers? Once the memo goes into effect, researchers and members of the public will benefit from broader, more immediate access to federally funded research results. The memo urges the use of persistent identifiers (PIDs) to unambiguously identify authors, affiliations, funders, and more, so this would be a great time to acquire and begin using an ORCID iD if you don’t already have one. The U.S. government has also signaled interest in examining current academic incentive structures to better recognize institutions and researchers for their support of public access to research.
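
Persistent identifiers are built to be checked by machines as well as read by people. As an illustration, the final character of an ORCID iD is a check digit calculated from the first 15 digits using the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can usually be caught before it ends up in a record. The sketch below is a minimal example with hypothetical function names, not an official ORCID tool, and it only checks the format, not whether the iD is registered to anyone.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a correct check digit."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: last digit is wrong
```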

Additional resources:

The library is working with campus partners, including Research Administration and Research Computing, to develop guidance and resources to assist researchers as they navigate these changes. As always, if you need assistance with a data management or data management and sharing plan, or if you’re searching for a secure, permanent home for your research outputs, we’re here to help!