Mellon Foundation Awards $350,000 Grant to Fund Plan for AI Book Training Commons

The Northeastern University Library and Authors Alliance have received a $350,000 grant from the Mellon Foundation to plan a public-interest book training commons for artificial intelligence.

As artificial intelligence plays an increasingly essential role in society, including information from books in AI’s large language models becomes more important. The more than 129 million books written over the past 500 years are vital training data sources for AI, providing a quality, breadth, and diversity of content related to human thinking that can strengthen AI’s scope and accuracy.

The main goal of this project is to develop a plan for either establishing a new organization or identifying the relevant criteria for an existing organization (or organizations) to take on the work of creating and stewarding a large-scale public-interest training commons of books.

“We are grateful for the Mellon Foundation’s generous support for this important new project,” said Dean of the Library Dan Cohen. “We and the Authors Alliance are excited to convene writers, publishers, librarians, technologists, and other stakeholders to explore the best way for books to be incorporated in an ethical and productive way that serves the public.”

Northeastern University Library’s role will be to support and coordinate the project and to host one of the two meetings of stakeholders.

With this work, planners hope to answer key questions, including:

  • What are the right goals and mission for such an effort, taking into account both the long and the short term;
  • What are the technical and logistical challenges that might differ from existing library-led efforts to provide access to collections as data;
  • How to develop a sufficiently large and diverse corpus to offer a reasonable alternative to existing sources;
  • What a public-interest governance structure should look like that takes into account the particular challenges of AI development;
  • How we, as a collective of stakeholders from authors and publishers to students, scholars, and libraries, can sustainably fund such a commons, including a long-term model for the maintenance, transformation, and growth of the corpus;
  • Which combination of legal pathways is acceptable to ensure books are lawfully acquired in a way that minimizes legal challenges;
  • How to respect the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation; and
  • How to distinguish between the different needs and responsibilities of nonprofit researchers, small market entrants, and large commercial actors.

The Authors Alliance is an organization focused on creating resources and opportunities for authors interested in sharing their work broadly, in the interest of the public good.