Preserving Manifold Editions: Topics of Consideration
Digital preservation is a cornerstone issue for the Manifold team to address during our second phase. In creating a platform for publishers to collect and disseminate singular works that contribute to the scholarly record, it is imperative that we provide a means to preserve those works against the rigors of time and technological obsolescence. Publishers, authors, and the communities they serve need to be confident that the materials they publish on Manifold will not only be discoverable, but also reliably accessible and usable for future engagement.
Challenges
Manifold and other nascent publishing platforms enable scholars to explore multimodal forms of expression.[1] Presently, however, there are no established standards for preserving long-form scholarship made up of collections of interrelated multimedia materials, or of materials in various states of completeness, which might also be entwined with conversations occurring in social media and/or the platform's own native annotation and commenting system.
The Manifold team is composed of a publisher, a digital humanities center, and a development agency; our missions haven't historically centered on preservation. To engage with the question of preservation earnestly, we first need to better understand the ecosystem, expectations, and available options. A good part of our present work is educating ourselves, through independent research and conversations with librarians and other experts, so that we can chart a course that meets the needs of our users today and the needs of users years hence. Those needs inform two very simple questions that are foundational to our endeavor: What are we preserving, and how are we preserving it?
For multimodal projects, the "what" question may be simple to pose, but it's certainly not easy to answer, and the implications of exploring it are, well, manifold. For instance, do we want to be opinionated about preserving works in progress versus a peer-reviewed version of record? What constitutes the version of record when there isn't a corresponding print edition? Does that answer affect the mechanisms of preservation we're enabling, or is that a question best reserved for individual publishers? What about comments authors and readers have made on the system? Should we aim to make it possible to preserve all of them? Just a subset? The public ones, or perhaps a select, curated collection? What about comments that are part of classroom work? How do we signal such intent to those making the comments, and does that alter their possible engagement? What about interactions brought in through social media? And then there are ancillary resources. Do we simply preserve the source files and their metadata, or do we work to make them relational to their placement and function in and around the text they support? What about resources that the platform isn't hosting: do we want to preserve only their metadata, or can (and should) we go further? How does all of this work if a publisher has enabled a paywall (a forthcoming Manifold feature)?
The "how" of this equation is a bit more straightforward. Generally speaking, there are three approaches, beyond redundancy, to preserving digital materials: bitwise, migration, and emulation.
Bitwise preservation takes the ones and zeros that make up the materials in the computer system and exports them exactly as they are, from start to finish and in order. This captures the content accurately, but whether it is usable for researchers now depends on the capabilities of the holding agency, and there's no guarantee it will be usable in the future if the software required to engage with the material is no longer supported.
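In practice, a bitwise copy is commonly paired with a fixity check: a checksum computed when the bits are copied, and recorded so a holding agency can later confirm that nothing has changed. Checksums aren't discussed above, so treat the following as an illustrative Python sketch of that general practice, not a Manifold feature; the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitwise_copy(source: Path, destination: Path) -> str:
    """Copy a file byte for byte, then verify the copy is identical.

    Returns the checksum so it can be stored alongside the copy
    for later fixity checks.
    """
    shutil.copyfile(source, destination)
    original, copied = sha256_of(source), sha256_of(destination)
    if original != copied:
        raise ValueError(f"fixity check failed for {destination}")
    return original


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file standing in for a source document.
        src = Path(tmp) / "chapter-01.docx"
        src.write_bytes(b"example bytes")
        dst = Path(tmp) / "archive-chapter-01.docx"
        print(bitwise_copy(src, dst))
```

Note that the checksum only proves the bits survived intact; it says nothing about whether software to open a `.docx` file will still exist, which is exactly the limitation described above.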
A migratory approach is transformative, converting the content from one format to another (e.g., from a Word document to XML). This approach has multiple benefits: it ensures that the essential elements of the scholarship are preserved (ideally in an open and extensible format, as a guard against obsolescence) while also remaining accessible and usable. It is a targeted strategy that looks at the content itself, independent of any software required to run it. It also necessitates periodic review for possible further transformations. While it would be ideal to retain the exact look and feel of the original expression, the focus here is on the content itself rather than any specific rendering of it.
The last approach, emulation, looks to preserve both the content and the software used to run it as a means to exactly recreate the original experience. This path requires the most intensive upkeep: it must maintain fidelity to the original while also keeping alive code that will at some point fall out of favor.[2]
It is worth noting that these approaches are not mutually exclusive. Indeed, a good preservation strategy is likely to involve more than one of them. The path, or combination of paths, we choose will help determine the underlying mechanics of how we enable the preservation of Manifold content. Our current thinking favors the migratory approach and its focus on keeping the material in a usable and extensible format that can evolve with new standards while maintaining fidelity to the original scholarship. Our solution may involve exporting projects as EPUBs or, perhaps, exporting all the source text and metadata as JSON and packaging it with the supporting ancillary resources for deposit with third-party preservation agencies/systems or institutional repositories (IRs).[3] It may also look somewhat different for different kinds of resources, since each has its own needs (e.g., independent apps that are made part of a project as a resource may require a more emulative approach).
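To make the JSON-plus-resources option a little more concrete, here is a rough Python sketch of what such a package might contain. Everything in it is an assumption for illustration only: the `Project` and `Resource` classes, their fields, the zip layout, and the `manifest-sha256.json` file name are hypothetical and do not reflect Manifold's actual data model or any format a preservation agency has agreed to accept.

```python
from __future__ import annotations

import hashlib
import json
import zipfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Resource:
    """Hypothetical ancillary resource (image, audio, dataset) in a project."""
    filename: str
    content: bytes
    caption: str = ""


@dataclass
class Project:
    """Hypothetical, heavily simplified stand-in for a Manifold project."""
    title: str
    creators: list[str]
    source_text: str
    resources: list[Resource] = field(default_factory=list)


def export_project(project: Project, archive_path: Path) -> dict:
    """Package the project's text and metadata as JSON, together with its
    resources and a SHA-256 checksum manifest, in a single zip file.

    Returns the manifest (archive member path -> checksum) that was written.
    """
    manifest = {}
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store each resource file as-is and record its checksum.
        for res in project.resources:
            member = f"resources/{res.filename}"
            zf.writestr(member, res.content)
            manifest[member] = hashlib.sha256(res.content).hexdigest()

        # Text and metadata go into an open, plain-text JSON record that
        # points at the resources by their path inside the package.
        record = {
            "title": project.title,
            "creators": project.creators,
            "source_text": project.source_text,
            "resources": [
                {"path": f"resources/{r.filename}", "caption": r.caption}
                for r in project.resources
            ],
        }
        record_bytes = json.dumps(record, indent=2, ensure_ascii=False).encode("utf-8")
        zf.writestr("project.json", record_bytes)
        manifest["project.json"] = hashlib.sha256(record_bytes).hexdigest()

        # The manifest lets a receiving repository verify every file.
        zf.writestr("manifest-sha256.json", json.dumps(manifest, indent=2))
    return manifest


if __name__ == "__main__":
    import tempfile

    demo = Project(
        title="Example Edition",
        creators=["A. Author"],
        source_text="<p>Chapter one</p>",
        resources=[Resource("figure-1.png", b"placeholder bytes", "Figure 1")],
    )
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "example-edition.zip"
        print(sorted(export_project(demo, out)))
```

The design choice worth noting is that the content lives in plain JSON and ordinary files rather than in anything tied to Manifold's software, which is the point of a migratory approach; a real export would need a far richer record (annotations, versions, rights metadata) shaped by the questions raised earlier.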
As we navigate these questions, we're also going to need to appreciate nuanced legal and social concerns. Beyond the copyright, licensing, and permission obligations authors and publishers must work through, there are also ethical and moral issues involved in the collection, publication, and storage of source materials of which we must be cognizant and respectful. In the same way we must also be respectful of those users who engage with and possibly contribute to scholarship on Manifold in a material way through their annotations and comments.
Progress So Far
While we're still early in phase 2, preservation is an issue we've already begun exploring with zeal. Informally, this has been a topic of interest and rich discussion with our collaborators at the University of Michigan and New York University. In September we'll be workshopping this issue more formally with those groups in Ann Arbor, by generous invitation from Charles Watkinson, alongside fellow stakeholders from Stanford University Press, librarians from both the University of Minnesota and the University of Michigan, and representatives from Portico, CLOCKSS, and HathiTrust.
This past spring we not only had the privilege of taking part in the Library Publishing Forum's preconference, we also benefited from the various sessions that spoke to preservation and from the perspectives and expertise of the many thought leaders who were present that week. Cliff Lynch's closing comments to the preconference, especially, serve as a charge and a reminder of the serious import of creating the means to preserve the materials our platform enables.
As Jojo noted in her recent post about our Manifold team meeting in Minneapolis, we've also met with members of the University of Minnesota Library who specialize in digital preservation. Being able to present Manifold in person to library staff at a major research institution and solicit their feedback and suggestions was a fantastic opportunity for us generally, and particularly invaluable as we continue to think through how Manifold can best function in this space. We made meeting with library staff who are experts in digital preservation a cornerstone of our phase 2 roadmap, and in the coming months we will also meet in person with library personnel at both CUNY and the University of Oregon, both to inform future development plans and to review the progress we've made.
Next Steps
While we are still in an information gathering mode, we also now have enough background to begin building a story around this issue on our GitHub repository that Zach and his team at Cast Iron can use to start developing the tools and means to export content from Manifold. That story is likely to go live on GitHub in mid-September, and we'll announce when it does via social media and our newsletter so those who are interested can review it and make recommendations.
In conjunction with our upcoming meetings, we'll be in conversation with the major preservation agencies, not only to help inform our platform's preservation methods but also, we hope, to forge technical connections with them. Thus, if a publisher has a relationship with an agency, they can easily prepare an export of their content that the agency will be able to use.
When these features are more fully realized and the clouds of construction clear, we will craft a template preservation statement that publishers can use to clearly outline to their authors and readers what materials they are preserving, what materials aren't being preserved, which agencies are involved and their level of commitment, and the means and strategies by which the preservation process works.[4] This template will be constructed so as to be easily customizable based on the choices a publisher has made within Manifold.
* * *
For publishers who have never before extended themselves into this arena, it can seem overwhelming at first. Our experience thus far, though, has made it clear to us that preserving digital scholarship is just as important as publishing it. We are committed to crafting the means for you to do so confidently and in ways that leverage your existing processes and relationships.
If you have input on this matter you'd like us to consider or are actively working on issues of digital preservation and have a perspective you'd like to share, we invite you to do so in the comments below or on our Slack channel.
—Terence Smyre, Manifold Digital Projects Editor
Footnotes
1. Other platforms exploring this new publishing space include Scalar, Michigan's Fulcrum, Getty's Quire, and Stanford University Press's digital publishing initiative.
2. The Smithsonian Institution Archives provides a nice summary of the different preservation strategies, as does the University of Michigan Library's Digital Preservation Unit.
3. Our aim is to make it possible for publishers to easily integrate with the preservation systems with which they may already have an established relationship (Fulcrum, Portico, LOCKSS, CLOCKSS, HathiTrust), while still providing a means for those who don't have such relationships to output their content in sensible ways for their IR.
4. Much of our thinking around a preservation statement comes from Jeremy Morse at the University of Michigan as well as other participants at the 2018 Library Publishing Forum in Minneapolis. For examples of such statements, see the University of Illinois Urbana–Champaign's Publishing Without Walls website or HathiTrust's digital preservation policy.