Digital preservation and LOCKSS networks
Miguel Ángel Márdero Arellano
Coordinator of the Brazilian Network of Digital Preservation Services CARINIANA
Instituto Brasileiro de Informação em Ciência e Tecnologia
The world of digital preservation has various ways of dealing with digital objects. New technologies replace old ones, and we may often feel that we do not need different ways of working with them. However, every time we improve in the processing and storage of our digital collections, the perception is that there is not enough time and money to guarantee their survival. Preservation has different meanings depending on the context in which it develops (MÁRDERO, 2008).
One reason for this perception could be the fact that digital objects have no meaning on their own. They are not static and commodified, and they do not remind us of their value by sitting on a shelf within reach, with their titles in view on their spines or on their covers. The threats that endanger this information are also out of sight and could make these digital objects unrecoverable.
A comprehensive means for preserving and safeguarding digital objects and making them accessible in the future, particularly objects that contain the record of advances in human knowledge, is an essential basis for human progress. Digital preservation can never be a problem that has been resolved. It is a task that does not end and becomes more difficult over time as formats, software and hardware fade from memory and creators and curators face new challenges.
Distributed digital preservation in practice
In general, digital preservation is still perceived as a complex, costly task that requires years of planning and considerable injections of money and other resources (MARCONI; SCHAEFER, 2016). This preservation requires active management to guarantee that the content and the data are and remain in a good state. Like digital content, back-up copies and stored content can deteriorate over time. Therefore, long-term digital preservation is a requirement for all institutions whose business model includes technological dependency for the storage of their digital collections. This implies the existence of systems and processes of archiving with specialised management and costly perpetual operations.
Finding the right balance between guaranteeing the integrity and authenticity of digital objects in the long term and addressing institutions’ concerns about personal information and security is one of the hardest issues faced by digital preservation networks and information professionals in general.
Since the start of the twenty-first century, the LOCKSS program (Lots Of Copies Keep Stuff Safe) at Stanford University has helped communities to construct and preserve their digital collections. This is an essential step to guarantee access over time (REICH, 2014). The initiative demonstrates that it is possible to construct digital preservation networks that help a range of institutions from large universities to research centres and public libraries.
Among the advantages attributed to these distributed networks are the support of similar institutions, an increase in the level of awareness of the preservation of digital assets, and an increase in the knowledge base required to maintain programs that include mechanisms to verify the integrity of collections. Digital preservation networks also offer excellent opportunities for international collaboration. The geographic separation of LOCKSS nodes is one of their main characteristics: the greater the distance between the various LOCKSS copies, the greater the network's capacity for survival.
The general principle is the same for each network preservation service. There are various preservation nodes that execute LOCKSS software. Each node uses a web crawler to obtain a copy of the target content and then participates in the LOCKSS protocol of polling and voting to establish consensus, so that when a node detects damage in its preserved copy, it obtains repairs from the original source or from a tested peer.
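The idea of polling, voting and repair can be illustrated with a deliberately simplified sketch. It is not the actual LOCKSS protocol, which is considerably more elaborate; it only assumes that each node votes with a hash of its copy and that a node in the minority takes a repair from an agreeing peer. The function names `digest` and `poll_and_repair` and the node labels are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a preserved copy."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(nodes: dict[str, bytes]) -> dict[str, bytes]:
    """Simplified poll: every node votes with the hash of its copy.

    Nodes whose hash disagrees with the strict majority receive a copy
    from an agreeing peer. Returns the repaired set of copies.
    (Illustrative only; the real protocol is more sophisticated.)
    """
    votes = Counter(digest(copy) for copy in nodes.values())
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(nodes) / 2:
        raise RuntimeError("no majority: consensus cannot be established")
    # Any copy that matches the winning hash can serve as the repair source.
    good_copy = next(c for c in nodes.values() if digest(c) == winning_hash)
    return {
        name: copy if digest(copy) == winning_hash else good_copy
        for name, copy in nodes.items()
    }


if __name__ == "__main__":
    copies = {
        "node_a": b"article text",
        "node_b": b"article text",
        "node_c": b"articXe text",  # simulated bit damage
        "node_d": b"article text",
    }
    repaired = poll_and_repair(copies)
    print(all(c == b"article text" for c in repaired.values()))
```

In this sketch a damaged copy is only overwritten when a strict majority of nodes agree on the content, which is why a larger number of independent copies makes the network more resilient.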
What varies greatly from one LOCKSS network to another are the issues of governance and policy: who the beneficiaries of the preservation network are, which content is within its scope, who operates the service, who has access to the preserved content and under what circumstances, who provides financial support and for which types of costs, and which services are provided beyond preservation (for example, bibliographic metadata or usage reports).
The difference between LOCKSS and most of the other preservation systems does not lie in the type of techniques that are used but in when they are used. For the sake of economy, the LOCKSS system only stores the original bits and delays all operations on them, except the verification of integrity, for as long as possible. Some systems migrate formats preventively “en masse”, even when they are not obsolete, to formats that are assumed to be even less obsolete, which consumes processing resources, and they store both the original and the migrated copies, which consumes storage resources. In contrast, LOCKSS only migrates formats of individual files, and only when a reading request indicates that migration of this file is necessary. The migrated version is discarded when it is no longer needed, to save storage. This capacity has not been used in practice as the formats of the contents preserved in the LOCKSS system have still not become obsolete.
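The contrast between preventive mass migration and migration on access can be sketched as follows. This is a minimal illustration under stated assumptions, not the LOCKSS implementation: the class `PreservedStore`, its methods, and the idea that a reader signals need through a set of accepted formats are all hypothetical.

```python
from typing import Callable

Converter = Callable[[bytes], bytes]


class PreservedStore:
    """Hypothetical store that keeps only original bits and migrates on read."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}
        self._migrators: dict[str, tuple[str, Converter]] = {}

    def ingest(self, name: str, fmt: str, data: bytes) -> None:
        """Store the original bits unchanged, with their format label."""
        self._objects[name] = (fmt, data)

    def register_migrator(self, old_fmt: str, new_fmt: str, convert: Converter) -> None:
        """Declare how an obsolete format can be converted when needed."""
        self._migrators[old_fmt] = (new_fmt, convert)

    def read(self, name: str, accepted_formats: set[str]) -> tuple[str, bytes]:
        """Serve the original if the reader can use it; otherwise migrate
        this one file for this one request and do not keep the result."""
        fmt, data = self._objects[name]
        if fmt in accepted_formats:
            return fmt, data
        if fmt in self._migrators:
            new_fmt, convert = self._migrators[fmt]
            if new_fmt in accepted_formats:
                return new_fmt, convert(data)  # temporary copy, never stored
        raise LookupError(f"no usable rendition of {name!r}")


if __name__ == "__main__":
    store = PreservedStore()
    store.ingest("figure1", "old-format", b"original bits")
    # Stand-in converter for illustration; a real one would transcode the data.
    store.register_migrator("old-format", "new-format", lambda d: b"converted:" + d)
    print(store.read("figure1", {"new-format"}))
```

The design choice shown here is that storage holds a single rendition per object, and conversion cost is paid only for files that are actually requested in a form the reader cannot use.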
The more content that is replicated, the greater the opportunities for preservation in the long term. The collaborative approach to address the economic, legal, technical and social challenges of constructing and preserving collections of digital content is practical and attainable. The LOCKSS Program enables communities to preserve their important digital assets in their own country.
The LOCKSS community
Over time, the LOCKSS software has evolved (CATALDO, 2016) from a set of functions that implemented only the requirements of its global network into a software solution with various functions that can be combined and configured to implement the requirements of various preservation initiatives. All of this variability is contained in the LOCKSS software that everyone uses. Together, these networks, and the libraries and publishers that support them, are contributing to the sustainability of the LOCKSS software and the content preserved in it. The software and the preservation services are not independent of each other, and although each part is important, the collective is more powerful than the sum of the parts. It is the only network of community-owned and community-controlled preservation services, and collectively it preserves more digital content, in a way that is much more professional and sustainable, than is currently recognised.
The preservation services and this community of practice around the LOCKSS technology are really important. Many libraries, museums and archives do not have the experience or the internal tools for digital preservation. The community means that experts from any region are available to everyone. LOCKSS is an independent digital preservation system, so institutions can continue to use any content management system, digital asset management system, repository or other software that they already use.
During 2023, the new version 2.0 of the LOCKSS software will be launched. Version 2.0 will provide the same safe preservation environment, but it has been designed to better support the practices of continuous publication, to be interoperable with a series of input mechanisms and storage systems, and to preserve more interactive resources. As with all digital preservation networks, these factors bring execution challenges that include ensuring the active participation of members; maintaining the institutional commitment; identifying and retaining employees with the required technical knowledge; handling problems of scalability (for example, the need to rapidly add storage capacity when necessary); and keeping the costs as low as possible to maintain existing members and attract new ones, while also creating a strategic portfolio of requests that can be used for new hardware and/or storage capacity.
The Cariniana Network: a community of practice
In Brazil, LOCKSS software continues to serve to establish the architecture of collaborative subnetworks for the processing and preservation of digital technical and scientific publications. Recognition of the importance of the participation of the LOCKSS Alliance is based on the assumption that it was necessary to develop national collaborative networks that ensure not only migrations, but also the context, structure and accessibility of digital documents produced in the country, which contributes to safeguarding national heritage. The acceptance of a practical solution for digital preservation such as LOCKSS, which has been adopted by research institutions in Europe, Asia and North America, means being in line with other initiatives that have demonstrated the scientific value of the system as an international standard for digital preservation (MÁRDERO, 2012).
As an institutional practice, there is still relatively limited experience of digital preservation projects in Brazil. The expansion of procedures and institutionalisation of policies in this field involves organising networks of archives, or libraries, that contribute to disseminating technologies and support services, promote the exchange of collections, and expand the opportunities for exchange between researchers who work on the same topics.
Digital preservation activities should be considered in the form in which they are conceived and in their performance as an active proposal within the curation and management of digital collections. They have specific profiles, depending on the needs of the community of users and the local situation. A clear interpretation of digital preservation promotes its adoption in organisational culture, with management activities and strategic planning.
Research to develop digital preservation networks is relevant, due to the amount of information that is produced in the scientific and academic fields in the digital domain. The criteria for validating this information require guarantees of long-term preservation, which include the authenticity and reliability of the digital objects and the identification records. This is in contrast to the immeasurable volume of information, reproduction and sharing in unreliable digital media and the condition of technological obsolescence to which digital objects and their support services are subjected.
CATALDO, Tobin. “A New Approach to Configuration Management for Private LOCKSS Networks”, D-Lib Magazine 22, no. 3/4 (March/April 2016). <https://doi.org/10.1045/march2016-cataldo>.
MARCONI, Tim; SCHAEFER, Sibyl. “Appendix B: Case Studies (University of California at San Diego Library and Chronopolis)”, in Digital Preservation Essentials, ed. Christopher Prom (Chicago: Society of American Archivists, 2016), 109.
MÁRDERO ARELLANO, M. A. Cariniana: uma rede nacional de preservação digital. Ciência da Informação, Brasília, v. 41, n. 1, p. 83-91, January/April 2012. Available at: <http://revista.ibict.br/ciinf/article/view/1354>.
MÁRDERO ARELLANO, M. A. Critérios para a preservação digital da informação científica [thesis]. Brasília: Universidade de Brasília; 2008. Available at: <https://repositorio.unb.br/bitstream/10482/1518/1/2008_
REICH, V. A. (2014). LOCKSS: ensuring access through time. Ciência da Informação, 41(1). <https://doi.org/10.18225/ci.inf.v41i1.1353>