Archives provide an indispensable component of the digital ecosystem by safeguarding information and enabling access to it. Harmonisation of currently fragmented archival approaches is required to provide the economies of scale necessary for general adoption of end-to-end solutions. There is a critical need for an overarching methodology addressing business and operational issues, and technical solutions for ingest, preservation and re-use.
In co-operation with commercial systems providers, E-ARK created and piloted a pan-European methodology for electronic document archiving. The methodology synthesises existing national and international best practices and is designed to keep records and databases authentic and usable over time.
The methodology was implemented in an open pilot in various national contexts, using existing, near-to-market tools, and services developed by the partners. This allows memory institutions and their clients (in both the public and private sectors) to assess, in an operational context, the suitability of those state-of-the-art technologies.
Our objective has been to provide a single, scalable, robust approach capable of meeting the needs of diverse organisations, public and private, large and small, and able to support complex data types. E-ARK has demonstrated the potential benefits for public administrations, public agencies, public services, citizens and businesses by providing simple, efficient access to the workflows for the three main activities of an archive: acquiring, preserving and enabling re-use of information.
The practices developed within the project will reduce the risk of information loss due to unsuitable approaches to keeping and archiving records. The project has been public facing, providing a fully operational archival service and access to information for its users. The project results are generic and scalable, so that they can be used to build an archival infrastructure across the EU and in environments where different legal systems and records management traditions apply. E-ARK has provided new types of access for business users.
E-ARK piloted an end-to-end OAIS-compliant e-archival service covering ingest, vendor-neutral archiving and re-use of structured and unstructured data, thus covering both databases and records and addressing the needs of data subjects, owners and users. The pilot and methodology also focused on the essential pre-ingest phase of data export and normalisation in source systems. The pilot integrated tools currently in use in partner organisations and provided a framework for providers of these and similar tools, ensuring compatibility and interoperability. A core component of the project was the integration platform, which uses the existing ESSArch Preservation Platform (EPP) application as its Archival Information System; EPP was already in production use at the National Archives of Norway and Sweden. To achieve scalability, E-ARK adopted a data management and storage layer for this tool on top of the proven open-source Cloudera CDH4 distribution of Apache Hadoop, so that storage and computational power can be added to the system seamlessly.
The pilot ran in several national archives, each of which provided data for the pilot instance by agreement with an associated government data owner (e.g. national or regional/federal).
The project outputs will be sustained over the long term by project partner the DLM Forum, which comprises 22 national archives and associated commercial and technical providers. Under the open Apache licensing model, commercial suppliers will be able to incorporate the project outputs (particularly the open interfaces for pre-ingest, ingest, archival, access and re-use) into their own systems, enhancing their longevity. National archives running E-ARK pilot instances will serve as exemplars for others wanting to adopt the new open e-archiving system.
In addition, project partner the Digital Preservation Coalition will promote best practices in this area, as will our dedicated government institution partners.