How do I get my data out of my electronic records system (e.g. SharePoint) and into an archive?
This is a major priority for E-ARK, as it can be a real headache for organizations such as government departments to package their data for transfer into an archive in the manner that the archives require. One practical issue is that records systems use their own hierarchical classifications, which do not necessarily match those used by an archive. How can such compatibility problems be overcome? We have studied current best practice, and are now working on data export specifications that data producers can use to get their data out of their source systems and into the archives in an “archive-acceptable” format. These open specifications are based on the MoReq schema, and cater for a wide range of systems, including Electronic Records Management Systems (ERMSs) as well as simple file-based systems. Our Work Package 3 (WP3), led by the National Archives of Estonia, is spearheading this work, and they have already produced reports, conference presentations etc. with key practical information.
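To illustrate the classification mismatch mentioned above, here is a minimal Python sketch of a crosswalk between a source system's hierarchical classification and an archive's series codes. It is not part of the E-ARK specifications: the `CROSSWALK` table, its codes and the `map_classification` function are all hypothetical names invented for this example.

```python
# Hypothetical crosswalk: source-system classification path -> archive series code.
# None of these names or codes come from the E-ARK or MoReq specifications.
CROSSWALK = {
    ("Finance",): "FIN",
    ("Finance", "Invoices"): "FIN/INV",
    ("HR",): "PER",
}


def map_classification(source_path, crosswalk=CROSSWALK):
    """Return the archive code for the longest matching prefix of source_path.

    Returns None when no prefix of the path appears in the crosswalk.
    """
    parts = tuple(source_path)
    # Try the full path first, then progressively shorter prefixes.
    for end in range(len(parts), 0, -1):
        code = crosswalk.get(parts[:end])
        if code is not None:
            return code
    return None
```

Falling back to the longest matching prefix means a folder the archive has never seen (for example a new sub-folder under "Finance") still lands in its parent series, while a completely unknown branch returns `None` and can be flagged for manual review.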
How do I archive databases?
E-ARK is producing everything you need for each stage of digital archiving, and we are covering database archiving as well as the archiving of digital records. We have studied current best practice, and based on this we have produced draft specifications showing how to put data (including databases and their contents) into an archive, store them there, and then access them later for discovery and re-use. We have been working closely with the Swiss Federal Archives and the Swiss Koordinationsstelle für die dauerhafte Archivierung elektronischer Unterlagen (KOST), and you will find the latest version 2.0 of the Software Independent Archival of Relational Databases (SIARD) format (PDF, 2.02 MB) on our website for your feedback. Our Work Package 4 (WP4), led by the Austrian Institute of Technology, is spearheading this work, and they have already produced reports, conference presentations etc. with key practical information.
- D4.1 - Report on available formats and restrictions
- D4.2 - E-ARK AIP draft specification
- Presentations from E-ARK/DPC Database Archiving Event
Are there any general models or schemas that show the digital archival processes step by step?
We have produced a comprehensive general model that is fundamental to our entire project: it covers all the tools, processes, workflows, users etc. and specifically includes the pilot implementations (various parts of our final E-ARK system will be piloted by 7 national archives). Our Work Package 2 (WP2), led by the National Archives of Hungary, is spearheading this work, and they have already produced reports, conference presentations etc. with key practical information.
Are there any new ways to discover archival data? Can we do complex searches or just Google-type searches?
We are looking at new ways of discovery for a wide range of data and many types of users: businesses, researchers, citizens, government departments etc. Whilst sensitive data has to be protected, we are looking for the best tools and techniques for accessing and analyzing any data that is open for discovery. We have studied current best practice in this area, and have used our findings as a basis for our developments, which include data mining, Online Analytical Processing (OLAP) and other advanced searching techniques. Our Work Package 5 (WP5), led by the Danish National Archives, is spearheading this work, and they have already produced reports, conference presentations etc. with key practical information. Our Work Package 6 (WP6), led by the Austrian Institute of Technology, is also contributing to the advanced searches effort with a report on faceted searches.
- D5.1 - GAP report between requirements for access and current access solutions
- D5.2 - E-ARK DIP draft specification
- D6.1 - Faceted Query Interface and API
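For readers unfamiliar with faceted search, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not the interface described in D6.1; the functions `facet_counts` and `filter_by` and the record fields used with them are hypothetical.

```python
from collections import Counter


def facet_counts(records, field):
    """Count how many records carry each value of the given facet field."""
    return Counter(r[field] for r in records if field in r)


def filter_by(records, **selected):
    """Keep only the records whose fields equal all selected facet values."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]
```

A user interface would typically show the counts next to each facet value (e.g. record type or year), and call something like `filter_by(records, type="database")` when the user clicks a value, then recompute the counts on the narrowed result set.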
What has Big Data got to do with digital archiving? What is Hadoop and can we use it?
Big Data is a broad church, but can be said to involve powerful (fast) analysis of large volumes of varied data to produce valuable new insights with greater accuracy. Big Data has associations with open data and cloud computing, with an emphasis on large-scale accessibility. The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. We have developed an integrated system using a Hadoop cluster, running software such as Solr, Hive, Pig, Mahout etc. Big Data also leans heavily on previous architectures such as multi-dimensional databases and data warehousing, and we are using Big Data techniques such as data mining, data warehousing, dimensional modelling and Online Analytical Processing (OLAP) to carry out large-scale analysis of e.g. geographical data (geo-data) using Oracle Warehouse Builder and Oracle OLAP. Our Work Package 6 (WP6), led by the Austrian Institute of Technology, is spearheading the Big Data work, and they have already produced reports, conference presentations etc. with key practical information. Work Package 5 (WP5), led by the Danish National Archives, is using Big Data techniques for discovery, and they have already produced reports, conference presentations etc. with key practical information. Work Package 4 (WP4), led by the Austrian Institute of Technology, is using Big Data techniques such as dimensional modelling to archive databases, and they have already produced reports, conference presentations etc. with key practical information.
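As a rough illustration of what dimensional modelling and OLAP mean in practice, the following Python sketch performs a simple "roll-up": it sums a measure over a list of fact rows, grouped by the chosen dimensions. It is a toy, in-memory stand-in for the idea and says nothing about how the Oracle or Hadoop tools named above are configured; the function name `roll_up` and the field names used with it are hypothetical.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` over all fact rows, grouped by the given dimensions.

    Each fact is a dict; the result maps a tuple of dimension values
    (in the order given) to the aggregated total.
    """
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)
```

Calling `roll_up` with fewer dimensions gives a coarser summary (for example totals per region), and with more dimensions a finer one (totals per region and year), which is the essence of moving up and down a dimensional hierarchy.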
Are there any standards to help me archive my data properly?
Producing specifications and schemas forms a vital part of our work, alongside developing open source tools / workflows and frameworks. Standardizing the digital archival process across Europe and beyond should be a real help to institutions large and small, governmental, commercial or academic. We are creating our schemas to be as flexible and useful as possible – with mandatory elements that are essential to comply with best practice, and plenty of flexible options so that institutions / individuals can customize their archives in myriad different ways. Our work is not just for national archives – we do everything with regional and local archives in mind too. We have several reports dealing with standardizing issues.
- D3.2 – E-ARK SIP Draft Specification
- D4.2 - E-ARK AIP draft specification
- D5.2 - E-ARK DIP draft specification
- E-ARK Conference Presentations
How does digital archiving vary from country to country in Europe?
We have a broad range of national archives taking part in E-ARK, with many more countries represented in our Archival Advisory Board. This enables us to take account of many different types of archival practice: some archives currently have no digital archives, some archives deal with everything as a database, some archives deal only with records etc. We covered current archival practice across Europe in our best practice reports in the first year of the project.
- D3.1 – Report on Available Best Practices
- D4.1 - Report on available formats and restrictions
- D5.1 - GAP report between requirements for access and current access solutions
How does the law affect digital archiving in each European country? Are there any EC laws / directives that affect all digital archiving, and what is on the horizon in this respect?
These are vital considerations for E-ARK as each country needs to be able to use our outputs within their own legal framework. For this reason, we have undertaken comprehensive research to determine upcoming legislation that will affect practical digital archiving. We have a dedicated, extensive legal study (PDF, 1.60 MB) which we will keep updated throughout the project.
Do the archives have any examples or use cases to inspire me?
We are developing pilot cards to show how our archival partners will actually be using E-ARK in their pilot implementations. These cards will highlight the use cases for each national archive, showing why they joined the project and what benefits they expected to gain. We will be publishing the pilot cards on the E-ARK website as they become available; keep an eye on our news feed for more information.
Are there any Open Source digital tools I can use? Can they be integrated? Will they fit with commercial tools/systems and existing Open Source tools/systems? Will they scale?
Our tools and platforms are all designed to be scalable and open source, so they will be suitable for your archiving needs. We have leading open source and proprietary commercial partners both in the project consortium, and on our Commercial/Technical Advisory Board, in order to ensure integration and a good fit with existing archival systems. Our aims with respect to scalability are covered in Work Package 6 (WP6).
Can I use something developed for a national archive in my regional archive/local archive/research data centre?
Yes, this is our plan: our designs are for all archival shapes and sizes. We have representation from national and regional archives, and would also welcome input from local archives/research data centres.
How can I measure how well my organization is performing in terms of digital archiving? Are we beginners, or a bit further along the road?
We have been working on a specialized business maturity model to enable institutions to gauge their progress in this regard. All the information necessary for digital archiving, including vocabulary management, is going into a dedicated, long-term Knowledge Base, to be hosted by the DLM Forum on their website. Our Work Package 7 (WP7), led by the Instituto Superior Técnico, Lisbon, Portugal, is spearheading this work, and they have already produced reports, conference presentations etc. with key practical information.
- D7.1 - A Maturity Model for Information Governance – initial version
- D7.2 - Initial Assessment and Evaluation
I am unsure about or would like to comment on something that E-ARK is doing. How can I make my thoughts known to them?
Please send us your feedback by email.
Does it matter which preservation strategy I use with digital archiving? For example, can I use migration or emulation or a hybrid?
E-ARK is preservation-strategy neutral, and we are consciously identifying metadata (data about data) elements catering for both migration and emulation.
Do you have questions for us? If so, please get in touch. You can join our mailing list, and we are looking for more members for our Data Provider Advisory Board, as well as local archive members for our Archival Advisory Board.