The E-ARK Project, the Digital Preservation Coalition and the UK Data Archive are delighted to announce that registration is now open for a free workshop on the preservation of relational databases to be held at the University of Portsmouth on 19th and 20th February 2015.
Introduction
Relational database management systems are one of the essential building blocks of information technology. Ubiquitous but often obscured behind layers of scripting, processing, forms and queries, they are arguably the most important invention of the Twentieth Century. It’s hard to think of a software application or service that does not have some fundamental dependency on database technologies.
So it’s surprising that the digital preservation community seems to have spent so little time explicitly considering their preservation. More accurately, it’s surprising that there is so little awareness and integration between the digital preservation community and the various tools and approaches, known as ‘data warehousing’, that are used in commercial settings to manage the long-term accessibility of records in database systems. There’s no question that databases present a complex challenge to preservation. They can be difficult to document and difficult to understand even when they are documented. The complex interdependencies of data, query and scripting make migration problematic and highly specialised. Database migration is often seen as a purely technical operation, replacing one legacy system with another soon-to-be-legacy system.
Relational databases and, to some extent, data warehousing approaches, which favour structure and homogeneity, are sometimes contrasted with ‘big data’ approaches that tend to favour heterogeneity and de-normalisation. It could be suggested that a concern with relational databases is outmoded and that the preservation community could simply adopt big data approaches. But the contrast can be overstated, especially when preservation issues are discussed. In practice ‘big data’ tools seem to offer improved workflows that complement rather than replace existing data warehouse tools. And even if ‘big data’ tools are the solution for access, they still need to integrate with fundamental preservation processes and standards.
Better technical guidance and organisational know-how are needed if the digital preservation community is to offer confident and consistent solutions to long-term access for relational databases.
This two-part workshop, made possible by the E-ARK project and sponsored by the DPC, will review the state of the art in the preservation of databases and explore emerging themes around the preservation of ‘big data’.
The workshop will be split over two days:
Day one will:
- Start out by clarifying where databases, data warehouses and big data complement / overlap each other
- Review the state of the art in the preservation of databases
- Present case studies of current tools and practices around the preservation of relational databases
- Introduce commercial approaches to ‘data warehousing’ and explore the relationship with preservation
- Introduce big data approaches for database preservation
Day two will:
- Review the state of the art in the use of ‘big data’ and its implications for preservation
- Examine and debate the use cases for archived databases / big data
- Identify recommendations for further research and guidance in the preservation of ‘big data’
Interested parties are welcome to attend either or both days.
Who Should Come?
This workshop will interest:
- Collections managers, librarians, curators, archivists in memory institutions
- CIOs and CTOs in organisations with commercial intellectual property
- Records managers and business analysts with requirements for long-lived data or legacy systems
- Vendors and developers with digital preservation solutions
- Researchers with interests in e-infrastructure and digital preservation
Registration
Registration is now open on the DPC website and will close 7 days before the event: http://bit.ly/1Hj72bg
Draft Programme
Day One – Thursday 19th Feb 2015
1000 – Registration opens, tea and coffee
1030 – Welcome and Introduction
1035 – Why preserving databases matters and why it is harder than it sounds (Matthew Woollard, UK Data Archive (UKDA))
1055 – The E-ARK project and database preservation (Kuldar Aas, National Archives of Estonia)
1135 – What do we mean by big data, data warehousing and OAIS? (Janet Delve, University of Portsmouth; Karin Bredenberg, National Archives of Sweden)
1205 – Q&A
1215 – Lunch
State of the art: Case studies in Preserving Databases
1315 – Case study one: Anders Bo Nielsen, Danish National Archives
1335 – Case study two: Andreas Voss, Swiss National Archives
1355 – Case study three: Hélder de Jesus Almeida da Silva, KEEP Solutions
1415 – Case study four: Tarvo Kärberg, National Archives of Estonia
1435 – Q&A
1445 – Tea and coffee
State of the art: Case studies in Preserving Databases contd.
1510 – Case study five: Jože Škofljanec, Boris Domajnko, Slovenian National Archives
1530 – Introducing big data solutions: E-ARK big data techniques at AIT, Rainer Schmidt, Austrian Institute of Technology.
1550 – Round table
By 1630 close
Day Two – Friday 20th Feb 2015
0930 – Registration opens, tea and coffee
1000 – Welcome back and synopsis of day one
1020 – Preserving databases: practical lessons from Archaeology (Jo Gilham, ADS)
1040 – Big data and relational data: the same but different (Nathan Cunningham, UKDA)
1110 – De-normalising data for archival preservation (Jan Rörden, University of Cologne)
1140 – Data mining for accessing archived databases (Richard Healey, University of Portsmouth)
1210 – Q&A
1220 – Panel session: Preserving big data and relational databases: what is to be done?
1315 – Next steps and future directions
By 1330 close
The E-ARK Project has published three new reports: one describing the outcomes of our research into the Use Cases which will be addressed in our Pilot Phase (D2.1), one identifying best practices in digital archiving covering ingest workflows, SIP formats and records export (D3.1), and an initiation document for the creation of a Pan-European AIP format.
These can be downloaded from the Resources Section of our website.
The DLM Forum has released details of its forthcoming 7th Triennial Conference, to be held at the premises of the Instituto Superior Técnico (Engineering School of Lisbon University) between Wednesday 12 and Friday 14 November 2014.
The Conference is entitled:
"Making the Information Governance Landscape in Europe"
The conference will be preceded by 2 days of DLM Workshops and Tutorials on Monday 10 and Tuesday 11 November, and will include sessions on the work of the E-ARK Project.
A call has now opened for submission of papers on 3 streams:
- Managing information for control, access and compliance
- Records Management in transition
- Archival initiatives for ingest, preservation and access
Full details of the Conference are available at http://dlmlisbon2014.dlmforum.eu
The online journal International Science Grid This Week (iSGTW) has published an article about the E-ARK project, highlighting the importance of the work being carried out.
iSGTW is a weekly online newsletter about distributed computing, cloud computing, and supercomputing, and their impacts on scientific research. iSGTW is a collaboration between the Open Science Grid (OSG) and several organizations in the US and Europe, and is intended to help inform the global cyberinfrastructure (CI) community about technological advances, transformative science, professional and workforce development opportunities, new uses of technology in previously underrepresented fields of research, and creative ways to fund, sustain, and broaden local CI initiatives.
A project to create an electronic ‘ark’ for digital and paper-based archives has received a cash injection. The project will address the problem of archiving digital data from many different kinds of systems all across Europe.
The European Commission has awarded £6M to archiving and digital preservation specialists to create E-ARK, a revolutionary method of archiving data that is set to become the gold standard across Europe. The system will ensure current digital archives, including ‘big data,’ are future-proofed. (Big data refers to data sets so large that they are difficult to manage with traditional software and databases.)
Digital preservation specialists at the University of Portsmouth are leading the project, which involves over a dozen major partners including a core group of European national archives. The University’s Dr Janet Delve and Professor David Anderson are tackling what they describe as a mammoth undertaking to address an issue which becomes larger by the day.
Dr Delve said: “The size of the problem is huge. We are looking at years of accumulated data across almost 30 countries that have been stored using a variety of different methods and on different systems. With the onset of e-government and open data initiatives, archives now have to cope with storing huge amounts of digital material. The size of the problem is growing because of the colossal quantity of electronic data generated on a daily basis from organisations as diverse as banks, public health organisations and national archives.
“Our objective is to reduce the risk of information loss due to poor methods of keeping and archiving records by providing one common, robust approach. It must be replicable and scalable to meet the needs of many kinds of organisations, public and private, large and small, and able to support complex data types such as web pages and big data.
“The term ‘archives’ usually conjures a vision of vast rooms filled with dusty papers, guarded by a wizened archivist. Not anymore.”
E-ARK (European Archival Records and Knowledge Preservation) will benefit public administrations, public agencies, public services, citizens and business by providing easy and efficient access to the archived records.
Professor Anderson said that a major issue to overcome is navigating different legal systems and records management traditions. He described the task of creating and building an infrastructure usable by all countries across different types of organisation as an enormous jigsaw with hundreds of parts that need to be examined and assessed. “We will take the best bits from the systems we see and our aim is to create something that we know large organisations and archivists alike are crying out for.”
The E-ARK project will examine current best practices to create a pilot archiving service to keep records authentic and usable. It will address the three main endeavours of an archive – acquiring, preserving and enabling re-use of information.
The project will spend three years creating a standard archival process at a pan-European level supported by guidelines and recommended practices that will cater for a range of data from different types of source including record management systems and databases.
The project will be public-facing, providing a fully operational service and access to information for its users, taking account of all legal constraints.
The project launches today at the Instituto Superior Técnico in Lisbon, Portugal. Other than the five national archives, organisations involved include four leading research institutions, three providers of archiving software solutions and services, two government agencies, and two international membership organisations representing communities who will benefit from the project, such as data owners and providers, archives, software vendors and solution providers.