D6.3 Data Mining Showcase

E-ARK has developed the Integrated Platform Reference Implementation (IPRIP), a software prototype for package creation, search, and access using data-centric technologies. The underlying concepts and technologies of the system are detailed in earlier deliverables (D6.1 and D6.2). The main components of the IPRIP e-archiving environment are a scalable packaging infrastructure for creating OAIS Information Packages and a data-centric repository for searching and accessing data items on demand.

This deliverable focuses on recent developments that support the deployment of the IPRIP in different configurations at E-ARK stakeholder sites. A flexible packaging mechanism combined with a standalone backend implementation enables custom single-server deployments on demand. The scalable, Hadoop-based backend implementation has been ported to the latest CDH (Cloudera Distribution Hadoop) release in order to support a recent technology stack, advanced data mining concepts, and enterprise demands. Showcases covering text mining, the extraction and visualization of geographical information, a database archiving workflow, and mass document ingest have been implemented and deployed at stakeholder sites in Hungary and Slovenia.
