Managing the Evolution and Preservation of the Data Web
R&D Project - European
The Web has not only caused a revolution in communication; it has also completely changed the way we gather and use data. Open data (data that is available to everyone) is growing exponentially, and it has transformed the way research and scholarship of every kind are conducted, changing the scientific method itself. The recent development of Linked Open Data has only increased the possibilities for exploiting public data.

Given the value of open data, how do we preserve it for future use? Much of the data we currently use, such as demographic records, clinical statistics, personal and enterprise data, and many scientific measurements, cannot be reproduced. There is overwhelming evidence that such data should be kept wherever it is technically and economically feasible to do so. Until now, this problem has been addressed by storing the data in fixed datasets and by extending the standard methods used to disseminate and archive traditional (paper) artifacts. Given the complexity, the interlinking, and the dynamic nature of current data, especially Linked Open Data, radically new methods are needed.

DIACHRON tackles this problem with a fundamental assumption: that the processes of publishing and preserving data are one and the same. Data are archived at the point of creation, so archiving and dissemination become synonymous. DIACHRON takes on the challenges of evolution, archiving, provenance, annotation, citation, and data quality in the context of Linked Open Data and modern database systems. It aims to automate the collection of metadata, provenance, and all other forms of contextual information, so that data are accessible and usable at the point of creation and remain so indefinitely. The results of DIACHRON are evaluated in three large-scale use cases: open government data life cycles, large enterprise data intranets, and scientific data ecosystems in the life sciences.