Modern science is driven by data. In almost all areas of science, major advances are now possible thanks to the ability to process and analyze vast amounts of data. Yet scientific communication still deals mainly in results, rather than in data and methods. Experiments are described informally, and only the outcome of the scientific workflow is reported. The data is often difficult to find, and the software used to analyze it is difficult to interpret or simply unavailable.
Data2Semantics attacks this problem on several levels. Practically, we will develop a generic system that can serve as the backbone for several real-life use cases. To this end we will work together with Elsevier, Philips, DANS and COMMIT itself to provide systematic access not only to scientific results, but also to the data, the methods, the software and the people behind them.
We will tackle several challenges in developing this system: machine learning on data with a semantic description, information retrieval in a semantic context, annotation of software and methodology for full provenance, and investigation of the process of describing data in terms of a semantic model.
The task of transforming a real or virtual architecture into a complete semantic description relates directly to SNE’s activities in modeling infrastructure semantically in NDL. Data2Semantics will develop generic machine learning techniques for semantic data that can be used directly to solve path planning and optimization problems on infrastructures described in NDL.
 From data to semantics for scientific data publishers
Gerben de Vries