Repository Storage

The SoilWise repository aims to merge and seamlessly provide different types of content. To host this content, drive internal processes efficiently and offer performant end-user functionality, several storage options are implemented:

  1. A relational database management system for the storage of the core metadata of both data and knowledge assets.
  2. A Triple Store that holds the metadata of data and knowledge assets as a linked data graph, connected to knowledge on soil health and related domains.
  3. Git for storage of user-enhanced metadata.

Functionality

PostgreSQL RDBMS: storage of raw and augmented metadata

Info

Current version: Postgres release 12.2

Technology: Postgres

Access point: SQL

A "conventional" RDBMS is used to store the (augmented) metadata of data and knowledge assets. The harvester process uses it to store the raw results of metadata harvesting from the resources that are currently connected. Various metadata augmentation jobs read their input from this data store and write their output back to it. The catalogue also queries the Postgres database.

There are several reasons for choosing an RDBMS as the main store for metadata storage and querying:

  • An RDBMS provides good options to efficiently structure and index its contents, thus allowing performant access for both internal processes and end user interface querying.
  • An RDBMS easily allows implementing constraints and checks to keep data and relations consistent and valid.
  • Various extensions, e.g. search engines, are available to make querying and aggregation even more performant and better suited to end users.
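
The points about constraints and indexing can be illustrated with a minimal sketch. It assumes a hypothetical two-table layout (`sources`, `records`) that is not taken from the SoilWise schema, and it uses Python's built-in in-memory SQLite as a stand-in so that it runs without a PostgreSQL server. The same `NOT NULL`, `CHECK`, foreign-key and index declarations are also available in PostgreSQL.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all table and column
# names below are hypothetical and only serve the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE sources (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE records (
    identifier TEXT PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES sources(id),
    title      TEXT NOT NULL CHECK (length(title) > 0)
);
-- Index to speed up "all records of one source" lookups.
CREATE INDEX idx_records_source ON records (source_id);
""")

conn.execute("INSERT INTO sources (id, name) VALUES (1, 'example-source')")
conn.execute("INSERT INTO records VALUES ('rec-1', 1, 'Soil organic carbon map')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO records VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(("rec-2", 1, "Soil texture dataset"))
rejected_fk = try_insert(("rec-3", 99, "Orphan record"))  # source 99 does not exist
rejected_check = try_insert(("rec-4", 1, ""))             # empty title violates CHECK

record_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

In this sketch the database itself rejects the record that points at an unknown source and the record with an empty title, so harvesting and augmentation jobs do not each need to repeat those checks.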

Virtuoso Triple Store: storage of SWR knowledge graph

Info

Current version: Virtuoso release 07.20.3239

Technology: Virtuoso

Access point: Triple Store (SWR SPARQL endpoint) https://repository.soilwise-he.eu/sparql

A Triple Store is implemented as part of the SWR infrastructure to allow more flexible linkage between the knowledge captured as metadata and various internal and external knowledge sources, particularly taxonomies, vocabularies and ontologies that are implemented as RDF graphs. Results of the harvesting and metadata augmentation that are stored in the RDBMS are converted to RDF and stored in the Triple Store.

A Triple Store was selected as parallel storage because it offers several capabilities:
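
As an illustration of the conversion step, the following minimal sketch turns one metadata record into N-Triples lines. It is not the SWR conversion code: the record layout, the `BASE` namespace and the choice of Dublin Core terms are assumptions made for the example.

```python
# Hypothetical namespace for record URIs; not an actual SWR namespace.
BASE = "https://example.org/record/"
# Dublin Core terms namespace (a real vocabulary, chosen here as an assumption).
DCT = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record: dict) -> list:
    """Serialise a (hypothetical) metadata record as a list of N-Triples lines."""
    subject = "<" + BASE + record["identifier"] + ">"
    title = escape_literal(record["title"])
    lines = [subject + " <" + DCT + 'title> "' + title + '" .']
    # Subjects are assumed to be concept URIs, e.g. from a soil vocabulary,
    # so they are emitted as resources rather than literals.
    for concept_uri in record.get("subjects", []):
        lines.append(subject + " <" + DCT + "subject> <" + concept_uri + "> .")
    return lines


example = {
    "identifier": "rec-1",
    "title": 'Soil "health" map',
    "subjects": ["http://example.org/concept/soil-health"],
}
triples = record_to_ntriples(example)
```

Because the subject of a record is emitted as a URI that points at a vocabulary concept, the record becomes linked to whatever else the graph states about that concept.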

  • It allows the linking of different knowledge models, e.g. to connect the SWR metadata model with existing and new knowledge structures on soil health and related domains.
  • It allows reasoning over the relations in the stored graph, and thus allows connecting and smartly combining knowledge from those domains.
  • Through the SPARQL interface, it allows users and processes to use such reasoning and exploit previously unconnected sets of knowledge.
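
Access through the SPARQL interface can be sketched as follows, using only the Python standard library and the endpoint listed above. The query is a generic triple pattern and assumes nothing about the vocabularies used in the SWR graph. The function names are hypothetical, and whether the endpoint accepts anonymous requests of this form is an assumption.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://repository.soilwise-he.eu/sparql"

# Generic query: fetch any five triples. Replace with a domain-specific
# query once the vocabularies used in the graph are known.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str) -> list:
    """Send the query and return the result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_query(QUERY)` would return one dictionary per result row, keyed by the variable names `s`, `p` and `o`, following the SPARQL JSON results format.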

Git: storage of code and configuration

Info

Technology: GitLab and GitHub

Access point: https://github.com/soilwise-he

Git is used to store versions of SoilWise code, documentation and configuration. It is also used for issue and release management, and for automated pipelines that handle deployment, augmentation, validation and the harvesting of external sources.