
SAGE

SAGE is a linked data generation, enrichment, publication and management tool.


The tools and platforms developed by Datoptron have been successfully used in the cultural heritage and public administration domains by hundreds of organisations.

SAGE is a web-based tool for producing, enriching, publishing, accessing and managing RDF datasets. RDF data can be imported directly or generated from diverse non-RDF data sources and formats, organized in datasets, and enriched using annotators that wrap web-based or other third-party services; the resulting enrichments can then be manually validated. All datasets, including any annotations, can be published in RDF stores, indexed, and accessed through API calls.

Powered by D2RML, SAGE can import data generated from diverse sources (e.g. relational databases, REST APIs, SPARQL endpoints, local and remote files) and in diverse formats (e.g. XML, JSON, CSV, Excel spreadsheets, plain text) using powerful, custom transformation rules that may combine data from multiple sources. Data already in RDF form can be imported directly. For more fine-grained data management and access, the imported data may be organized in multiple datasets and catalogs, whose metadata are modelled using the DCAT and VoID linked open vocabularies.
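To make the idea of generating RDF from a non-RDF source more concrete, here is a minimal sketch in plain Python that turns CSV rows into N-Triples. It does not use D2RML or any SAGE API; the namespace, property IRIs, column names and function names are all hypothetical and chosen only for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical IRIs, used only for this illustration.
BASE_IRI = "http://example.org/company/"
PROPERTY_FOR_COLUMN = {
    "name": "http://example.org/vocab/name",
    "country": "http://example.org/vocab/country",
}


def escape_literal(value):
    """Escape the characters that must be escaped in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text):
    """Map each CSV row to one triple per known, non-empty column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The "id" column is assumed to identify the row; it is
        # percent-encoded so that it is safe inside an IRI.
        subject = "<" + BASE_IRI + quote(row["id"], safe="") + ">"
        for column, prop in PROPERTY_FOR_COLUMN.items():
            value = row.get(column)
            if value:
                literal = escape_literal(value)
                triples.append('%s <%s> "%s" .' % (subject, prop, literal))
    return triples


if __name__ == "__main__":
    sample = "id,name,country\n42,Acme,GR\n"
    for triple in csv_to_ntriples(sample):
        print(triple)
```

A real mapping language such as D2RML expresses this kind of rule declaratively and can combine several sources; the sketch only shows the shape of the row-to-triple step.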

Imported data can be published in multiple RDF stores and indexed. Currently, SAGE supports the OpenLink Virtuoso and Blazegraph RDF stores. Publicly published datasets and catalogs can be individually searched, accessed through dedicated SPARQL endpoints and browsed using an embedded LodView viewer.
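As an illustration of access through a SPARQL endpoint, the following sketch builds a standard SPARQL 1.1 Protocol query request (HTTP GET with a `query` parameter) using only the Python standard library. The endpoint URL is a placeholder, not an actual SAGE address, and the helper name is hypothetical.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint; replace with the SPARQL endpoint of a published dataset.
ENDPOINT = "http://example.org/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build a GET request that asks for JSON-formatted SPARQL results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    # Sending the request requires a reachable endpoint, e.g.:
    #   with urllib.request.urlopen(request) as response:
    #       results = json.load(response)
    print(request.full_url)
```

Building the request separately from sending it keeps the sketch runnable without network access; against a live endpoint, the response body would be a SPARQL JSON results document.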

Through SAGE, selected parts of published datasets can also be annotated and enriched by invoking relevant external API services. Such services include, for example, tools that link data to relevant Wikidata, DBpedia, GeoNames and other resources, or tools that detect occurrences of vocabulary terms in the data. Built-in support is provided for NERD (named entity recognition and disambiguation), SKOS vocabulary lookup, and SPARQL query annotators. The enrichments, which are modelled using the W3C annotation model, can then be manually validated through an integrated validation subsystem that supports bulk validation through text grouping and text frequency sorting, assignment of validation tasks to multiple users, and close monitoring of the overall validation process.
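The sketch below illustrates, under stated assumptions, what an enrichment expressed in the W3C Web Annotation Data Model might look like and how candidate enrichments could be grouped by text and sorted by frequency for bulk validation. The exact structure SAGE produces is not described here; the function names, the choice of the `linking` motivation and the sample IRIs are assumptions made for illustration.

```python
from collections import Counter

# JSON-LD context published by the W3C for the Web Annotation Data Model.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(target_iri, body_iri):
    """Build a minimal Web Annotation linking a resource to an external entity."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "motivation": "linking",  # assumed motivation for entity-linking enrichments
        "body": body_iri,
        "target": target_iri,
    }


def group_by_text(candidates):
    """Count candidates per annotated text, most frequent first.

    `candidates` is a list of (text, target_iri, body_iri) tuples.
    """
    counts = Counter(text for text, _target, _body in candidates)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical candidates: the same place name found in several records.
    candidates = [
        ("Athens", "http://example.org/item/1", "http://example.org/entity/athens"),
        ("Paris", "http://example.org/item/2", "http://example.org/entity/paris"),
        ("Athens", "http://example.org/item/3", "http://example.org/entity/athens"),
    ]
    print(group_by_text(candidates))
    print(make_annotation(candidates[0][1], candidates[0][2]))
```

Grouping by text means a reviewer can accept or reject one frequent value once instead of reviewing every occurrence separately, which is the idea behind bulk validation.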

SAGE has been used in several projects. In STIRData, it was used to transform millions of company data entries from several European business registries into a common RDF model and publish them as linked data. In Europeana XX, it was used for the automatic enrichment and validation of hundreds of thousands of cultural item records.

Discover how you can embrace your data, explore new insights and drive new value for your organization

We are a group of talented people, including researchers, software developers, ontology engineers and machine learning experts, with long experience in applying cutting-edge research findings and technology to real-world applications.

Let's Work Together