SAGE is a web-based tool for producing, enriching, publishing, accessing and managing RDF datasets. RDF data can be directly imported or generated from diverse, non-RDF data sources and data formats, organized in datasets, enriched using annotators wrapping web-based or other third party services, and the enrichments can be manually validated. All datasets, including any annotations, can be published in RDF stores, indexed and accessed through API calls.
Powered by D2RML, SAGE can import data generated from diverse sources (e.g. relational databases, REST APIs, SPARQL endpoints, local system and remote files) and diverse formats (e.g. XML, JSON, CSV, Excel spreadsheets, plain text) using powerful, custom transformation rules that may combine data from multiple sources. Data already in RDF form can be directly imported. For more fine-grained data management and access, the imported data may be organized in multiple datasets and catalogs, whose metadata are modelled using the DCAT and VOID linked open vocabularies.
Imported data can be published in multiple RDF stores and indexed. Currently, SAGE supports the OpenLink Virtuoso and Blazegraph RDF stores. Publicly published datasets and catalogs can be individually searched, accessed through dedicated SPARQL endpoints and browsed using an embedded LodView viewer.
Through SAGE, selected parts of published datasets can also be annotated and enriched by invoking relevant external API services. Such services include e.g. tools linking data to relevant Wikidata, DBPedia, Geonames and other resources, or tools that detect occurrences of vocabulary terms in the data. Built-in support for NERD (named entity recognition and disambiguation), SKOS vocabulary lookup, and SPARQL query annotators is provided. The enrichments, which are modelled using the W3C annotation model, can then be manually validated through an integrated validation subsystem that allows bulk validations through text grouping and text frequency sorting, assignment of validation tasks to multiple users, and close monitoring of the overall validation process.
SAGE has been used in several projects, including STIRData, where it has been used to transform into a common RDF model and publish as linked data millions of company data entries from several European business registries, and Europeana XX where it has been for the automatic enrichment and validation of hundreds of thousands of cultural item records.