The script in this module converts a simple RDF vocabulary, described in YAML, into a formal RDFS in JSON-LD, Turtle, and HTML. Optionally, a simple JSON-LD @context is also generated for the vocabulary. Neither the script nor the YAML format is prepared for complex vocabularies; their primary goal is to simplify the generation of simple, straightforward RDFS vocabularies that do not require, for instance, sophisticated OWL statements.
When running, the script relies on two files:
- The vocabulary.yml file, containing the definitions of the vocabulary entries. (A different name may also be used for the YAML file; see below.)
- The template.html file, used to create the HTML version of the vocabulary. (A different name may also be used for the template file; see below.)
The vocabulary is defined in a YAML file, which contains several block sequences with the following keys: vocab, prefix, ontology, class, property, individual, and datatype. Only the vocab and ontology blocks are required; all others are optional.
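As an illustration of the top-level structure, the following sketch checks an already parsed YAML object for the two required blocks. The function name checkTopLevel is hypothetical and not part of the yml2vocab API; reporting unknown block names is an extra assumption of this sketch, not documented tool behavior.

```typescript
// Top-level block names as listed in the text above.
const REQUIRED_BLOCKS: string[] = ["vocab", "ontology"];
const KNOWN_BLOCKS: string[] = [
  "vocab", "prefix", "ontology", "class", "property", "individual", "datatype",
];

// Hypothetical check on the parsed YAML (e.g., the output of a YAML parser).
// Returns a list of problems; an empty list means the structure looks acceptable.
function checkTopLevel(parsed: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_BLOCKS) {
    if (!(key in parsed)) problems.push(`missing required block: ${key}`);
  }
  // Assumption: unknown top-level keys are reported as well.
  for (const key of Object.keys(parsed)) {
    if (KNOWN_BLOCKS.indexOf(key) < 0) problems.push(`unknown block: ${key}`);
  }
  return problems;
}
```

For example, checkTopLevel({ vocab: [], ontology: [] }) returns an empty list, whereas a file without an ontology block yields a "missing required block: ontology" entry.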
Each block sequence consists of blocks with the following keys: id, property, value, label, upper_value, domain, range, deprecated, comment, status, defined_by, context, see_also, type, example, known_as, and dataset. The interpretation of these key/value pairs may depend on the top-level block where they reside, but some have a common interpretation.
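To make the keys more tangible, here is a hypothetical TypeScript view of a single entry in a class, property, individual, or datatype block, together with the label rule described in the list below (a missing label defaults to the capitalized id). The interface and function names, and the exact value types, are assumptions for illustration; the ontology-specific property and value keys are left out.

```typescript
// Hypothetical shapes; key names follow the documentation, types are assumptions.
interface SeeAlso { label: string; url: string; }
interface Example { label: string; json: string; }

interface TermBlock {
  id: string;
  label?: string;
  comment?: string;
  type?: string | string[];
  upper_value?: string | string[];
  domain?: string | string[];
  range?: string | string[];
  defined_by?: string | string[];
  see_also?: SeeAlso[];
  example?: Example[];
  status?: "stable" | "reserved" | "deprecated";
  deprecated?: boolean;
  context?: string | string[];
  known_as?: string;
  dataset?: boolean;
}

// If no label is given, the capitalized value of id is used.
function effectiveLabel(term: TermBlock): string {
  return term.label ?? term.id.charAt(0).toUpperCase() + term.id.slice(1);
}
```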
- Common key/value pairs for the class, property, datatype, and individual blocks:
  - label refers to a short header label for the term. If missing, the capitalized value of id is used.
  - comment refers to a longer description of the term, and can be used for blocks in the class, property, and individual top-level blocks. It may include HTML Flow content elements. The comment is encapsulated in an HTML <div> element and is displayed verbatim in the HTML version of the vocabulary, and as a literal of type rdf:HTML in the JSON-LD and Turtle versions. Note that Markdown syntax for simple formatting, such as backticks for <code> elements, may also be used.
  - type refers to RDF types. Note that the tool automatically adds types like rdf:Property, rdfs:Class, etc.; this key is to be used for vocabulary-specific types only.
  - defined_by should be a URL, or a list thereof, referring to the formal definition(s) of the term.
  - see_also refers to one or more blocks with label and url keys, providing a human-readable title and a URL, respectively, of an external document that can be referred to by the description of the term. (These are translated into rdfs:seeAlso statements in the vocabulary.)
  - status refers to a string that can be stable, reserved, or deprecated. In the HTML output, the terms are divided into these three sections. stable is the default.
  - deprecated refers to a boolean, signaling whether the term is deprecated. The default is false. This key is a leftover from an earlier version and is overridden, if applicable, by the value of status.
  - context refers to a list of URLs or to one of two special keywords. It is used to add information on the JSON-LD @context file(s) that "mention" the term; the URLs refer to the relevant @context files. If the value is vocab, and a global @context file is defined in the vocab block, that "default" @context is used. If the value is none, there is no context file reference for the term. The default setting is vocab (i.e., unless otherwise specified, the default value is used for the term).
  - example refers to one or more blocks with label and json keys, providing a (JSON) example with a title. In the HTML version, these examples are placed at the end of the section describing the term (they are ignored in the Turtle and JSON-LD versions). Care should be taken to use the "|" block style indicator in the YAML file for the examples.
  - known_as refers to a string that can be used, on rare occasions, as an alias for the term's label. It is currently used when generating a JSON-LD context file, as the name of the property in the context file instead of the official label. This means that JSON-LD users relying on that context file must use this alternative name in their code (this may come in handy when the official name changes, but the old name is kept for JSON users for backward compatibility reasons).
- Top level blocks:
  - vocab: a block with the id and value keys, defining the prefix and the URL of the vocabulary, respectively. The id provides a prefix that can be used in the vocabulary descriptions, e.g., for cross-references. The additional, optional context key may provide a default context file reference (as a URI), used by all terms unless locally overridden (see above). Note that the context key is required if the HTML template includes a context section.
  - prefix: definitions of prefixes, and corresponding URLs, for each external vocabulary in use, given by the id and value keys, respectively. Some id/value pairs are defined by default, and it is not necessary to define them here. These are: dc (for http://purl.org/dc/terms/), owl (for http://www.w3.org/2002/07/owl#), rdf (for http://www.w3.org/1999/02/22-rdf-syntax-ns#), rdfs (for http://www.w3.org/2000/01/rdf-schema#), xsd (for http://www.w3.org/2001/XMLSchema#), and schema (for http://schema.org/).
  - ontology: definitions of "ontology properties", that is, statements made about the vocabulary itself. The (prefixed) property term is given by the property key, and the value by the value key. If the value can be parsed as a URL, it is considered to be the URL of an external resource; otherwise, the value is considered to be (English) text.
    It is good practice to provide, at least, dc:description as an ontology property with a short description of the vocabulary.
    The script automatically adds a dc:date property with the generation time as its value.
  - class: blocks of class definitions. For each class, the id key defines the class name (no prefix should be used here). Possible superclasses are defined by the upper_value key, as a single term or a sequence of terms.
  - property: blocks of property definitions. For each property, the id key defines the property name (no prefix should be used here); possible superproperties are defined by the upper_value key, as a single term or a sequence of terms. The domain and range classes can also be provided, as a single term or a sequence of terms, through the domain and range keys, respectively.
    Note that both the domain and the range keys can take an array of class references as values. For the former, this means the resulting domain is the union of the referred classes, whereas for the latter it is their intersection.
    The range key may also use the (single) IRI (or URL) term instead of class references. This keyword denotes a property that has no explicit range, but whose objects are expected to be IRI references. The generated vocabulary annotates these properties as belonging to the owl:ObjectProperty class, which is the term reserved for properties whose objects are not supposed to be literals. A comment is also generated into the HTML description of the term.
    The dataset key can also be set to a boolean value. This key only influences the generated JSON-LD @context: if the value is true, the JSON-LD @container is set to @graph for the property, signaling that the value refers to a dataset (or graph). See the JSON-LD specification for further details.
  - individual: blocks of definitions of individuals, i.e., single resources defined in the vocabulary. For each individual, the id key defines the name of the individual (no prefix should be used here); the possible types are defined by the type key, as a single term or a sequence of terms. (Earlier versions of this tool used upper_value for the same purpose; that usage is still understood for backward compatibility reasons, but is deprecated.)
  - datatype: blocks of datatype definitions. For each datatype, the id key defines the datatype name (no prefix should be used here). The datatype it is derived from may be given, as a single term, through the upper_value or the type key.
- For classes, properties, individuals, and datatypes, the id key and either the comment or the defined_by key are required. All other keys are optional.
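The precedence between status and the legacy deprecated key, and the rule on required keys, can be summarized in a short sketch. The names effectiveStatus and missingRequiredKeys are hypothetical helpers written for this README, not functions exported by yml2vocab.

```typescript
type Status = "stable" | "reserved" | "deprecated";

// Minimal subset of a term entry used by this sketch (hypothetical shape).
interface TermLike {
  id?: string;
  comment?: string;
  defined_by?: string | string[];
  status?: Status;
  deprecated?: boolean;
}

// status wins whenever it is present; the legacy deprecated flag is only
// consulted otherwise; stable is the default.
function effectiveStatus(term: TermLike): Status {
  if (term.status !== undefined) return term.status;
  return term.deprecated === true ? "deprecated" : "stable";
}

// id is required, plus at least one of comment or defined_by.
function missingRequiredKeys(term: TermLike): string[] {
  const missing: string[] = [];
  if (!term.id) missing.push("id");
  if (term.comment === undefined && term.defined_by === undefined) {
    missing.push("comment or defined_by");
  }
  return missing;
}
```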
There are some examples in the example directory on GitHub that illustrate all of these terms.
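The property rules from the top-level block description (several domains mean a union, several ranges an intersection, and a range of IRI or URL means no explicit range plus an owl:ObjectProperty type) can be illustrated by rendering them in Turtle. This is a sketch under assumptions: the owl:unionOf / owl:intersectionOf encoding and the helper names are illustrative choices, and the tool's actual Turtle output may be structured differently. Class references are assumed to be already prefixed.

```typescript
interface PropertyDef {
  id: string;
  domain?: string[]; // prefixed class names, e.g. "ex:A"
  range?: string[];  // prefixed class names, or the single keyword "IRI"/"URL"
}

// A single class is used as is; several classes become an anonymous OWL class.
function classExpression(op: "owl:unionOf" | "owl:intersectionOf", classes: string[]): string {
  if (classes.length === 1) return classes[0] as string;
  return `[ a owl:Class ; ${op} ( ${classes.join(" ")} ) ]`;
}

function propertyTurtle(prefix: string, p: PropertyDef): string {
  const range = p.range ?? [];
  const iriRange = range.length === 1 && (range[0] === "IRI" || range[0] === "URL");
  const types = ["rdf:Property"];
  if (iriRange) types.push("owl:ObjectProperty"); // objects are not literals
  const lines: string[] = [`${prefix}:${p.id} a ${types.join(", ")}`];
  if (p.domain && p.domain.length > 0) {
    lines.push(`    rdfs:domain ${classExpression("owl:unionOf", p.domain)}`);
  }
  if (!iriRange && range.length > 0) {
    lines.push(`    rdfs:range ${classExpression("owl:intersectionOf", range)}`);
  }
  return lines.join(" ;\n") + " .";
}
```

For instance, propertyTurtle("ex", { id: "link", range: ["IRI"] }) yields the single statement ex:link a rdf:Property, owl:ObjectProperty .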
The value of the id key is usually a simple reference identifying the class, property, etc., as part of the vocabulary. It is also possible to use a CURIE instead of a simple reference. Such terms are considered to be external terms: terms that are formally defined in another vocabulary, and are listed only to increase the readability of the vocabulary specification. (Typical cases are schema.org or Dublin Core terms, which are frequently used in combination with other vocabularies.)
External terms appear in the HTML document generated by the tool, but they do not result in formal RDF statements in Turtle, JSON-LD, or RDFa; they appear as informational items only.
The prefix part of the CURIE must be defined through the prefix top level block, unless it is one of the predefined prefixes (dc, owl, rdf, rdfs, xsd, and schema).
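A minimal sketch of how a CURIE id could be recognized as an external term and expanded, using the predefined prefixes listed in the prefix block description plus the user-defined ones. The helper names are hypothetical, and letting user-defined prefixes take precedence over the defaults is an assumption of this sketch.

```typescript
// Predefined prefixes, as listed in the description of the prefix block.
const DEFAULT_PREFIXES: Record<string, string> = {
  dc: "http://purl.org/dc/terms/",
  owl: "http://www.w3.org/2002/07/owl#",
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  rdfs: "http://www.w3.org/2000/01/rdf-schema#",
  xsd: "http://www.w3.org/2001/XMLSchema#",
  schema: "http://schema.org/",
};

// An id containing a colon is treated as a CURIE, i.e., as an external term.
function isExternalTerm(id: string): boolean {
  return id.indexOf(":") >= 0;
}

// Expands a CURIE; user prefixes (from the prefix block) are merged over the defaults.
function expandCurie(curie: string, userPrefixes: Record<string, string> = {}): string {
  const colon = curie.indexOf(":");
  if (colon < 0) throw new Error(`not a CURIE: ${curie}`);
  const prefix = curie.slice(0, colon);
  const prefixes: Record<string, string> = { ...DEFAULT_PREFIXES, ...userPrefixes };
  const base = prefixes[prefix];
  if (base === undefined) throw new Error(`undefined prefix: ${prefix}`);
  return base + curie.slice(colon + 1);
}
```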
Some effort is made to format the output files (HTML, JSON-LD, and Turtle) properly, to make them readable. A subset of the EditorConfig facilities is also taken into account: if an .editorconfig file is found, the following supported pairs are used (with the default values in parentheses):
- indent_style (space)
- insert_final_newline (false)
- indent_size (4)
- max_line_length (0)
- end_of_line (lf)
See the .editorconfig for further details.
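The following sketch shows how the supported pairs and their defaults might be combined. It is deliberately simplified and hypothetical: it ignores [section] headers and file-name globs and simply applies every supported key it finds, which is not how full EditorConfig matching works.

```typescript
interface FormatOptions {
  indent_style: "space" | "tab";
  insert_final_newline: boolean;
  indent_size: number;
  max_line_length: number;
  end_of_line: "lf" | "cr" | "crlf";
}

// Default values as listed in the text above.
const DEFAULT_FORMAT: FormatOptions = {
  indent_style: "space",
  insert_final_newline: false,
  indent_size: 4,
  max_line_length: 0,
  end_of_line: "lf",
};

function toInt(value: string, fallback: number): number {
  const n = parseInt(value, 10);
  return isNaN(n) ? fallback : n;
}

// Simplified reader: section headers, comments, and unsupported keys are skipped.
function readFormatOptions(editorconfig: string): FormatOptions {
  const options: FormatOptions = { ...DEFAULT_FORMAT };
  for (const raw of editorconfig.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || "#;[".indexOf(line.charAt(0)) >= 0) continue;
    const eq = line.indexOf("=");
    if (eq < 0) continue;
    const key = line.slice(0, eq).trim().toLowerCase();
    const value = line.slice(eq + 1).trim().toLowerCase();
    switch (key) {
      case "indent_style":
        if (value === "space" || value === "tab") options.indent_style = value;
        break;
      case "insert_final_newline":
        options.insert_final_newline = value === "true";
        break;
      case "indent_size":
        options.indent_size = toInt(value, options.indent_size);
        break;
      case "max_line_length":
        options.max_line_length = toInt(value, options.max_line_length);
        break;
      case "end_of_line":
        if (value === "lf" || value === "cr" || value === "crlf") options.end_of_line = value;
        break;
    }
  }
  return options;
}
```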
The script is written in TypeScript (version 5.0.2 and beyond) and runs on top of Node.js (version 21 and beyond). It can also run with deno (version 2.1 and beyond).
Beyond the YAML file itself, the script relies on an HTML template file, i.e., a skeleton HTML file that is completed with the vocabulary entries. The example template file on GitHub provides a good starting point for a template that also makes use of ReSpec. The script relies on the id values and the section structure of the template, which it modifies and extends. Unused subsections (e.g., when there are no deprecated classes) are removed from the final HTML file.
The script can be used as a standard npm module via:
```sh
npm install yml2vocab
```
The npm installation installs the node_modules/.bin/yml2vocab script. The script can be used as:
```sh
yml2vocab [-v vocab_file_name] [-t template_file_name] [-c]
```
If deno is installed globally, one can also run the script directly (without any further installation) from the top level directory with:
```sh
deno run --allow-read --allow-write --allow-env /a/b/c/main.ts [-v vocab_file_name] [-t template_file_name] [-c]
```
To make this simpler, a compiled binary version of the program can be generated with:
```sh
deno compile --allow-read --allow-write --allow-env main.ts
```
which results in an executable file, called yml2vocab, that can be stored anywhere in the user's $PATH.
The program can also be run without installing the package locally:
```sh
deno run -A jsr:@iherman/yml2vocab/cli [-v vocab_file_name] [-t template_file_name] [-c]
```
The script generates the vocab_file_name.ttl, vocab_file_name.jsonld, and vocab_file_name.html files for the Turtle, JSON-LD, and HTML+RDFa versions, respectively. The script relies on the vocab_file_name.yml file for the vocabulary specification in YAML, and on the template_file_name file for the HTML template. The defaults are vocabulary and template.html, respectively.
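The naming convention of the command line tool can be captured in a few lines. cliFiles is a hypothetical helper written for this README only; it mirrors the defaults stated above and is not part of the package API.

```typescript
interface CliFiles {
  yaml: string;     // input: vocabulary specification
  template: string; // input: HTML template
  turtle: string;   // output: Turtle
  jsonld: string;   // output: JSON-LD
  html: string;     // output: HTML+RDFa
}

// Derives the input and output file names from the -v and -t values (or their defaults).
function cliFiles(vocabFileName = "vocabulary", templateFileName = "template.html"): CliFiles {
  return {
    yaml: `${vocabFileName}.yml`,
    template: templateFileName,
    turtle: `${vocabFileName}.ttl`,
    jsonld: `${vocabFileName}.jsonld`,
    html: `${vocabFileName}.html`,
  };
}
```

With no arguments, this reads vocabulary.yml and template.html and produces vocabulary.ttl, vocabulary.jsonld, and vocabulary.html.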
If the -c flag is set, an additional vocab_file_name.context.jsonld file is generated, containing JSON-LD that can be used as a separate @context reference in a JSON-LD file. Note that this file does not necessarily use all the sophistication that JSON-LD defines for @context; such features may have to be added manually.
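A minimal sketch of how the known_as and dataset keys (described earlier) could shape the term definitions of such a @context: the alias, if any, becomes the key, and dataset: true adds "@container": "@graph". The function buildContext and its input shape are hypothetical; the actual generated file may contain further entries and options.

```typescript
// JSON-LD term definition as produced by this sketch.
type TermDefinition = { "@id": string; "@container"?: "@graph" };

interface ContextTerm {
  id: string;         // term name in the vocabulary
  known_as?: string;  // optional alias, used as the key in the context
  dataset?: boolean;  // true => the value is a dataset (graph)
}

// vocabUrl is the URL given by the value key of the vocab block.
function buildContext(vocabUrl: string, terms: ContextTerm[]): { "@context": Record<string, TermDefinition> } {
  const context: Record<string, TermDefinition> = {};
  for (const term of terms) {
    const key = term.known_as ?? term.id;
    const definition: TermDefinition = { "@id": vocabUrl + term.id };
    if (term.dataset === true) definition["@container"] = "@graph";
    context[key] = definition;
  }
  return { "@context": context };
}
```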
The simplest way of using the module from JavaScript is:
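```js
const yml2vocab = require('yml2vocab');

async function main() {
    // Reads vocabulary.yml and template.html; the last argument controls
    // whether a JSON-LD @context file is generated as well.
    await yml2vocab.generateVocabularyFiles("vocabulary", "template.html", false);
}

main();
```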
This reads (asynchronously) the YAML and template files and stores the generated vocabulary representations (see the command line interface for details) in the directory of the YAML file. By setting the last argument to true, a @context file is also generated.
The somewhat lower level yml2vocab.VocabGeneration class can also be used:
```js
const fs = require('fs');
const yml2vocab = require('yml2vocab');

const yml_content = fs.readFileSync('vocabulary.yml', 'utf-8');           // YAML content in text form, before parsing
const template_file_content = fs.readFileSync('template.html', 'utf-8');  // HTML template as text

const vocabGeneration = new yml2vocab.VocabGeneration(yml_content);
const turtle = vocabGeneration.getTurtle();                    // the Turtle content as a string
const jsonld = vocabGeneration.getJSONLD();                    // the JSON-LD content as a string
const html = vocabGeneration.getHTML(template_file_content);   // the HTML+RDFa content as a string
const context = vocabGeneration.getContext();                  // the minimal @context file for the vocabulary
```
If TypeScript is used instead of JavaScript, the usage is similar, except that the require must be replaced by:
```typescript
import yml2vocab from 'yml2vocab';
```
There is no need to install any extra typings; they are included in the package. The interfaces simply use strings; no extra TypeScript type definitions have been defined.
The package is also available on JSR as @iherman/yml2vocab. All previous examples are also valid for deno, except for the import statements, which should be:
```typescript
import yml2vocab from 'jsr:@iherman/yml2vocab';
```
Note that deno can also import npm packages if explicitly named, so the following import statement is also valid:
```typescript
import yml2vocab from 'npm:yml2vocab';
```
No prior installation step is necessary.
The repository may also be cloned. It contains the following files and directories:
- Readme.md: this file.
- package.json: configuration file for npm.
- deno.json: configuration file for deno.
- example: a folder with examples of vocabulary definition files and the generated RDF vocabulary files.
- lib directory: the TypeScript modules for the script.
- dist directory: the JavaScript distribution files (compiled from the TypeScript sources using tsc in node.js).
- main.ts: the TypeScript entry point to the script as a command line tool.
- index.ts: the top level type interface, to be used if the files are used by an external script.
- docs directory: documentation of the package as generated by Typedoc.
The following files and directories are generated or modified by either the script or npm; it is better not to touch them directly:
- package-lock.json: used by npm as an internal file for the packages.
- node_modules directory: the various JavaScript libraries used by the script. This directory should not be uploaded to GitHub; it is strictly for the local activation of the script.
- deno.lock: used by deno to manage imported packages using its own mechanism (bypassing node_modules).
The original idea, structure, and script (in Ruby) were created by Gregg Kellogg for v1 of the Credentials Vocabulary, with a vocabulary definition using CSV. The CSV definitions have since been changed to YAML, and the script itself has been rewritten in TypeScript and developed further.
Many features are the result of further discussions with Manu Sporny and Benjamin Young.