XML to CUE Encoding Proposal #3776
Thank you for writing up this proposal! I'll write some minor high level thoughts we've had about it in this comment here, and split up some other thoughts into separate comments so they can spawn their own sub-threads without being mixed together.
Following my point 3 above, what would happen for repeated XML elements with the same tag name but different namespaces? For example:

Interestingly enough, badgerfish seems to not be clear on this edge case.
It's worth pointing out that your example and a variant with the same namespace URIs but with different prefixes, like are effectively representing the same data, but would result in different decoded CUE. This affects badgerfish as well, so it's presumably not a significant problem.

Update: light edit by @myitcv to name the blobs of XML
I would like to take a second to acknowledge that this discussion is about emulating xml semantics in cue, and that that is insane and impressive at the same time.
As of 096a114, the

Please note that this new encoding is experimental for now, given that it's a bespoke design that might need to be tweaked as we gain experience with it.

There is a Go package available here: https://pkg.go.dev/cuelang.org/go/encoding/xml/koala

And the CLI supports it as follows:
XML to CUE Encoding Proposal
Problem
Many users would benefit from using CUE with their XML files, but CUE does not currently have an encoding that supports XML.
Purpose of this document
This document puts forward a proposal for an XML to CUE mapping called `koala` that can be used to add an XML encoding to CUE.

Given XML has constructs like attributes and namespaces that don't have identical analogues in CUE, there are many approaches for mapping from XML to CUE, with other future XML encodings possible. Examples of future options could include encodings that use a schema-guided approach (similar to the textproto encoding), and raw low-level AST style encodings that model each XML construct as a node that associates to abstractions like attributes, tags, and other content.
Objectives
The mapping in this proposal aims to:
Non-Objective
Proposed Mapping
The proposed mapping follows a convention that is inspired by the Badgerfish convention, with deviations for compatibility with CUE and increased readability.
This new mapping will be called `koala` and follows the rules below:

XML attributes map to CUE properties prefixed with `$`, and XML content maps to a property named `$$`, where that property belongs to the struct that is mapped from the XML content's parent element.

Carriage returns (`\r`) found in a string are discarded from that string.

Sample CUE constraints for XML using `koala`

Using the rules above, one would be able to write CUE constraints for XML as shown below:
Given an XML file with a `note` element and a `book` element, we could write a CUE schema to define types as shown below:

XML
CUE constraints
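A minimal sketch of what such constraints could look like under the `koala` rules. The wrapper element `library`, the attribute `alpha`, and the child element `title` are hypothetical names chosen for illustration; the XML being constrained is shown in comments.

```cue
// XML input (illustrative):
//
//   <library>
//     <note alpha="abcd">hello</note>
//     <book>
//       <title>Some title</title>
//     </book>
//   </library>

// A note carries one attribute and text content.
#note: {
	$alpha: string // XML attribute "alpha"
	$$:     string // text content of <note>
}

// A book contains a nested <title> element with text content.
#book: {
	title: $$: string
}

library: {
	note: #note
	book: #book
}
```

Because every decoded value is a string, constraints on the decoded data are expressed against `string` rather than numeric or boolean types.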
Mapping examples
The examples below illustrate each of the mapping rules defined above:
1. Elements
The XML `note` element below maps to the `note` struct in CUE.

XML
CUE
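As an illustrative sketch of this rule, assuming an empty `note` element (the XML is shown in a comment), the decoded CUE would be an empty struct:

```cue
// XML input (illustrative):
//
//   <note></note>
note: {}
```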
2. Nested Elements
Nesting an XML `to` element inside the `note` element from the first example results in a nested CUE `to` struct inside the `note` struct.

XML
CUE
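A sketch of the nesting rule, assuming both elements are empty:

```cue
// XML input (illustrative):
//
//   <note>
//     <to></to>
//   </note>
note: {
	to: {}
}
```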
3. Attributes
The `alpha` attribute of the `note` element in XML below maps to the `$`-prefixed `$alpha` property of the `note` struct in CUE.

XML
CUE
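A sketch of the attribute rule. The attribute value `"abcd"` is a made-up example value:

```cue
// XML input (illustrative):
//
//   <note alpha="abcd"></note>
note: {
	$alpha: "abcd" // attribute name prefixed with "$"
}
```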
4. Content
The content of the `note` XML element below maps to the value of the `$$` property of the `note` struct in CUE.

XML
CUE
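A sketch of the content rule, with a made-up text value:

```cue
// XML input (illustrative):
//
//   <note>hello</note>
note: {
	$$: "hello" // text content of the element
}
```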
5. XML Lists
The multiple XML `note` elements at the same level map to a list of `note` structs in CUE.

XML
CUE
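A sketch of the list rule. Since an XML document has a single root, the repeated elements are placed under a hypothetical `notes` wrapper element:

```cue
// XML input (illustrative; `notes` is a hypothetical wrapper element):
//
//   <notes>
//     <note>first</note>
//     <note>second</note>
//   </notes>
notes: {
	note: [
		{$$: "first"},
		{$$: "second"},
	]
}
```

One consequence worth keeping in mind, assuming the rules are applied as stated: a single `note` would decode to a struct while several decode to a list, so a constraint that must accept both cases would need to allow both shapes.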
6 and 7. Namespace Definitions and References
The `h` and `r` XML namespace definitions declared in the `table` XML element are declared as properties of the `h:table` struct in CUE.

Note how the namespace-prefixed XML element names like `h:table`, `h:tr`, `h:td` and `r:blah` carry across to the key names of their corresponding CUE structs.

XML
CUE
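A sketch of the namespace rules using the element names mentioned above. The namespace URIs and text values are hypothetical. Labels containing a colon are not valid CUE identifiers, so they are written as quoted strings:

```cue
// XML input (illustrative; URIs are placeholders):
//
//   <h:table xmlns:h="https://example.com/h" xmlns:r="https://example.com/r">
//     <h:tr>
//       <h:td>Apples</h:td>
//       <h:td>Bananas</h:td>
//     </h:tr>
//     <r:blah>e3r</r:blah>
//   </h:table>
"h:table": {
	// Namespace declarations appear only where the XML declares them.
	"$xmlns:h": "https://example.com/h"
	"$xmlns:r": "https://example.com/r"
	"h:tr": {
		"h:td": [
			{$$: "Apples"},
			{$$: "Bananas"},
		]
	}
	"r:blah": {$$: "e3r"}
}
```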
8. Element and attribute values
XML element and attribute values map to strings, as shown in the example below.
XML
CUE
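A sketch of the value rule, using made-up values that look numeric to show that they still decode as strings:

```cue
// XML input (illustrative):
//
//   <note alpha="42">3.14</note>
note: {
	$alpha: "42"   // a string, not the integer 42
	$$:     "3.14" // a string, not the float 3.14
}
```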
Alternative Conventions Considered
Although no known conventions exist to map from XML to CUE, there are a number of known mappings that take XML to JSON, which we can take inspiration from.
Parker and Spark Conventions
The Parker and Spark conventions use a very simplistic model where XML elements are mapped to object properties, and attributes are ignored.
We wish to maintain attribute information so we cannot use these mapping conventions as a whole.
Badgerfish
The Badgerfish convention maps elements, attributes, and content from XML to JSON. We follow many of the rules in the Badgerfish convention, described here. Notable differences are listed below to allow for mapping to CUE and for increased readability:
XML attributes map to CUE properties starting with a `$` prefix instead of an `@` prefix, given `@` is already reserved in CUE for CUE attributes. Although we could still use the `@` prefix using quotes in CUE, we do not want to overload the usage of `@` for two concepts (i.e. for XML attribute prefixes and for CUE attributes). Using the `$` prefix will also provide a less verbose notation given quotes do not need to be used with this prefix.

Given a single `$` is not a valid identifier in CUE, we use `$$` as the property to model element text content instead of `$`. Although we could use a quoted `"$"` as the key, we avoid this to prevent ambiguity with usage of `$` in other contexts (such as "root element" in JSONPath), and to minimize usage of quotes.

For namespaces, we do not recursively define namespaces in nested objects as this would unnecessarily increase verbosity in the mapped CUE. Instead we align more closely to how namespaces are defined in the XML, and only define namespaces in the CUE at the same level as they are declared in the XML. To illustrate how `koala` simplifies the mapping, we provide the example below (Badgerfish mapping taken from here):

XML
Badgerfish
koala
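A sketch of the `koala` side of such a comparison, assuming an XML document with a default namespace and one prefixed namespace. The element names and URIs are illustrative. Where Badgerfish would repeat the namespace definitions inside each nested object, here they appear once, on the element that declares them:

```cue
// XML input (illustrative):
//
//   <alice xmlns="http://some-namespace"
//          xmlns:charlie="http://some-other-namespace">
//     <bob>david</bob>
//     <charlie:edgar>frank</charlie:edgar>
//   </alice>
alice: {
	// Declared once, at the level where the XML declares them.
	$xmlns:          "http://some-namespace"
	"$xmlns:charlie": "http://some-other-namespace"

	// Nested elements do not repeat the namespace definitions.
	bob: $$:             "david"
	"charlie:edgar": $$: "frank"
}
```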
GData
The GData convention is similar to Badgerfish, but it makes no distinction between identifiers used for elements and those used for attributes.
Unlike the Badgerfish convention, if one were to use this convention to map from XML to CUE, it would mean that it becomes ambiguous whether you are referring to an attribute or to an element when writing a CUE constraint. Further, it is not clear from the rules specified here what happens when there is a collision between an element name and an attribute name.
Abdera
This convention is similar to the GData convention, however, it uses separate `children` and `attribute` abstractions when both nested XML elements and attributes are mentioned. Having to mention `children` and/or `attributes` in CUE constraints, as well as integer indexes for `children` arrays increases verbosity and complexity, which goes against the readability objective of this paper. To illustrate this with an example for Abdera:

XML
would map to:
CUE
JsonML
Short for JSON Markup Language, this convention makes heavy use of arrays to ensure an order-preserving mapping, where each element maps to an array entry, and each attribute also maps to an array entry. An example mapping is shown here.
Having to work out (count) integer indexes when writing a CUE constraint rather than just simply using the element and attribute identifiers found in the XML makes this mapping too unwieldy to use for the purposes of our mapping.
Testing Plan
The XML to CUE mapping scenarios required are covered by the examples described here. We will consider the solution complete once it can both decode and encode the examples shown there, along with any other test cases requested by the CUE maintainer team.
Deployment Plan
The new `koala` encoding will not be the default XML encoding, but rather an opt-in encoding. Users will be able to use this from the command line using a command similar to:

`cue vet schema.cue xml+koala: data.xml`

Given this is not the default encoding, the command below would not work:

`cue vet schema.cue data.xml`

This will initially be an experimental encoding, which will be specified in the documentation. However, because it requires opt-in whenever the XML encoding to be used is specified, it does not need to be toggled using the `CUE_EXPERIMENT` variable as other experimental features are.

We also note that embedded XML within CUE will not be supported on day 1.