Add YAML-safe string formatter and Liquid filter #34

@briandominick

Summary

We need a shared utility that formats arbitrary Ruby strings as YAML-safe scalar values, choosing between plain, quoted, or block styles based on content. This logic should be exposed both as:

  1. A Ruby method (e.g., to_yaml_scalar_format)
  2. A Liquid filter with equivalent behavior (e.g., yaml_scalar_format)

Both should live in (or be sourced from) the SchemaGraphy module so the behavior is consistent anywhere YAML is emitted.

The goal is correctness first, readability second. When uncertain, prefer a safer representation over a clever one.


Desired behavior (high-level)

Given an input string or other scalar (Boolean, Numeric, etc.), the formatter should decide whether to emit it as:

  • a plain scalar (no wrapper),
  • a quoted scalar (single or double quotes), or
  • a block scalar (|, |-, or > as appropriate).

The logic should be centralized and reused by both the Ruby API and the Liquid filter.


Scalar selection rules (rewordable guidance)

1. Plain style (no wrapper)

Use an unquoted scalar only if the string contains no characters that require escaping or quoting in YAML.

Examples:

  • No :
  • No quotes (' or ")
  • No backticks
  • No special YAML-reserved or ambiguous characters

If the string is “boringly safe,” leave it unwrapped.
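
A minimal sketch of the "boringly safe" test, assuming a whitelist of characters plus a round-trip check through Psych. The module name PlainScalarCheck, the method plain_safe?, and the exact whitelist are hypothetical; none of them exist in SchemaGraphy today.

```ruby
require 'yaml'

# Hypothetical helper for illustration only.
module PlainScalarCheck
  # Assumed whitelist: starts and ends with a letter, digit, or underscore;
  # spaces, dots, slashes, and hyphens are allowed in between.
  SAFE_PATTERN = %r{\A[A-Za-z0-9_](?:[A-Za-z0-9_ ./-]*[A-Za-z0-9_])?\z}

  def self.plain_safe?(str)
    return false unless str.is_a?(String) && str.match?(SAFE_PATTERN)

    # Round-trip guard: "true", "42", "yes", or "2024-01-01" pass the
    # whitelist, but Psych would not read them back as the same String.
    YAML.safe_load(str) == str
  rescue Psych::Exception
    false
  end
end

PlainScalarCheck.plain_safe?('hello world') # => true
PlainScalarCheck.plain_safe?('key: value')  # => false
PlainScalarCheck.plain_safe?('true')        # => false
```

The round-trip check is the conservative part: anything Psych would reinterpret as a Boolean, number, date, or null is sent on to the quoted style instead of being guessed at.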


2. Quoted style

If quoting is required but the string is otherwise simple and single-line:

  • Prefer double quotes ("...") in most cases
  • Prefer single quotes ('...') if the source string already contains double quotes

This applies when the string contains characters such as:

  • :
  • '
  • "
  • backticks
  • other characters that YAML requires to be quoted or escaped

The intent is to preserve readability while remaining YAML-correct.
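
One way the quoting rule could look, as a sketch only. It assumes single-line, printable input. The module and method names are hypothetical. Using single quotes with a doubled apostrophe for strings that contain both quote types is an assumption, since the issue does not cover that case.

```ruby
require 'yaml'

# Hypothetical helper for illustration only.
module QuotedScalarSketch
  # YAML single-quoted style: the only escape is a doubled single quote.
  def self.single_quote(str)
    "'" + str.gsub("'", "''") + "'"
  end

  # YAML double-quoted style: escape backslashes first, then double quotes.
  # Block form of gsub avoids backslash handling in replacement strings.
  def self.double_quote(str)
    escaped = str.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
    '"' + escaped + '"'
  end

  # Rule from the issue: double quotes by default, single quotes when the
  # source already contains a double quote.
  def self.quote(str)
    str.include?('"') ? single_quote(str) : double_quote(str)
  end
end

QuotedScalarSketch.quote('key: value')
# => "key: value" (with the double quotes as part of the output)
```

Single-quoted YAML treats backslashes literally, so only the double-quoted branch needs backslash escaping.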


3. Block style (literal or folded)

Use a block scalar whenever the string is complex or potentially risky.

Specifically, use block style if the string:

  • Is multi-line
  • Contains any dangerous or ambiguous combination of characters
  • Is difficult to reason about safely in quoted form
  • Falls into a “when in doubt” category

Safety and clarity take precedence over compactness.

Exact block indicator choice (|, |-, >) can be decided by the implementer based on existing conventions.
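
A sketch of literal block output, assuming LF line endings and a top-level mapping key. The module and method names are hypothetical, and the folded style (>) is not covered here.

```ruby
require 'yaml'

# Hypothetical helper for illustration only.
module BlockScalarSketch
  # Emits "key: |..." followed by the indented lines of +str+.
  def self.literal_block(key, str, indent: 2)
    # Chomping indicator: "-" strips the final newline, "+" keeps every
    # trailing newline, and no indicator keeps exactly one.
    chomping =
      if !str.end_with?("\n") then '-'
      elsif str.end_with?("\n\n") then '+'
      else ''
      end

    # YAML cannot auto-detect indentation when the first non-empty line
    # starts with a space, so state the indentation explicitly in that case.
    indent_hint = str.match?(/\A\n* /) ? indent.to_s : ''

    lines = str.split("\n", -1)
    lines.pop if str.end_with?("\n") # drop the empty piece after the last newline
    pad = ' ' * indent
    body = lines.map { |line| line.empty? ? '' : pad + line }.join("\n")

    "#{key}: |#{indent_hint}#{chomping}\n#{body}\n"
  end
end

puts BlockScalarSketch.literal_block('text', "line one\nline two")
# text: |-
#   line one
#   line two
```

Inside a literal block, colons, quotes, and backticks need no escaping, which is why it suits the "when in doubt" category.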


Liquid filter

Expose the same logic as a Liquid filter so templates can safely emit YAML scalars.

Example intent:

{{ 'my string containing " character' | yaml_scalar_format }}

Expected output:

'my string containing " character'

The filter should delegate to the same underlying logic as the Ruby method.
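
A sketch of the delegation, assuming the liquid gem's Liquid::Template.register_filter. The submodule names YamlScalar and LiquidFilters are hypothetical, and the format method here is a deliberately conservative stand-in (it always quotes) rather than the real selection logic.

```ruby
begin
  require 'liquid' # optional here; the sketch still loads without the gem
rescue LoadError
  nil
end

module SchemaGraphy
  # Hypothetical home for the canonical formatter. This stand-in always
  # quotes single-line strings; the real logic would also pick plain or block.
  module YamlScalar
    def self.format(value)
      str = value.to_s
      return "'" + str.gsub("'", "''") + "'" if str.include?('"')

      '"' + str.gsub('\\') { '\\\\' } + '"'
    end
  end

  # Hypothetical filter module: a thin wrapper with no logic of its own.
  module LiquidFilters
    def yaml_scalar_format(input)
      SchemaGraphy::YamlScalar.format(input)
    end
  end
end

# Register globally only when Liquid is actually loaded.
Liquid::Template.register_filter(SchemaGraphy::LiquidFilters) if defined?(Liquid::Template)
```

Keeping the filter body to a single call means the Ruby method and the template output cannot drift apart.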


Other Considerations

Non-string Scalar Handling

This issue has not focused on non-String scalars, but they should be accommodated according to how Ruby's Psych actually interprets them. Note that Psych is built on libyaml, which implements YAML 1.1 rather than 1.2.

For instance, true and false should be unquoted, though that already fits the logic expressed.

Similarly, all integers and floats should be unquoted, of course.
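
A sketch of type-aware handling, assuming single-line strings and using Psych itself to decide whether a String would be misread as another type. The module and method names are hypothetical.

```ruby
require 'yaml'

# Hypothetical helper for illustration only.
module ScalarTypeSketch
  def self.render(value)
    case value
    when nil                  then 'null'
    when true, false, Integer then value.to_s
    when Float                then render_float(value)
    when String               then render_string(value)
    else raise ArgumentError, "unsupported scalar: #{value.class}"
    end
  end

  # YAML spells the special floats .nan, .inf, and -.inf.
  def self.render_float(value)
    return '.nan' if value.nan?
    return value.positive? ? '.inf' : '-.inf' if value.infinite?

    value.to_s
  end

  # A String stays plain only if Psych reads it back unchanged; otherwise
  # (for example "true", "42", "yes") it is double-quoted to stay a String.
  def self.render_string(str)
    raise ArgumentError, 'multi-line strings belong in block style' if str.include?("\n")
    return str if reads_back_as_same_string?(str)

    '"' + str.gsub('\\') { '\\\\' }.gsub('"') { '\\"' } + '"'
  end

  def self.reads_back_as_same_string?(str)
    YAML.safe_load(str) == str
  rescue Psych::Exception
    false
  end
end

ScalarTypeSketch.render(true)   # => true    (Boolean, unquoted)
ScalarTypeSketch.render('true') # => "true"  (String, quoted so it stays a String)
```

The distinction that matters is between the Boolean true and the String "true": the first is emitted bare, the second must be quoted or it changes type on load.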

YAML Tagging as Necessary

In truly extraordinary cases, emit an explicit type tag such as !!str, !!int, !!bool, or !!float.

Further Research

Earlier research found no existing library that handles this, but search again to see whether any published gem covers some part of this logic.

Additional Arguments

Also consider whether to let users pass preferences as arguments, such as:

{{ 'my string containing " character' | yaml_scalar_format: 'double-quotes' }}

Here the argument 'double-quotes' could instead be 'single-quotes'; when omitted, it defaults to 'unquoted'. Because this string contains a double quote, the above would still render single-quoted:

'my string containing " character'

But any string without a double-quote character would be double-quoted, such as:

{{ 'my string containing no double-quote characters' | yaml_scalar_format: 'double-quotes' }}

Which would yield:

"my string containing no double-quote characters"

Consider any other arguments that would help users fully customize the output.
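
A sketch of how the quoting preference could be interpreted, assuming single-line input. The module name, the STYLES list, and the reading that 'single-quotes' always single-quotes are assumptions; only the 'double-quotes' examples come from this issue.

```ruby
require 'yaml'

# Hypothetical helper for illustration only.
module QuotePreferenceSketch
  STYLES = %w[unquoted double-quotes single-quotes].freeze

  def self.format(str, prefer = 'unquoted')
    raise ArgumentError, "unknown style: #{prefer}" unless STYLES.include?(prefer)

    return str if prefer == 'unquoted' && plain_safe?(str)
    # A double quote in the source forces single quotes, as in the issue's example.
    return single_quote(str) if prefer == 'single-quotes' || str.include?('"')

    double_quote(str)
  end

  # Conservative plain check: whitelist plus a Psych round trip.
  def self.plain_safe?(str)
    str.match?(/\A[A-Za-z0-9_](?:[A-Za-z0-9_ -]*[A-Za-z0-9_])?\z/) &&
      YAML.safe_load(str) == str
  rescue Psych::Exception
    false
  end

  def self.single_quote(str)
    "'" + str.gsub("'", "''") + "'"
  end

  # Only reached when str has no double quote, so only backslashes need escaping.
  def self.double_quote(str)
    '"' + str.gsub('\\') { '\\\\' } + '"'
  end
end
```

In a Liquid filter, the value after the colon arrives as the method's second parameter, so a signature like yaml_scalar_format(input, prefer = 'unquoted') would map onto this directly.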


Acceptance criteria

  • One canonical formatting method in SchemaGraphy
  • One Liquid filter backed by the same logic
  • Correct YAML output across plain, quoted, and block styles
  • Conservative behavior: correctness over cleverness
