Skip to content

Fix JSON deserialization of empty containers and string-embedded delimiters#845

Open
tigercosmos wants to merge 1 commit into
solvcon:masterfrom
tigercosmos:feature/json-parse-empty-and-string-delimiters
Open

Fix JSON deserialization of empty containers and string-embedded delimiters#845
tigercosmos wants to merge 1 commit into
solvcon:masterfrom
tigercosmos:feature/json-parse-empty-and-string-delimiters

Conversation

@tigercosmos
Copy link
Copy Markdown
Member

The hand-rolled JSON deserializer in SerializableItem.cpp could not parse several valid inputs and silently accepted some invalid ones. Empty arrays [] and empty objects {} were rejected outright, and any primitive string value embedding a structural delimiter (,, ], }) was truncated at the first delimiter because the value scanner did not track quoted strings. Both parse_array and parse_object now scan leaf values with quoted-string and backslash-escape awareness, and accept empty containers.

The same string-blindness affected nested containers: the loops that extracted a nested {...} or [...] counted raw braces and brackets, so a string value such as "a]b" inside a nested array still split the container at the wrong place. Those three loops are replaced by a single shared scan_balanced_expression helper that ignores string contents (and their escapes) when matching the closing delimiter, so nested structures like {"k":["a]b"]} parse correctly.

Trailing commas are now rejected rather than silently tolerated: an empty container stays valid, but a closing delimiter immediately following a comma raises a parse error, keeping the parser aligned with strict JSON.

The change adds twenty gtest cases under the Json suite covering empty and whitespace-only containers, leaf and nested string values that embed each delimiter, escaped quotes and backslashes through both the leaf and container scanners, trailing/leading/double-comma rejection, unterminated nested containers, and an end-to-end round trip. A few pre-existing weaknesses are intentionally deferred: the End state still accepts a single trailing character after the closing bracket, malformed end-of-input can leave an empty value expression before classification, and number classification remains a character-class check.

parse_array and parse_object now track quoted-string state (with
backslash escapes) while scanning values, accept empty [] and {}
containers, and reject trailing commas. A shared
scan_balanced_expression helper replaces the three raw brace/bracket
counting loops, so delimiters embedded in nested string values no
longer split a container in the wrong place. Adds 20 Json gtest cases.
Copy link
Copy Markdown
Member Author

@tigercosmos tigercosmos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yungyuc Please take a look, thanks!

// (with backslash escapes) are ignored when matching, so brackets or braces
// embedded in string values do not unbalance the scan. Throws when the
// container is not closed.
inline void scan_balanced_expression(const std::string & json, size_t & index, char open_char, char close_char, int & line, int & column, std::string & out)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract the reused logic.

Comment on lines +247 to +248
// Track quoted-string state so that ',' or ']' inside a
// string value does not terminate the scan early.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the issue.

Comment on lines +486 to +487
// Track quoted-string state so that ',' or '}' inside a
// string value does not terminate the scan early.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the issue.

// sequences. They exercise the parser directly through detail::JsonNode so
// that the produced structure can be inspected precisely.

TEST(Json, parse_empty_array)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More test cases to increae the coverage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like the tests to be moved to Python, which requires the Python wrapper for the serialization code. Please help create an issue to track the work (it's future work that does not need to be included in this PR).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yungyuc Sure, let me move all serialization test to python in the next PR.

@tigercosmos tigercosmos marked this pull request as ready for review May 30, 2026 16:51
Copy link
Copy Markdown
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me.

A derived issue: The JSON processing (serialization) code should live in C++ for performance and flexibility, but the tests should be in Python for ease of maintenance and natural documentation. It should be fixed in the future.

// sequences. They exercise the parser directly through detail::JsonNode so
// that the produced structure can be inspected precisely.

TEST(Json, parse_empty_array)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like the tests to be moved to Python, which requires the Python wrapper for the serialization code. Please help create an issue to track the work (it's future work that does not need to be included in this PR).

@yungyuc yungyuc added the performance Profiling, runtime, and memory consumption label May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Profiling, runtime, and memory consumption

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants