Replies: 2 comments
Hey, thanks for this. If I understand correctly, there is a risk that if the serialized ATN is malformed, then the deserializer will crash?
Can you provide an example of how they would do that? Not saying they can't, but if they can, then they can also do all sorts of very mean things, no? How would protecting the running process against data injection in this specific area prevent other ways of breaking it? (genuinely trying to understand)
Thanks for the quick reply - yes, that’s exactly the risk: if the serialized ATN is malformed (e.g., truncated or with inflated counts), the C++ deserializer can read past the end of the provided buffer and crash. I fully agree with your point about the “classic” ANTLR workflow: normally the serialized ATN is generated at build time and embedded in the generated sources, so it is effectively trusted. Where this becomes more than a theoretical concern is when the runtime ends up deserializing ATN data that is not strictly controlled by the application author. A few realistic patterns I’ve seen:
So I’m not claiming this stops a fully powerful attacker who can arbitrarily modify process memory or binaries. The value is mainly for cases where an attacker (or a faulty artifact) can influence a narrow input surface that reaches the deserializer.

Below is the key excerpt that demonstrates the core bug pattern (unchecked count → loop → OOB read), matching the structure in the deserializer:

// Attacker-controlled count drives reads with no bounds checks.
size_t nstates = data[p++];
for (size_t i = 0; i < nstates; i++) {
    int stype = data[p++]; // OOB read when buffer is too small
    int ruleIndex = data[p++]; // further OOB reads
}

And the smallest harness idea is: provide only a few ints (header + nstates), but set nstates huge so the loop forces p to run past the end of the buffer and crash. If it helps your triage, I can share a tiny standalone reproducer plus an AddressSanitizer trace that clearly shows the out-of-bounds read.
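The same pattern can be observed safely in a toy model, without touching the real runtime. The sketch below is not ANTLR code: `countStatesChecked` and `isRejected` are hypothetical names, and the buffer layout (one header word, then `nstates`, then two words per state) is an assumption made only for illustration. It uses `std::vector::at`, which throws `std::out_of_range` at the point where an unchecked `data[p++]` would read out of bounds.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model of the loop above, NOT the ANTLR runtime. The layout
// (header word, nstates, then two words per state) is assumed.
// Returns the number of states read, or throws std::out_of_range
// when the count promises more data than the buffer holds.
std::size_t countStatesChecked(const std::vector<std::int32_t>& data) {
    std::size_t p = 0;
    (void)data.at(p++);  // header word (ignored in this toy)
    // A negative value wraps to a huge count and is rejected by the loop.
    const std::size_t nstates = static_cast<std::size_t>(data.at(p++));
    for (std::size_t i = 0; i < nstates; i++) {
        (void)data.at(p++);  // stype: at() throws instead of reading OOB
        (void)data.at(p++);  // ruleIndex
    }
    return nstates;
}

// True when the buffer is rejected with a controlled exception.
bool isRejected(const std::vector<std::int32_t>& data) {
    try {
        countStatesChecked(data);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With this model, a two-word buffer such as `{4, 1000000}` is rejected with an exception, while a well-formed buffer such as `{4, 1, 7, 0}` is read to completion.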
Hi maintainers 👋
I couldn’t find a dedicated security contact (e.g., `SECURITY.md` or a private reporting address) in this repository, so I’m posting this as a public GitHub Discussion.

Tone/intent: I’m trying to be careful with details because this looks like a memory-safety issue in the C++ runtime. I’m sharing enough to enable triage and reproduce safely, while avoiding “drop-in” exploitation guidance. If you prefer a private channel, please point me to it and I’ll immediately move the full technical PoC/inputs there.
Summary (what I think is happening)
In the C++ runtime, `ATNDeserializer::deserialize(...)` appears to read from a `SerializedATNView` using an index `p` (e.g., `data[p++]`) without consistently verifying that `p` remains within bounds of the backing buffer.

The deserialization logic is driven by counts embedded in the serialized data itself (e.g., `nstates`, `nedges`, etc.). If an attacker can influence the serialized ATN payload (even in a non-standard integration), they can set these counts such that the deserializer attempts to read beyond the end of the provided buffer, leading to:

This matches the general pattern of CWE-125: Out-of-bounds Read.
Affected area (where to look)
File: `ATNDeserializer.cpp` (C++ runtime)

Function: `ATNDeserializer::deserialize(SerializedATNView data) const`

The key pattern is:
- `size_t nstates = data[p++];` (reads an untrusted count)
- `for (size_t i = 0; i < nstates; i++) { ... data[p++] ... }`
- `p` increments repeatedly and reads are performed without strong bounds checks tied to `data.size()`.

(If my file path or exact lines differ due to branch/version, the above describes the logic pattern I observed.)
Why this matters (threat model)
In “classic” ANTLR usage, serialized ATN is generated by the tool and embedded in generated sources, so the payload is usually trusted.
However, real-world applications sometimes:
In those integrations, an unsafe deserializer becomes a reliability and security risk.
Even if you consider “attacker controls serialized ATN” to be out-of-scope, I still think hardening is warranted because the fix is localized and improves robustness (and it may help fuzzing / defensive posture overall).
Reproduction (high-level, safe)
This is the minimal idea to reproduce a crash (details can be provided privately):
1. Provide a serialized buffer in which `nstates` is set to a very large value.
2. Call `ATNDeserializer::deserialize(...)` with this `SerializedATNView`.
3. The deserializer keeps reading `data[p++]` until it crosses the buffer boundary, eventually causing an invalid read and crash.

Suggested remediation / hardening options
Any of the following would address the core issue:
1. Centralized bounds checking in `SerializedATNView::operator[]`
   - Have it check `index < size` (and fail fast with a controlled exception/error).
2. Explicit checks in `ATNDeserializer` before each read
   - `if (p >= data.size()) throw ...;`
   - Helper methods such as `readInt()`/`readShort()` that track `p` and validate bounds.
3. Add “remaining length” validation for sections driven by counts
   - Before reading `nstates` elements, validate that enough buffer remains for at least the minimal representation of each item (even if formats differ).

The goal would be to turn “crash on malformed input” into a controlled parse failure.
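To make options 2 and 3 concrete, here is a minimal sketch under stated assumptions. `BoundedReader`, `readCount`, and the use of `std::vector<std::int32_t>` as the backing buffer are hypothetical and not part of the ANTLR runtime; only the `readInt()` name comes from the suggestion above. The exception type is also just one possible choice for a controlled failure.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of the ANTLR runtime: a cursor that
// validates every read against the size of the backing buffer.
// It does not own the buffer, so the vector must outlive the reader.
class BoundedReader {
public:
    explicit BoundedReader(const std::vector<std::int32_t>& data) : data_(data) {}

    // Option 2: every read checks the cursor before touching the buffer.
    std::int32_t readInt() {
        if (pos_ >= data_.size()) {
            throw std::out_of_range("serialized data truncated");
        }
        return data_[pos_++];
    }

    // Option 3: reject a count up front when the remaining buffer cannot
    // hold `count` items of at least `minWordsPerItem` words each.
    std::size_t readCount(std::size_t minWordsPerItem) {
        const std::int32_t raw = readInt();
        if (raw < 0) {
            throw std::out_of_range("negative count");
        }
        const std::size_t count = static_cast<std::size_t>(raw);
        const std::size_t left = data_.size() - pos_;
        // Divide rather than multiply so a huge count cannot overflow.
        if (minWordsPerItem != 0 && count > left / minWordsPerItem) {
            throw std::out_of_range("count exceeds remaining data");
        }
        return count;
    }

    // Number of unread words left in the buffer.
    std::size_t remaining() const { return data_.size() - pos_; }

private:
    const std::vector<std::int32_t>& data_;
    std::size_t pos_ = 0;
};
```

A caller would replace `size_t nstates = data[p++];` with something like `reader.readCount(2)`, so an inflated count fails before the loop starts rather than partway through it.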
Requested next steps
Thanks for taking a look