Conversation

@sukumargaonkar sukumargaonkar commented Nov 13, 2025

Description
The SSE spec allows three different end-of-line sequences (CRLF, LF, or CR), so an event can be terminated by a pair of any of them. This PR updates the stream processing translator for GCP Vertex AI to support all three delimiters.

SSE docs about delimiters

Note: the docs define an event as event = *( comment / field ) end-of-line, where end-of-line is a single CRLF, LF, or CR (not a pair).
But the comment and field definitions themselves also end in end-of-line, implying that an event always ends in a pair of CRLF / LF / CR.
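For illustration, one way to accept all three delimiter pairs is to normalize line endings before splitting on the blank line that terminates each event. This is a minimal sketch under those assumptions, not the PR's actual implementation (splitSSEEvents is a hypothetical helper):

```go
package main

import (
	"bytes"
	"fmt"
)

// splitSSEEvents splits a raw SSE body into events, accepting any of the
// three end-of-line sequences the spec allows (CRLF, LF, or CR). It first
// normalizes CRLF and lone CR to LF (in that order, so CRLF is not turned
// into two LFs), then splits on the "\n\n" that separates events.
func splitSSEEvents(body []byte) [][]byte {
	normalized := bytes.ReplaceAll(body, []byte("\r\n"), []byte("\n"))
	normalized = bytes.ReplaceAll(normalized, []byte("\r"), []byte("\n"))
	return bytes.Split(normalized, []byte("\n\n"))
}

func main() {
	// Mixed delimiters: CRLF pair, CR pair, LF pair.
	events := splitSSEEvents([]byte("data: a\r\n\r\ndata: b\r\rdata: c\n\n"))
	for _, e := range events {
		fmt.Printf("%q\n", e)
	}
}
```

The replacement order matters: rewriting "\r\n" before lone "\r" keeps a CRLF from being counted twice.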


codecov-commenter commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.16%. Comparing base (eb7ded6) to head (52781e5).

Files with missing lines Patch % Lines
internal/translator/openai_gcpvertexai.go 96.42% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1536      +/-   ##
==========================================
+ Coverage   84.14%   84.16%   +0.02%     
==========================================
  Files         149      149              
  Lines       12971    12992      +21     
==========================================
+ Hits        10914    10935      +21     
  Misses       1436     1436              
  Partials      621      621              

}
lines := bytes.Split(bodyBytes, []byte("\n\n"))
bodyReader := io.MultiReader(bytes.NewReader(o.bufferedBody), body)
scanner := bufio.NewScanner(bodyReader)
Contributor
When implementing the mcp sse parser, I got issues with an initial implementation that used the Scanner; for some servers, I got long lines and the parsing failed with bufio.ErrTooLong (bufio.Scanner: token too long). I only caught this when using some real MCP servers.
I don't know if this can be the case as well here, but just a data point to consider not using the scanner if long lines could be expected.

Contributor Author

Good point.
As of Go 1.25.3, the default bufio.MaxScanTokenSize is 64 KB.

Since each streaming chunk represents only a couple of generated tokens, the size of each chunk should not exceed this limit, though I couldn't find docs that guarantee a chunk will stay below a specific size.

Contributor
What comes from the LLM is inherently unpredictable, and I don't know if we can make that statement (for example, some tests use chunks that contain tool calls, and the length of the arguments could be unpredictable).

I agree it is unlikely to happen, but my main concern is that we are introducing a limitation we didn't have before, and I don't want this to hit us later in the form of a bug.

Contributor Author

Two approaches I can think of:

  1. Make MaxScanTokenSize configurable.
  2. Follow the mcp sse parser logic to explicitly parse the chunks.

I personally prefer option 1, as relying on the scanner to do the parsing is simpler and easier to maintain.

what do you think @nacx ?
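For reference, option 1 boils down to calling Scanner.Buffer with a larger cap before scanning. A sketch, where maxEventBytes stands in for the hypothetical config value being discussed:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	// A single SSE data line longer than bufio's default 64 KB token limit.
	long := "data: " + strings.Repeat("x", 80*1024)

	scanner := bufio.NewScanner(strings.NewReader(long))
	// Without this call, Scan would stop with bufio.ErrTooLong.
	// maxEventBytes is a hypothetical configurable cap (option 1 above).
	const maxEventBytes = 1 << 20 // 1 MiB
	scanner.Buffer(make([]byte, 0, 64*1024), maxEventBytes)

	for scanner.Scan() {
		fmt.Println(len(scanner.Text())) // prints 81926 ("data: " + 80 KiB of x)
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("scan error:", err)
	}
}
```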

Contributor

What about a third option: the previous implementation loaded everything into memory and did a bytes.Split. Why not keep that, but split on the right separator? I mean, test for the presence of the delimiters to figure out the "split characters", then keep the original code, just splitting on those.

This way we don't change the behaviour, keep it backwards-compatible, and don't introduce a new config option whose value users won't know how to set confidently. Also, if loading everything into memory ever becomes an issue (it hasn't so far), we can address the stream-reader approach then, but it's probably not needed for this PR.
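A sketch of that third option, assuming a helper that probes the body for each delimiter pair (detectEventDelimiter is hypothetical, not the PR's code). Defaulting to "\n\n" when nothing matches preserves the pre-PR behaviour:

```go
package main

import (
	"bytes"
	"fmt"
)

// detectEventDelimiter probes the body for each of the three event
// separators the SSE spec allows (a pair of CRLF, CR, or LF) and returns
// the first one found, falling back to the original "\n\n" default.
func detectEventDelimiter(body []byte) []byte {
	for _, sep := range [][]byte{
		[]byte("\r\n\r\n"),
		[]byte("\r\r"),
		[]byte("\n\n"),
	} {
		if bytes.Contains(body, sep) {
			return sep
		}
	}
	return []byte("\n\n")
}

func main() {
	// A body using the CR-pair delimiter.
	body := []byte("data: a\r\rdata: b\r\r")
	events := bytes.Split(body, detectEventDelimiter(body))
	fmt.Println(len(events)) // 3: "data: a", "data: b", and a trailing ""
}
```

This keeps the original in-memory bytes.Split shape and only changes which separator it uses.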

Contributor Author

updated implementation in 52781e5

another round of review @nacx ?

@nacx nacx self-assigned this Nov 13, 2025
@sukumargaonkar sukumargaonkar marked this pull request as ready for review November 13, 2025 15:34
@sukumargaonkar sukumargaonkar requested a review from a team as a code owner November 13, 2025 15:34
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 13, 2025
@sukumargaonkar
Copy link
Contributor Author

Unsure about these test failures:

github.com/envoyproxy/ai-gateway/tests/e2e-upgrade

timed out waiting for the condition on pods/envoy-default-upgrade-test-bd44e180-cd5d859c6-2rj22
    upgrade_test.go:146: 
        	Error Trace:	/home/runner/work/ai-gateway/ai-gateway/tests/e2e-upgrade/upgrade_test.go:146
        	Error:      	Received unexpected error:
        	            	[before upgrade (requests made: 0, current pods: envoy-default-upgrade-test-bd44e180-6d66b8f5d5-7q68j(Running), envoy-default-upgrade-test-bd44e180-cd5d859c6-2rj22(Running))] unexpected status code 500: body=
        	Test:       	TestUpgrade/rolling_out_pods

github.com/envoyproxy/ai-gateway/tests/e2e-inference-extension

error: error executing jsonpath "'{.items[0].spec.initContainers[*].name} {.items[0].spec.containers[*].name}'": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:
	template was:
		'{.items[0].spec.initContainers[*].name} {.items[0].spec.containers[*].name}'
	object given to jsonpath engine was:
		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":""}}

@nacx
Copy link
Contributor

nacx commented Nov 18, 2025

/retest

Signed-off-by: Sukumar Gaonkar <[email protected]>