Skip to content

Conversation

@shreealt
Copy link
Contributor

@shreealt shreealt commented Nov 1, 2025

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #7324

Release Notes: Yes/No

Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
@shreealt shreealt requested a review from a team as a code owner November 1, 2025 07:33
}

// EnvoyGatewayTraces defines control plane tracing configurations.
type EnvoyGatewayTraces struct {
Copy link
Member

@zirain zirain Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In EG, we use tracing in other place, better to keep consistency.

type RateLimitTracing struct {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand your point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name should be EnvoyGatewayTracing instead of EnvoyGatewayTraces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
@codecov
Copy link

codecov bot commented Nov 2, 2025

Codecov Report

❌ Patch coverage is 45.85366% with 111 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.11%. Comparing base (c29a629) to head (9836161).
⚠️ Report is 20 commits behind head on main.

Files with missing lines Patch % Lines
internal/gatewayapi/runner/runner.go 2.63% 36 Missing and 1 partial ⚠️
internal/provider/kubernetes/controller.go 43.47% 26 Missing ⚠️
internal/traces/register.go 20.00% 19 Missing and 1 partial ⚠️
internal/gatewayapi/resource/resource.go 50.00% 4 Missing and 4 partials ⚠️
internal/message/types.go 55.55% 4 Missing and 4 partials ⚠️
internal/xds/runner/runner.go 79.16% 3 Missing and 2 partials ⚠️
internal/admin/console/api.go 25.00% 2 Missing and 1 partial ⚠️
internal/cmd/server.go 0.00% 2 Missing ⚠️
internal/logging/log.go 81.81% 1 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (45.85%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7403      +/-   ##
==========================================
- Coverage   72.26%   71.11%   -1.16%     
==========================================
  Files         231      275      +44     
  Lines       34084    34940     +856     
==========================================
+ Hits        24632    24848     +216     
- Misses       7676     8296     +620     
- Partials     1776     1796      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@arkodg
Copy link
Contributor

arkodg commented Nov 3, 2025

thanks for working on this @shreealt

is it possible to break this feature up into 2

  • the trace/span plumbing that can be implemented and trace ids be logged with current logger/stdout, that can be muted using log levels ( INFO/ERROR), this can be part of v1.6
  • in v1.7 we can add the API to export the traces and sample it

parentCtx = update.Value.Context
}

_, span := tracer.Start(parentCtx, "XdsRunner.subscribeAndTranslate")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add another child span to r.cache.GenerateNewSnapshot and use the trace id as the version inside it, so this id trickles into envoy proxy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, but I hope this doesn't come off as a surprise to users?

Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
@shreealt shreealt requested review from arkodg and zirain November 15, 2025 08:51
r.Logger.Info("received an update", "key", update.Key)
val := update.Value
func(update message.Update[string, *resource.ControllerResourcesContext], errChan chan error) {
parentCtx := context.Background()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this defensive code needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not? 🤔

is it guaranteed that these values will not be nil?

// so when a delete is triggered, delete all keys
if update.Delete || val == nil {
if update.Delete || valWrapper == nil || valWrapper.Resources == nil {
span.AddEvent("delete_all_keys")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a useful span
instead can we create two child spans

  • one to track translate time
  • one to track statusUpdate


message.HandleSubscription(message.Metadata{Runner: r.Name(), Message: message.XDSIRMessageName}, c,
func(update message.Update[string, *ir.Xds], errChan chan error) {
func(update message.Update[string, *message.XdsIRWithContext], errChan chan error) {
Copy link
Contributor

@arkodg arkodg Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to the gateway-api runner can we add a parent span and 2 child span for translate & updateSnapshot

var Hash = cachev3.IDHash{}
var (
Hash = cachev3.IDHash{}
tracer = otel.Tracer("envoy-gateway/gateway-api")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracer = otel.Tracer("envoy-gateway/gateway-api")
tracer = otel.Tracer("envoy-gateway/xds")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also does the envoy-gateway prefix exist since thats the idiomatic way to do things ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, pretty much that.

defer s.mu.Unlock()

version := s.newSnapshotVersion()
_, span := tracer.Start(ctx, "SnapshotCache.GenerateNewSnapshot")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep things consistent and add spans in runner instead of the within the func ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's cleaner this way? 🤔

there are two usages of this function in runner.go, so the current way is better than wrapping GenerateNewSnapshot in two places?

defaultMaxConnectionAgeGrace = 2 * time.Minute
)

var tracer = otel.Tracer("envoy-gateway/gateway-api")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var tracer = otel.Tracer("envoy-gateway/gateway-api")
var tracer = otel.Tracer("envoy-gateway/xds")

traceLogger := r.Logger.WithTrace(parentCtx)
traceLogger.Info("received an update")

_, span := tracer.Start(parentCtx, "XdsRunner.subscribeAndTranslate")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add 2 child spans, one for translate and one for update snapshot

enabled: false
opentelemetry-collector:
enabled: true
mode: deployment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you undo the changes in test

@arkodg
Copy link
Contributor

arkodg commented Nov 18, 2025

hey @shreealt left some comments, once those are addressed, can you also share before/after logs, tia !

@arkodg
Copy link
Contributor

arkodg commented Nov 20, 2025

@shreealt we'll also need Equal functions for the new message structure, to avoid comparing Context and only compare the Data

Signed-off-by: Shreemaan Abhishek <[email protected]>
@shreealt
Copy link
Contributor Author

/retest

Signed-off-by: Shreemaan Abhishek <[email protected]>
emptyClusterName = "EmptyCluster"
)

var tracer = otel.Tracer("envoy-gateway/xds/translator")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not add any Otel things to the lib, but in the runner please, in case these libs are made public and used purely for translation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
Signed-off-by: Shreemaan Abhishek <[email protected]>
@shreealt shreealt requested a review from arkodg November 23, 2025 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for watchable message tracing

3 participants