Skip to content

Conversation

@Mojachieee
Copy link
Contributor

@Mojachieee Mojachieee commented Aug 7, 2025

fixes #7007

@codecov
Copy link

codecov bot commented Aug 7, 2025

Codecov Report

❌ Patch coverage is 90.55118% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.9%. Comparing base (eda888f) to head (980b74b).
⚠️ Report is 140 commits behind head on main.

Files with missing lines Patch % Lines
exporters/otlp/otlptrace/otlptracegrpc/client.go 88.2% 8 Missing and 4 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #7142    +/-   ##
======================================
  Coverage   82.9%   82.9%            
======================================
  Files        264     265     +1     
  Lines      24628   24754   +126     
======================================
+ Hits       20423   20535   +112     
- Misses      3822    3832    +10     
- Partials     383     387     +4     
Files with missing lines Coverage Δ
...rters/otlp/otlptrace/otlptracegrpc/internal/x/x.go 100.0% <100.0%> (ø)
exporters/otlp/otlptrace/otlptracegrpc/client.go 89.9% <88.2%> (-1.2%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@flc1125
Copy link
Member

flc1125 commented Aug 7, 2025

@flc1125
Copy link
Member

flc1125 commented Aug 7, 2025

The following indicators are missing some attributes (including unit tests):

https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md#metric-otelsdkexporterspaninflight

  • otel.sdk.exporter.span.inflight
    • server.address
    • server.port
  • otel.sdk.exporter.span.exported
    • server.address
    • server.port
  • otel.sdk.exporter.operation.duration
    • server.address
    • server.port
    • rpc.grpc.status_code

@flc1125 flc1125 requested a review from Copilot August 7, 2025 15:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds experimental self-observability metrics to the OTLP trace gRPC exporter. When enabled via the OTEL_GO_X_SELF_OBSERVABILITY environment variable, the exporter will emit metrics tracking span export operations including in-flight spans, exported spans count, and operation duration.

  • Introduces an experimental feature flag system for enabling self-observability
  • Adds metric instrumentation to track span export operations in the gRPC exporter
  • Updates module dependencies to include required metric packages

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
exporters/otlp/otlptrace/otlptracegrpc/internal/x/x.go Implements experimental feature flag system for self-observability
exporters/otlp/otlptrace/otlptracegrpc/internal/x/x_test.go Tests for the experimental feature flag functionality
exporters/otlp/otlptrace/otlptracegrpc/client.go Adds metric instrumentation to the gRPC client for tracking export operations
exporters/otlp/otlptrace/otlptracegrpc/client_test.go Comprehensive tests for self-observability metrics functionality
exporters/otlp/otlptrace/otlptracegrpc/go.mod Updates dependencies to include metric packages
CHANGELOG.md Documents the new experimental feature

@Mojachieee
Copy link
Contributor Author

The following indicators are missing some attributes (including unit tests):

https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md#metric-otelsdkexporterspaninflight

  • otel.sdk.exporter.span.inflight

    • server.address
    • server.port
  • otel.sdk.exporter.span.exported

    • server.address
    • server.port
  • otel.sdk.exporter.operation.duration

    • server.address
    • server.port
    • rpc.grpc.status_code

These are all added now

@Mojachieee Mojachieee requested a review from flc1125 August 8, 2025 12:26
@Mojachieee Mojachieee requested a review from flc1125 August 11, 2025 09:35
Copy link
Member

@flc1125 flc1125 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last remaining issue.

Also, please resolve the conflicts.

Finally: Thank you for your contribution.

Comment on lines +381 to +383
m := mp.Meter("go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc",
metric.WithInstrumentationVersion(sdk.Version()),
metric.WithSchemaURL(semconv.SchemaURL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
m := mp.Meter("go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc",
metric.WithInstrumentationVersion(sdk.Version()),
metric.WithSchemaURL(semconv.SchemaURL))
m := mp.Meter(
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc",
metric.WithInstrumentationVersion(sdk.Version()),
metric.WithSchemaURL(semconv.SchemaURL),
)

Comment on lines +95 to +96
c.initSelfObservability()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flattening initSelfObservability seems appropriate. This is the only call site and this function is scoped to setup a new client, which includes telemetry.


defer func() {
duration := time.Since(start)
durationAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is allocated every call. A pool should be used to amortize the slice allocation.

Comment on lines +215 to +216
for _, ss := range ps.ScopeSpans {
spanCount += len(ss.Spans)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle nil values.

Suggested change
for _, ss := range ps.ScopeSpans {
spanCount += len(ss.Spans)
for _, ss := range ps.GetScopeSpans() {
spanCount += len(ss.GetSpans())

Comment on lines +223 to +236
durationAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+2)
durationAttrs = append(durationAttrs, c.selfObservabilityAttrs...)
durationAttrs = append(durationAttrs,
c.operationDurationMetric.AttrRPCGRPCStatusCode(otelconv.RPCGRPCStatusCodeAttr(status.Code(err))))

exportedAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+1)
exportedAttrs = append(exportedAttrs, c.selfObservabilityAttrs...)

if err != nil {
// Try to extract the underlying gRPC status error, if there is one
rootErr := err
if s, ok := status.FromError(err); ok {
rootErr = s.Err()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact allocations can be made here for the cost of a few more branches which is worth it.

Suggested change
durationAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+2)
durationAttrs = append(durationAttrs, c.selfObservabilityAttrs...)
durationAttrs = append(durationAttrs,
c.operationDurationMetric.AttrRPCGRPCStatusCode(otelconv.RPCGRPCStatusCodeAttr(status.Code(err))))
exportedAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+1)
exportedAttrs = append(exportedAttrs, c.selfObservabilityAttrs...)
if err != nil {
// Try to extract the underlying gRPC status error, if there is one
rootErr := err
if s, ok := status.FromError(err); ok {
rootErr = s.Err()
}
rootErr := err
// Extract the underlying gRPC status error, if there is one.
if s, ok := status.FromError(err); ok {
rootErr = s.Err()
}
n := len(c.selfObservabilityAttrs)
var durationAttrs, exportedAttrs []attribute.KeyValue
if rootErr != nil {
durationAttrs = make([]attribute.KeyValue, n, n+2)
exportedAttrs = make([]attribute.KeyValue, n, n+1)
} else {
durationAttrs = make([]attribute.KeyValue, n, n+1)
exportedAttrs = make([]attribute.KeyValue, n, n)
}
_ = copy(durationAttrs, c.selfObservabilityAttrs)
scAttr := c.operationDurationMetric.AttrRPCGRPCStatusCode(otelconv.RPCGRPCStatusCodeAttr(status.Code(err)))
durationAttrs = append(durationAttrs, scAttr)
_ = copy(exportedAttrs, c.selfObservabilityAttrs)
if err != nil {

Comment on lines +445 to +446
// nextExporterID returns a new unique ID for an exporter.
// the starting value is 0, and it increments by 1 for each call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// nextExporterID returns a new unique ID for an exporter.
// the starting value is 0, and it increments by 1 for each call.
// nextExporterID returns a monotonically increasing int64 starting at 0

DataPoints: []metricdata.HistogramDataPoint[float64]{
{
Attributes: attribute.NewSet(
semconv.OTelComponentName("otlp_grpc_span_exporter/1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on test execution order. It is brittle and will break when test are run in parallel or new cases are added. The generator needs to be reset per test case or this needs to not be evaluated as strictly.

Comment on lines +655 to +657
if tt.enabled {
t.Setenv("OTEL_GO_X_SELF_OBSERVABILITY", "true")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two test cases and this conditional splits them. They should be made into their own tests to just remove the complexity being added to accommodate everything here.

}

original := otel.GetMeterProvider()
defer otel.SetMeterProvider(original)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer otel.SetMeterProvider(original)
t.Cleanup(func() { otel.SetMeterProvider(original) })

}
}

func Test_getServerAttrs(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func Test_getServerAttrs(t *testing.T) {
func TestGetServerAttrs(t *testing.T) {

@pellared
Copy link
Member

pellared commented Sep 1, 2025

@Mojachieee PTAL #7272

@flc1125
Copy link
Member

flc1125 commented Sep 11, 2025

Hi, since this process involves specification adjustments and historical review records (which may contain invalid review suggestions), it’s impossible to tell which items need attention amid the large volume of information.

Could we create a new PR based on the current branch before preparing for the review, so that subsequent reviews can proceed more smoothly?

Thanks~

@MrAlias MrAlias added this to the v1.39.0 milestone Sep 15, 2025
@MrAlias
Copy link
Contributor

MrAlias commented Sep 15, 2025

@Mojachieee please take a look at #7007. I have updated the issue with a check list of items from our project Observability guidelines that need to be completed in this PR.

As @flc1125 pointed out, you may find it easier to open a new PR. But feel free to update this PR as well. If you plan to update this PR, please mark it as a draft while you are working to address the items listed in #7007 and the Observability guidelines.

@Mojachieee Mojachieee closed this Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trace SDK observability - otlptracegrpc exporter metrics

4 participants