feat(python-sdk): add extension instrumentation #2096
Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
Summary of Changes

This pull request adds OpenTelemetry-based instrumentation to the Python SDK, specifically targeting the A2A extension components. It provides decorators for tracing functions and class methods, so that spans capturing the flow and performance of operations within agent extensions are generated automatically.
Code Review
This pull request introduces OpenTelemetry-based instrumentation for extensions, which is a great addition for observability. The core of this is the new agentstack_sdk/util/telemetry.py file, which provides trace_function and trace_class decorators.
My review focuses on the correctness and robustness of these new utilities. I've identified a couple of important issues:
- A bug in how asyncio.CancelledError is handled, which could lead to incomplete trace data.
- An issue with tracing methods decorated with @asynccontextmanager, like lifespan, which would result in incorrect trace spans.
I've provided suggestions to fix these issues. Additionally, I've pointed out a couple of areas for improvement in code quality. Overall, this is a solid feature, and with these adjustments, it will be even more robust.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        trace_class(kind=SpanKind.SERVER)(cls)
The trace_class decorator is being applied to BaseExtensionServer, which includes the lifespan method decorated with @asynccontextmanager. The current trace_function implementation does not correctly handle async context managers, which will lead to incorrect tracing (the span will only cover the creation of the context manager, not its execution). To prevent issues, you should exclude lifespan from tracing. It's also good practice to exclude internal methods like _fork.
Suggested change:

    - trace_class(kind=SpanKind.SERVER)(cls)
    + trace_class(kind=SpanKind.SERVER, exclude_list=["lifespan", "_fork"])(cls)
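The span-coverage problem described in this comment can be reproduced without OpenTelemetry. The following is a minimal, stdlib-only sketch: `naive_trace` and the `events` list are hypothetical stand-ins for a synchronous tracing wrapper and a span, not the SDK's actual `trace_function`. It shows that a function decorated with `@asynccontextmanager` is a plain callable that returns a context manager object, so a wrapper around the call finishes before setup even begins.

```python
import asyncio
import functools
from contextlib import asynccontextmanager

# Stand-in for span lifecycle events (hypothetical, not an OpenTelemetry API).
events: list[str] = []


def naive_trace(func):
    """Wrap a callable and record a 'span' around the call itself."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append("span start")
        try:
            # For an @asynccontextmanager function this only builds the
            # context manager object; no setup code has run yet.
            return func(*args, **kwargs)
        finally:
            events.append("span end")

    return wrapper


@naive_trace
@asynccontextmanager
async def lifespan():
    events.append("setup")
    yield
    events.append("teardown")


async def main() -> None:
    async with lifespan():
        events.append("body")


asyncio.run(main())
print(events)
# ['span start', 'span end', 'setup', 'body', 'teardown']
```

The "span" closes before `setup` runs, which is why excluding such methods (or handling them specially) is needed for the trace to be meaningful.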
    except asyncio.CancelledError as ce:
        exception = None
        span.record_exception(ce)
        raise
The attribute_extractor will not receive the asyncio.CancelledError exception because exception is explicitly set to None. This can lead to incomplete telemetry data. The exception should be captured and passed to the extractor.
Suggested change:

    - except asyncio.CancelledError as ce:
    -     exception = None
    -     span.record_exception(ce)
    -     raise
    + except asyncio.CancelledError as ce:
    +     exception = ce
    +     span.record_exception(ce)
    +     raise
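To illustrate the effect of the suggested change, here is a small stdlib-only sketch. It assumes, hypothetically, that the wrapper calls an extractor callback with the captured exception in a `finally` block; the names `trace_async` and `attribute_extractor` and the callback signature are illustrative and may not match the SDK's actual code. Because `asyncio.CancelledError` derives from `BaseException`, it needs its own `except` clause to be captured.

```python
import asyncio
import functools
from typing import Any, Callable, Optional


def trace_async(attribute_extractor: Callable[[Optional[BaseException]], Any]):
    """Hypothetical decorator: report the raised exception (if any) to a callback."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            exception: Optional[BaseException] = None
            try:
                return await func(*args, **kwargs)
            except asyncio.CancelledError as ce:
                # Keep the cancellation so the extractor can see it.
                exception = ce
                raise
            except Exception as e:
                exception = e
                raise
            finally:
                attribute_extractor(exception)

        return wrapper

    return decorator


seen: list = []


@trace_async(seen.append)
async def slow() -> None:
    await asyncio.sleep(10)


async def main() -> None:
    task = asyncio.create_task(slow())
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass


asyncio.run(main())
print(type(seen[0]).__name__)
```

With `exception = None` in the `CancelledError` branch instead, the callback would receive `None` and the cancellation would be invisible to it.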
    from agentstack_sdk.util.telemetry import trace_class

    tracer = trace.get_tracer(__name__)
    @functools.wraps(func)
    async def async_wrapper(*args, **kwargs) -> Any:
        """Async Wrapper for the decorator."""
        tracer = trace.get_tracer(INSTRUMENTING_MODULE_NAME, INSTRUMENTING_MODULE_VERSION)
Getting the tracer on every function call is inefficient and leads to code duplication with the sync_wrapper (line 126). The opentelemetry.trace.get_tracer call is idempotent, but best practice is to call it once per module. Consider getting the tracer once, either at the module level or within trace_function but outside the async_wrapper and sync_wrapper definitions.
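One way to structure this is to look up the tracer once inside the decorator and let both wrappers close over it. The sketch below is stdlib-only: `get_tracer` and `FakeTracer` are hypothetical stand-ins for `opentelemetry.trace.get_tracer` and a real tracer, used here only so the number of lookups can be counted.

```python
import functools
import inspect

# Counts how often the (stand-in) tracer lookup runs.
calls = {"get_tracer": 0}


class FakeTracer:
    """Hypothetical tracer that just records the names of traced calls."""

    def __init__(self) -> None:
        self.spans: list[str] = []


def get_tracer(name: str) -> FakeTracer:
    # Stand-in for opentelemetry.trace.get_tracer (assumption for this sketch).
    calls["get_tracer"] += 1
    return FakeTracer()


def trace_function(func):
    # Looked up once per decorated function, shared by both wrappers.
    tracer = get_tracer(__name__)

    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            tracer.spans.append(func.__name__)
            return await func(*args, **kwargs)

        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        tracer.spans.append(func.__name__)
        return func(*args, **kwargs)

    return sync_wrapper


@trace_function
def add(a: int, b: int) -> int:
    return a + b


results = [add(1, 2) for _ in range(3)]
print(results, calls["get_tracer"])
# [3, 3, 3] 1
```

The lookup happens at decoration time rather than on each call, and the duplicated `get_tracer` line in the two wrappers disappears.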
Summary
Linked Issues
closes #1975
Documentation
If this PR adds a new feature or changes an existing one, make sure the documentation is adjusted accordingly. If documentation is not needed, please explain why.