feat(csharp/src/Drivers/Databricks): Implement telemetry tag definitions (phase 1) #3653
base: main
Please make sure to have good PR descriptions.
```csharp
public TelemetryTagAttribute(string tagName)
{
    TagName = tagName ?? throw new ArgumentNullException(nameof(tagName));
    ExportScope = TagExportScope.ExportAll;
```
How is this used? Why hardcode to ExportAll?
Changed the default to `ExportLocal`, as discussed offline.
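A minimal sketch of how the attribute reads with that default, assuming `TagExportScope` has exactly the three members listed in the PR description and that `ExportScope` and `Description` are settable properties (the `ExportScope = ...` named arguments in the tag declarations require that). The `AttributeUsage` targets, the `ConnectionTags` holder class, and the `connection.host` tag are hypothetical.

```csharp
using System;

// The three export destinations listed in the PR description.
public enum TagExportScope
{
    ExportLocal,      // local export only (sensitive data)
    ExportDatabricks, // Databricks service only
    ExportAll,        // both local and Databricks
}

// Sketch: the AttributeUsage targets are an assumption.
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public sealed class TelemetryTagAttribute : Attribute
{
    public TelemetryTagAttribute(string tagName)
    {
        TagName = tagName ?? throw new ArgumentNullException(nameof(tagName));
        // Privacy-safe default: tags stay local unless they opt in to wider export.
        ExportScope = TagExportScope.ExportLocal;
    }

    public string TagName { get; }

    // Settable so declarations can pass ExportScope = ... as a named argument.
    public TagExportScope ExportScope { get; set; }

    public string Description { get; set; } = string.Empty;
}

// Hypothetical holder class for tag-name constants.
public static class ConnectionTags
{
    // Opts in to export everywhere, as agreed in the review.
    [TelemetryTag("feature.cloudfetch", ExportScope = TagExportScope.ExportAll, Description = "CloudFetch enabled")]
    public const string FeatureCloudFetch = "feature.cloudfetch";

    // Hypothetical tag with no ExportScope given, so it keeps the ExportLocal default.
    [TelemetryTag("connection.host")]
    public const string Host = "connection.host";
}
```

Because named attribute arguments are applied after the constructor runs, a tag that declares `ExportScope = TagExportScope.ExportAll` overrides the local-only default, while tags that say nothing stay local.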
```csharp
public const string DriverRuntime = "driver.runtime";

// Feature Flags
[TelemetryTag("feature.cloudfetch", ExportScope = TagExportScope.ExportDatabricks, Description = "CloudFetch enabled")]
```
Are these only for export to Databricks? Not local?
Updated to `ExportAll`.
```csharp
/// <summary>
/// Gets tags allowed for Databricks export (privacy whitelist).
/// </summary>
// TODO: Explore alternate approaches to avoid maintaining separate GetDatabricksExportTags methods in each event class.
```
Added a TODO comment to explore alternatives to the per-class `GetDatabricksExportTags` methods (other than reflection), or to change the flow depending on how the method ends up being used in the next phases.
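One non-reflection direction for that TODO is to write each event's scopes once, in a single tag-to-scope table, and derive the Databricks whitelist from it. The sketch below assumes a hypothetical `StatementEventTags` class with a `Scopes` table; apart from `feature.cloudfetch`, the tag names and their scopes are placeholders, not the PR's definitions.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum TagExportScope { ExportLocal, ExportDatabricks, ExportAll }

// Sketch: one table per event type is the only place a tag's scope is written down.
public static class StatementEventTags
{
    // Placeholder tag names, except feature.cloudfetch.
    public const string SqlText = "statement.sql";
    public const string FeatureCloudFetch = "feature.cloudfetch";
    public const string ResultChunkCount = "result.chunk_count";

    public static readonly IReadOnlyDictionary<string, TagExportScope> Scopes =
        new Dictionary<string, TagExportScope>
        {
            [SqlText] = TagExportScope.ExportLocal, // SQL text is sensitive
            [FeatureCloudFetch] = TagExportScope.ExportAll,
            [ResultChunkCount] = TagExportScope.ExportDatabricks,
        };

    // Derived once from Scopes (static field initializers run in textual order),
    // so the whitelist never restates a scope by hand.
    private static readonly HashSet<string> DatabricksExportTags = new HashSet<string>(
        Scopes.Where(kv => kv.Value != TagExportScope.ExportLocal).Select(kv => kv.Key));

    public static IReadOnlyCollection<string> GetDatabricksExportTags() => DatabricksExportTags;
}
```

In this shape a scope is stated in exactly one place, so the whitelist cannot drift from the declarations.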
CurtHagenlocher left a comment
Thanks! I have some questions and recommendations for changes.
```csharp
namespace Apache.Arrow.Adbc.Drivers.Databricks.Telemetry.TagDefinitions
{
    public static class TelemetryTagRegistry
```
Suggested change
```diff
- public static class TelemetryTagRegistry
+ internal static class TelemetryTagRegistry
```
This is only used from inside the driver, right?
```csharp
/// <summary>
/// Checks if a tag should be exported to Databricks.
/// </summary>
public static bool ShouldExportToDatabricks(TelemetryEventType eventType, string tagName)
```
While it's difficult to get a sense for how this code will be used before it's integrated with the rest of the code, it looks like this is probably the primary entry point to the list of tags. If I say something like `ShouldExportToDatabricks(TelemetryEventType.Error, "tag1") || ShouldExportToDatabricks(TelemetryEventType.Error, "tag2")`, then the current code will end up instantiating the same `HashSet<string>` twice, once per call to `ShouldExportToDatabricks`. If this is only called fairly rarely, then maybe that's not a big deal. But if it is going to be called fairly frequently, then it probably makes more sense to cache the `HashSet<string>` as a static inside each class rather than instantiating a new one each time.
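A minimal sketch of that suggestion: each event class builds its whitelist once in a `static readonly` field, and `ShouldExportToDatabricks` only dispatches on the event type, so repeated calls allocate nothing. The `TelemetryEventType` member names, the `ConnectionEvent`/`StatementExecutionEvent`/`ErrorEvent` classes, the `IsDatabricksExportTag` helper, and all tag names are hypothetical placeholders; the types are `internal`, following the visibility comments in this review.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical member names for the three event types in the PR description.
internal enum TelemetryEventType
{
    Connection,
    StatementExecution,
    Error,
}

internal static class ConnectionEvent
{
    // Allocated once by the static initializer, then shared by every caller.
    private static readonly HashSet<string> s_databricksExportTags =
        new HashSet<string>(StringComparer.Ordinal) { "driver.runtime", "feature.cloudfetch" };

    public static IReadOnlyCollection<string> GetDatabricksExportTags() => s_databricksExportTags;

    public static bool IsDatabricksExportTag(string tagName) => s_databricksExportTags.Contains(tagName);
}

internal static class StatementExecutionEvent
{
    private static readonly HashSet<string> s_databricksExportTags =
        new HashSet<string>(StringComparer.Ordinal) { "statement.type", "result.format" };

    public static IReadOnlyCollection<string> GetDatabricksExportTags() => s_databricksExportTags;

    public static bool IsDatabricksExportTag(string tagName) => s_databricksExportTags.Contains(tagName);
}

internal static class ErrorEvent
{
    private static readonly HashSet<string> s_databricksExportTags =
        new HashSet<string>(StringComparer.Ordinal) { "error.type", "retry.count" };

    public static IReadOnlyCollection<string> GetDatabricksExportTags() => s_databricksExportTags;

    public static bool IsDatabricksExportTag(string tagName) => s_databricksExportTags.Contains(tagName);
}

internal static class TelemetryTagRegistry
{
    // No allocation per call: each lookup hits an already-built set.
    public static bool ShouldExportToDatabricks(TelemetryEventType eventType, string tagName)
    {
        if (tagName == null)
        {
            return false;
        }

        switch (eventType)
        {
            case TelemetryEventType.Connection:
                return ConnectionEvent.IsDatabricksExportTag(tagName);
            case TelemetryEventType.StatementExecution:
                return StatementExecutionEvent.IsDatabricksExportTag(tagName);
            case TelemetryEventType.Error:
                return ErrorEvent.IsDatabricksExportTag(tagName);
            default:
                return false; // unknown event types export nothing
        }
    }
}
```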
```csharp
/// <summary>
/// Defines the types of telemetry events that can be emitted by the driver.
/// Each event type has its own set of allowed tags defined in corresponding *Event classes.
/// </summary>
public enum TelemetryEventType
```
Suggested change
```diff
- public enum TelemetryEventType
+ public enum TelemetryEventType
```
This is only used from inside the driver, right? If tests can't already see the driver's internal types, we can use `InternalsVisibleTo` to make it available to them.
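For the test-visibility part, the standard mechanism is the `InternalsVisibleTo` attribute from `System.Runtime.CompilerServices`, declared once in the driver assembly. In the sketch below the test assembly name is a placeholder; if the driver assembly is strong-named, the string must also include the test assembly's full public key.

```csharp
using System.Runtime.CompilerServices;

// Placeholder: substitute the actual Databricks test project's assembly name.
// Grants that assembly access to internal types such as TelemetryTagRegistry.
[assembly: InternalsVisibleTo("Databricks.Tests")]
```

SDK-style projects can express the same thing as an `<InternalsVisibleTo Include="..." />` item in the driver's `.csproj`.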
This PR implements Phase 1 of telemetry tag definitions (telemetry-activity-based-design.md) for the C# Databricks ADBC driver, establishing structured telemetry collection with privacy-aware export controls.
Changes
Introduces a declarative tag definition system across three telemetry event types:
1. Connection Events - Captures connection lifecycle and driver configuration
2. Statement Execution Events - Tracks query performance and resource usage
3. Error Events - Records error patterns and retry behavior
Export Scope Design
Tags use the `TagExportScope` enum to control export destinations:
- `ExportLocal`: local export only; used for sensitive data (SQL, errors, addresses)
- `ExportDatabricks`: exported only to the Databricks service
- `ExportAll`: non-sensitive operational metrics, exported both locally and to Databricks

`TelemetryTagRegistry` provides centralized privacy filtering via `GetDatabricksExportTags()` and `ShouldExportToDatabricks()`.

Design Note: Added a TODO comment acknowledging that the current approach duplicates the export scope between the attributes and the whitelist methods.
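These routing rules can be sketched as a small partition function. This assumes the three `TagExportScope` members listed in this section and a hypothetical `ExportRouting.Split` helper that takes a tag-to-scope map plus recorded tag values; it is an illustration, not the PR's `TelemetryTagRegistry` API.

```csharp
using System.Collections.Generic;

public enum TagExportScope { ExportLocal, ExportDatabricks, ExportAll }

public static class ExportRouting
{
    // Splits recorded tag values between the local and Databricks destinations.
    // Tags absent from the scope map are treated as ExportLocal (never sent to Databricks).
    public static (Dictionary<string, object> Local, Dictionary<string, object> Databricks) Split(
        IReadOnlyDictionary<string, TagExportScope> scopes,
        IEnumerable<KeyValuePair<string, object>> tags)
    {
        var local = new Dictionary<string, object>();
        var databricks = new Dictionary<string, object>();

        foreach (var tag in tags)
        {
            var scope = scopes.TryGetValue(tag.Key, out var declared) ? declared : TagExportScope.ExportLocal;

            if (scope == TagExportScope.ExportLocal || scope == TagExportScope.ExportAll)
            {
                local[tag.Key] = tag.Value;
            }

            if (scope == TagExportScope.ExportDatabricks || scope == TagExportScope.ExportAll)
            {
                databricks[tag.Key] = tag.Value;
            }
        }

        return (local, databricks);
    }
}
```

Treating tags missing from the map as local-only keeps the Databricks side a strict whitelist, in line with the `ExportLocal` default chosen for `TelemetryTagAttribute` during review.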
Testing
`TelemetryTagRegistryTests`
Next Steps
Future phases will add ActivityListener, metrics aggregator, and exporter components.