
fix(monitoring): track real MetricsPort in state, expose monitoring via inspect #199

Merged
kuny0707 merged 1 commit into tronprotocol:develop from warku123:fix/txgen-falcon-resource-leak on Jun 18, 2026

warku123 commented Jun 18, 2026

What does this PR do?

Two monitoring follow-ups identified during review of PR #185 (items 1 and 2), plus the accompanying schema bump and a txgen fix carried on the same branch:

  1. Fix hardcoded metrics port in network add. reloadNetworkMonitoring was hardcoding :9527 as the scrape target when rebuilding prometheus.yml for a newly added node. If the user overrode ports.metrics in the intent, the reloaded config would point at the wrong port and Prometheus would never scrape the new node — silently empty dashboards. This PR stores MetricsPort in ManagedNode state (same pattern as HTTPPort/GRPCPort/P2PPort) and adds a metricsPort() helper that falls back to 9527 for legacy state entries.

  2. Expose monitoring stack via trond inspect. When a node was deployed with --monitor, inspect output now includes an optional monitoring block (enabled, prometheus_port, grafana_port) so agents can discover the Prometheus/Grafana stack without re-parsing the apply result. Host is implicit — 127.0.0.1 for local targets, target.host for SSH — so only ports are needed.

  3. Schema bump. inspect.schema.json gains the monitoring object (additive optional field). SchemaVersion bumped 1.12.1 → 1.12.2 (PATCH), following the convention used in #197 (feat(status): emit genesis_block_id (chain identity fingerprint)), which makes the same class of change.

  4. txgen Falcon resource leak fix (included on this branch, same module boundary).

Test plan

Extra details

Follow-up to PR #185 review feedback: "have trond inspect surface the Prometheus/Grafana endpoints so agents can discover them without parsing the apply result." The MetricsPort fix was a bug discovered while implementing this — reloadNetworkMonitoring was hardcoding 9527, which silently broke when a user customised ports.metrics in their intent.
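
The fallback behaviour described above can be sketched roughly as follows. `ManagedNode`, `MetricsPort`, `metricsPort()` and the 9527 default are named in this PR. The `Host` field, the method receiver form, and the `scrapeTargets` function are hypothetical and only illustrate the idea; the real signatures may differ.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// defaultMetricsPort is the value that was previously hardcoded.
const defaultMetricsPort = 9527

// ManagedNode is a trimmed stand-in for the persisted node state.
// Host is a hypothetical field added for this example.
type ManagedNode struct {
	Host        string
	MetricsPort int // zero in state written before this field existed
}

// metricsPort returns the stored port, falling back to 9527 for
// legacy state entries that never recorded one.
func (n ManagedNode) metricsPort() int {
	if n.MetricsPort > 0 {
		return n.MetricsPort
	}
	return defaultMetricsPort
}

// scrapeTargets builds host:port targets from state rather than from a
// constant, so an overridden ports.metrics value is respected.
func scrapeTargets(nodes []ManagedNode) []string {
	targets := make([]string, 0, len(nodes))
	for _, n := range nodes {
		targets = append(targets, net.JoinHostPort(n.Host, strconv.Itoa(n.metricsPort())))
	}
	return targets
}

func main() {
	nodes := []ManagedNode{
		{Host: "127.0.0.1", MetricsPort: 9600}, // user overrode ports.metrics
		{Host: "127.0.0.1"},                    // legacy entry, no stored port
	}
	fmt.Println(scrapeTargets(nodes)) // prints [127.0.0.1:9600 127.0.0.1:9527]
}
```

Treating a zero value as "not recorded" lets old state files load unchanged, at the cost of not being able to represent port 0, which is not a usable scrape port anyway.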

- Store MetricsPort in ManagedNode state alongside other ports so
  network add's Prometheus scrape-target rebuild uses the node's actual
  metrics port instead of hardcoding 9527. This broke when the user
  overrode ports.metrics in the intent: reloadNetworkMonitoring would
  write a wrong target address and Prometheus would scrape nothing.
- Add metricsPort() helper (state-backed, falls back to 9527 for
  legacy nodes) and wire into reloadNetworkMonitoring.
- Surface monitoring stack endpoints via `trond inspect`: when
  Monitoring is enabled, output gains an optional `monitoring` block
  with enabled + prometheus_port + grafana_port. Agents can discover
  the stack without re-parsing apply output. Text-mode inspect also
  prints the ports.
- Update inspect.schema.json (both embedded + public copies) with the
  monitoring object (additive optional → PATCH).
- Bump SchemaVersion 1.12.1 → 1.12.2 and re-snapshot baseline.
warku123 force-pushed the fix/txgen-falcon-resource-leak branch from 83ea045 to 831c12d June 18, 2026 09:01
kuny0707 merged commit 83964ec into tronprotocol:develop Jun 18, 2026
13 checks passed
barbatos2011 pushed a commit to barbatos2011/tron-deployment that referenced this pull request Jun 18, 2026
…issed by the first pass

A doc-consistency audit found the monitoring stack was undocumented for
agents: tronprotocol#199 (inspect `monitoring` block + `metrics_port` tracking) merged
after this PR opened, and AGENTS.md had zero monitoring coverage at all.

- AGENTS.md: add a `monitoring` bullet to the machine-observable rig-state
  list (enabled / prometheus_port / grafana_port via status/inspect).
- CHANGELOG: add the tronprotocol#199 entry under the agent-integration arc.
- README: note that status/inspect surface the stack's ports.

Audit also confirmed (no change needed): SchemaVersion is consistent
(const = baseline = 1.12.2), and every -o json command has a schema
(TestSchemaCoverage green).