enhancement(panoramic): merge ground-truth into panoramic #1379
Merge the ground-truth correctness test runner into panoramic so both test types (integration and correctness) share a single binary with TUI support, structured output, and parallel execution.

- Add correctness module to panoramic with config, runner, sync, and analysis submodules (moved from ground-truth)
- Add DiscoveredTest enum that dispatches to integration or correctness runner based on config schema detection
- Support multiple -d flags to discover tests from multiple directories
- Add 20-minute default timeout for correctness tests
- Update Makefile targets (test-correctness, test-correctness-case)
- Update GitLab CI to use panoramic for correctness tests
- Rename GROUND_TRUTH_* env vars to PANORAMIC_* across CI and airlock
- Remove ground-truth crate from workspace
- Update docs, README, and PR title scope allowlist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Binary Size Analysis (Agent Data Plane)
Target: 4dd143e (baseline) vs 77ad79f (comparison)

| Module | File Size | Symbols |
|---|---|---|
| anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.8327523549320936916 | +129 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.4033737582525021100 | -129 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.8327523549320936916 | +114 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.4033737582525021100 | -114 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.8327523549320936916 | +108 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.4033737582525021100 | -108 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.8327523549320936916 | +96 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.4033737582525021100 | -96 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.8327523549320936916 | +94 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.4033737582525021100 | -94 B | 1 |

Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +129 [NEW] +40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.8327523549320936916
[NEW] +114 [NEW] +25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.8327523549320936916
[NEW] +108 [NEW] +19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.8327523549320936916
[NEW] +96 [NEW] +7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.8327523549320936916
[NEW] +94 [NEW] +5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.8327523549320936916
[DEL] -94 [DEL] -5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.4033737582525021100
[DEL] -96 [DEL] -7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.4033737582525021100
[DEL] -108 [DEL] -19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.4033737582525021100
[DEL] -114 [DEL] -25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.4033737582525021100
[DEL] -129 [DEL] -40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.4033737582525021100
[ = ] 0 [ = ] 0 TOTAL
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oth parse errors

Integration tests handle their own timeout internally via the assertion runner, so the outer timeout is only needed for correctness tests which have no built-in timeout. Also capture both integration and correctness parse errors in try_load_test so the user sees why each schema failed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: 061fb133-03a0-44f9-bf71-7d5bb80f1514
Baseline: 4dd143e
Optimization Goals: ✅ No significant changes detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | -0.84 | [-1.32, -0.36] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -2.32 | [-7.02, +2.38] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | +3.55 | [+3.38, +3.72] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +3.18 | [-3.14, +9.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +1.83 | [-0.34, +4.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +1.82 | [-52.42, +56.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.14 | [-5.24, +7.52] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.88 | [+0.71, +1.04] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.39 | [+0.31, +0.47] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.39 | [+0.14, +0.64] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +0.28 | [+0.11, +0.45] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.26 | [+0.10, +0.41] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.26 | [+0.11, +0.40] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | +0.21 | [+0.06, +0.36] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +0.20 | [-1.28, +1.67] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.18 | [+0.03, +0.34] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | +0.16 | [+0.12, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | +0.12 | [-0.02, +0.27] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | +0.08 | [-0.08, +0.24] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | +0.05 | [-0.09, +0.18] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.02 | [-0.10, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.00 | [-0.16, +0.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.16, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.02 | [-0.04, +0.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | -0.09 | [-0.16, -0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.58 | [-0.74, -0.41] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -0.61 | [-2.77, +1.54] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.63 | [-0.70, -0.55] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | -0.84 | [-1.32, -0.36] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -1.27 | [-1.41, -1.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -2.32 | [-7.02, +2.38] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -3.82 | [-6.15, -1.49] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -4.20 | [-34.37, +25.97] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | -12.01 | [-65.81, +41.80] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 120.44MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 39.52MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 59.75MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 177.15MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 26.96MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
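The three criteria above can be sketched as a single predicate. This is only an illustration of the decision rule as described in the report; the `Experiment` struct and its field names are hypothetical and not the detector's real schema.

```rust
/// One row of the regression detector table.
/// Field names are illustrative assumptions, not the detector's actual types.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Effect size tolerance stated in the report: |Δ mean %| >= 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only when all three criteria hold.
fn is_regression(e: &Experiment) -> bool {
    // 1. The change is big enough to merit a closer look.
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // 2. The confidence interval lies entirely on one side of zero.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    // 3. The experiment is not marked "erratic".
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to the table above (and assuming neither experiment is marked erratic), `otlp_ingest_traces_ottl_filtering_5mb_cpu` (-3.82, CI [-6.15, -1.49]) is not flagged because its effect size is below 5%, and `dsd_uds_512kb_3k_contexts_cpu` (-12.01, CI [-65.81, +41.80]) is not flagged because its interval contains zero.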
| - unit-tests-miri-linux-arm64 | ||
| - check-deny | ||
| - check-licenses | ||
| - run-ground-truth |
This was a dangling reference; there are no other references to this job on HEAD. I had it go git spelunking, and it figured out that this job was renamed to the first correctness test, which has been reproduced here. We probably want to add all of the correctness tests here instead?
… analysis

The .iter() and fully-qualified Span::get_meta_field() calls were a workaround for a cascading type inference error caused by a bad import, not an actual ambiguity. Now that the import is correct, the original form works fine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… modules

Match the established panoramic convention of using absolute crate:: paths for cross-module imports. This also lets the collected module stay private (mod instead of pub(crate) mod).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| pub test_dir: PathBuf, | ||
| /// path to a test cases directory (can be specified multiple times) | ||
| #[argh(option, short = 'd')] | ||
| pub test_dirs: Vec<PathBuf>, |
Enhancement: multiple test dirs can be specified, and all discovered tests are run. Example invocation:
./target/release/panoramic run -d $(pwd)/test/integration/cases -d $(pwd)/test/correctness
You can also list them instead:
❯ ./target/release/panoramic list -d $(pwd)/test/integration/cases -d $(pwd)/test/correctness
2026-04-14T19:37:04.137114Z INFO Panoramic starting...
2026-04-14T19:37:04.137142Z INFO Discovering test cases from: /Users/travis.thieman/dd/saluki/test/integration/cases, /Users/travis.thieman/dd/saluki/test/correctness...
2026-04-14T19:37:04.140539Z INFO Discovered 18 test case(s).
Available tests (18):
adp-config-stream (timeout: 90s)
adp-disabled-exit (timeout: 45s)
adp-no-pipelines-exit (timeout: 45s)
adp-rar-disabled (timeout: 90s)
adp-rar-registration (timeout: 90s)
basic-startup (timeout: 60s)
dogstatsd-enabled (timeout: 60s)
dsd-origin-detection (timeout: 1200s)
dsd-plain (timeout: 1200s)
otlp-metrics (timeout: 1200s)
otlp-traces (timeout: 1200s)
otlp-traces-enabled (timeout: 60s)
otlp-traces-ets (timeout: 1200s)
otlp-traces-ottl-filtering (timeout: 1200s)
otlp-traces-ottl-transform (timeout: 1200s)
otlp-traces-probabilistic (timeout: 1200s)
telemetry-endpoint (timeout: 60s)
unprivileged-api-endpoints (timeout: 60s)
2026-04-14T19:37:04.140613Z INFO Panoramic stopped.
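A minimal sketch of how discovery across several `-d` directories could produce one merged, name-sorted list like the one above. The layout it assumes (one subdirectory per test case containing a `config.yaml`) and the `discover_tests` helper are hypothetical and only illustrate the idea; they are not taken from the panoramic source.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical config file name; the real test layout may differ.
const CONFIG_FILE: &str = "config.yaml";

/// Walks every directory given via `-d` and returns (test name, config path)
/// pairs, sorted by test name so tests from different directories interleave.
fn discover_tests(test_dirs: &[PathBuf]) -> io::Result<Vec<(String, PathBuf)>> {
    let mut found = Vec::new();
    for dir in test_dirs {
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            let config = path.join(CONFIG_FILE);
            // Only subdirectories that actually contain a config count as tests.
            if path.is_dir() && config.is_file() {
                if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                    found.push((name.to_string(), config));
                }
            }
        }
    }
    found.sort_by(|a, b| a.0.cmp(&b.0));
    Ok(found)
}
```

Sorting after collecting from all directories is what lets integration and correctness tests appear in a single alphabetical listing regardless of the order of the `-d` flags.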
Expose description through DiscoveredTest instead of suppressing the dead_code warning, and restore the description display in list_tests that was lost during the DiscoveredTest refactor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| Ok(test_cases) | ||
| /// Try to load a test case from a config file, attempting integration (panoramic) schema first, | ||
| /// then correctness schema. |
Schemas are disjoint, so this seemed simpler than adding an explicit type discriminator, though I could do that if you'd prefer.
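For illustration, the try-one-schema-then-the-other approach can be sketched as below. The config structs, the `config` field name, and the parser parameters are hypothetical stand-ins (the real code presumably deserializes the file directly); the sketch only shows the fallback order and how both parse errors can be kept for the user.

```rust
/// Hypothetical stand-ins for the two config types.
#[derive(Debug, PartialEq)]
struct IntegrationConfig {
    name: String,
}

#[derive(Debug, PartialEq)]
struct CorrectnessConfig {
    name: String,
}

/// Simplified version of the enum; variant payloads are assumptions.
#[derive(Debug, PartialEq)]
enum DiscoveredTest {
    Integration(IntegrationConfig),
    Correctness { config: CorrectnessConfig },
}

/// Tries the integration schema first, then the correctness schema.
/// Because the schemas are disjoint, at most one parser should succeed.
/// If both fail, both errors are reported so the user sees why each failed.
fn try_load_test<I, C>(
    raw: &str,
    parse_integration: I,
    parse_correctness: C,
) -> Result<DiscoveredTest, String>
where
    I: Fn(&str) -> Result<IntegrationConfig, String>,
    C: Fn(&str) -> Result<CorrectnessConfig, String>,
{
    let integration_err = match parse_integration(raw) {
        Ok(cfg) => return Ok(DiscoveredTest::Integration(cfg)),
        Err(e) => e,
    };
    let correctness_err = match parse_correctness(raw) {
        Ok(config) => return Ok(DiscoveredTest::Correctness { config }),
        Err(e) => e,
    };
    Err(format!(
        "not a valid test config\n  integration schema: {}\n  correctness schema: {}",
        integration_err, correctness_err
    ))
}
```

The trade-off versus an explicit discriminator is that a malformed file produces two error messages instead of one, which is why keeping both is useful.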
| /// A discovered test, either an integration test or a correctness test. | ||
| pub enum DiscoveredTest { | ||
| /// An integration test case (panoramic schema). | ||
| Integration(TestCase), |
TestCase isn't a very descriptive name now and should probably be changed to IntegrationTest or something more specific to integration, but I didn't want to do that as part of this already-large PR. Let me know if I should do that as a follow-up.
| pub fn timeout(&self) -> Duration { | ||
| match self { | ||
| DiscoveredTest::Integration(tc) => tc.timeout.0, | ||
| DiscoveredTest::Correctness { .. } => Duration::from_secs(20 * 60), |
Added a default 20m timeout to correctness tests; they didn't seem to have one, or a place in their schemas to set one.
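A self-contained sketch of the timeout behavior discussed in this thread and in the earlier commit about outer timeouts. The enum here is simplified, and `outer_timeout` is a hypothetical helper name used only to illustrate that the runner-level timeout applies to correctness tests alone.

```rust
use std::time::Duration;

/// Default applied to correctness tests, which have no timeout in their schema.
const DEFAULT_CORRECTNESS_TIMEOUT: Duration = Duration::from_secs(20 * 60);

/// Simplified stand-in for the real enum; payloads are assumptions.
enum DiscoveredTest {
    Integration { timeout: Duration },
    Correctness,
}

impl DiscoveredTest {
    /// Timeout shown in listings (e.g. "timeout: 1200s" for correctness tests).
    fn timeout(&self) -> Duration {
        match self {
            DiscoveredTest::Integration { timeout } => *timeout,
            DiscoveredTest::Correctness => DEFAULT_CORRECTNESS_TIMEOUT,
        }
    }

    /// Hypothetical helper: integration tests enforce their own timeout via
    /// the assertion runner, so only correctness tests get an outer timeout.
    fn outer_timeout(&self) -> Option<Duration> {
        match self {
            DiscoveredTest::Integration { .. } => None,
            DiscoveredTest::Correctness => Some(DEFAULT_CORRECTNESS_TIMEOUT),
        }
    }
}
```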
| # run a single test case | ||
| make test-correctness-dsd-plain | ||
| make test-correctness-case CASE=dsd-plain |
Confirmed both of these make commands work as expected
…CI failure DO NOT MERGE — this intentionally breaks correctness tests by flipping the comparison operator so every matching metric is reported as mismatched. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Failing test output example (did this on purpose and will now revert): https://gitlab.ddbuild.io/DataDog/saluki/-/jobs/1595346641
Reverts the intentional sabotage from the previous commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This is very cool
I ran a couple of things: make test-integration worked.
[*] Running ADP integration tests...
10:30:20 Panoramic starting...
10:30:20 Container logs will be written to '/var/folders/dc/w_0dw8vs1n5cx5snqzx3k9yh0000gp/T/panoramic-20260415-103020'.
10:30:20 Running 10 test(s)...
10:30:20 Starting test 'adp-config-stream'...
10:30:20 Starting test 'adp-disabled-exit'...
10:30:20 Starting test 'adp-no-pipelines-exit'...
10:30:20 Starting test 'adp-rar-disabled'...
10:30:26 PASS adp-disabled-exit (1 assertions, 5.27s)
10:30:26 Starting test 'adp-rar-registration'...
10:30:26 PASS adp-no-pipelines-exit (1 assertions, 5.38s)
10:30:26 Starting test 'basic-startup'...
10:30:36 PASS adp-rar-disabled (3 assertions, 15.48s)
10:30:36 Starting test 'dogstatsd-enabled'...
10:30:37 PASS adp-config-stream (5 assertions, 17.12s)
10:30:37 Starting test 'otlp-traces-enabled'...
10:30:40 PASS basic-startup (3 assertions, 14.49s)
10:30:40 Starting test 'telemetry-endpoint'...
10:30:42 PASS adp-rar-registration (3 assertions, 16.78s)
10:30:42 Starting test 'unprivileged-api-endpoints'...
10:30:50 PASS dogstatsd-enabled (3 assertions, 14.64s)
10:30:52 PASS otlp-traces-enabled (4 assertions, 14.61s)
10:30:55 PASS telemetry-endpoint (5 assertions, 14.40s)
10:31:05 PASS unprivileged-api-endpoints (5 assertions, 22.45s)
10:31:05
10:31:05 PASSED 10 passed, 0 failed, 10 total (44.52s)
make test-correctness also worked as far as the harness is concerned, but I did have three failures:
$ make test-correctness
[*] Building panoramic...
Finished `release` profile [optimized + debuginfo] target(s) in 0.36s
[*] Running correctness test suite...
10:32:31 Panoramic starting...
10:32:31 Container logs will be written to '/var/folders/dc/w_0dw8vs1n5cx5snqzx3k9yh0000gp/T/panoramic-20260415-103231'.
10:32:31 Running 8 test(s)...
10:32:31 Starting test 'dsd-origin-detection'...
10:32:31 Starting test 'dsd-plain'...
10:32:31 Starting test 'otlp-metrics'...
10:32:31 Starting test 'otlp-traces'...
10:33:36 FAIL otlp-traces (0/1 assertions passed, 65.44s)
10:33:36 Error: Detected mismatched spans between baseline and comparison targets.
10:33:36 - telemetry matches (65.19ms)
10:33:36 Detected mismatched spans between baseline and comparison targets.
10:33:36 Phase timings:
10:33:36 spawn_containers (1.21µs)
10:33:36 collect_data (65.37s)
10:33:36 analysis (65.19ms)
10:33:36 Starting test 'otlp-traces-ets'...
10:33:37 PASS dsd-origin-detection (1 assertions, 65.70s)
10:33:37 Starting test 'otlp-traces-ottl-filtering'...
10:33:41 PASS dsd-plain (1 assertions, 69.56s)
10:33:41 Starting test 'otlp-traces-ottl-transform'...
10:33:43 PASS otlp-metrics (1 assertions, 72.34s)
10:33:43 Starting test 'otlp-traces-probabilistic'...
10:35:36 FAIL otlp-traces-ets (0/1 assertions passed, 119.44s)
10:35:36 Error: Detected mismatched spans between baseline and comparison targets.
10:35:36 - telemetry matches (55.61ms)
10:35:36 Detected mismatched spans between baseline and comparison targets.
10:35:36 Phase timings:
10:35:36 spawn_containers (7.54µs)
10:35:36 collect_data (119.39s)
10:35:36 analysis (55.61ms)
10:35:36 PASS otlp-traces-ottl-filtering (1 assertions, 119.41s)
10:35:56 PASS otlp-traces-ottl-transform (1 assertions, 135.03s)
10:35:59 FAIL otlp-traces-probabilistic (0/1 assertions passed, 135.99s)
10:35:59 Error: Detected mismatched spans between baseline and comparison targets.
10:35:59 - telemetry matches (25.48ms)
10:35:59 Detected mismatched spans between baseline and comparison targets.
10:35:59 Phase timings:
10:35:59 spawn_containers (6.67µs)
10:35:59 collect_data (135.96s)
10:35:59 analysis (25.48ms)
10:35:59
10:35:59 FAILED 5 passed, 3 failed, 8 total (208.37s)
make: *** [test-correctness] Error 1
I think what might be missing from a UI perspective is to print out the path to the logs for failed tests, because it's a bit hard as a new user to find them.
It looks like I have them:
$ cd /tmp/panoramic-correctness && ls
3ftHbWEn d9779x51 dXnsOBSs f7nFjhpU hvqWYhjF Idh2RgkQ mQ9SNRKt nbiWWCLz SRuBJlxz TOZSgygm VnrvjEJD wsPm2g9c WuHwOJA5 Z2ezFUYz z5KMBfxP ZTPmk69h
But I don't know what's what. Based on timestamps I'd say these are all from the run I just did.
So my main feedback is that I agree with something said in a Slack thread: we probably need a really easy way to surface the errors and see what happened.
That being said, I think with a big diff like this it might be better to land it, provided test failure research is no harder now than it was before.
Keep getting pulled into additional yak shaves, notes for continuing:
… changes

- Resolve conflict in .gitlab/e2e.yml: keep PANORAMIC_* env vars, adopt configurable log dir pattern from #1400 as PANORAMIC_LOG_DIR="panoramic-logs" with artifact path updated to match
- Resolve conflict in correctness/runner.rs: read PANORAMIC_LOG_DIR env var (default /tmp/panoramic) instead of hardcoded path

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ture

- Log dir now respects PANORAMIC_LOG_DIR env var as base, with a per-run timestamp subdir and per-test subdir ($PANORAMIC_LOG_DIR/panoramic-YYYYMMDD-HHMMSS/$TEST_NAME)
- Correctness test group runners use "baseline"/"comparison" as log subdirs instead of random isolation group IDs
- Failed tests show their log dir path in both TUI and no-TUI output
- Metric mismatches now include up to 5 sample mismatches in the assertion message, visible in TUI mode (previously only emitted via tracing logs)
- Multi-line error/assertion messages render correctly in TUI raw mode (split on lines to avoid cursor drift without \r)
- Error summary and assertion detail no longer duplicate the same content

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
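The log directory layout described in this commit can be sketched as a pair of path builders. The function names, the `Group` enum, and passing the timestamp in as a preformatted string are assumptions made for illustration; the default base of /tmp/panoramic is the one mentioned in the merge-conflict commit above.

```rust
use std::path::{Path, PathBuf};

/// Default base directory when PANORAMIC_LOG_DIR is not set.
const DEFAULT_LOG_DIR: &str = "/tmp/panoramic";

/// Builds $PANORAMIC_LOG_DIR/panoramic-YYYYMMDD-HHMMSS/$TEST_NAME.
/// `base` would typically come from `std::env::var("PANORAMIC_LOG_DIR").ok()`;
/// `run_stamp` is the already-formatted per-run timestamp.
fn test_log_dir(base: Option<&str>, run_stamp: &str, test_name: &str) -> PathBuf {
    PathBuf::from(base.unwrap_or(DEFAULT_LOG_DIR))
        .join(format!("panoramic-{}", run_stamp))
        .join(test_name)
}

/// The two sides of a correctness test (hypothetical type).
enum Group {
    Baseline,
    Comparison,
}

/// Correctness tests get a stable "baseline" or "comparison" subdirectory
/// instead of a random isolation group ID.
fn group_log_dir(test_dir: &Path, group: Group) -> PathBuf {
    test_dir.join(match group {
        Group::Baseline => "baseline",
        Group::Comparison => "comparison",
    })
}
```

With stable names, the path printed for a failed test is enough to find the logs for either side without matching random IDs against timestamps.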
Example TUI failing output (metrics): [screenshot]
Example TUI failing output (traces): [screenshot]
…ilure

Mirror the metrics analysis treatment: collect up to 5 sample mismatches from the per-span and per-stats-group diff loops and embed them in the assertion message, so TUI mode shows actionable details instead of only the generic error count.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
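A rough sketch of embedding a capped number of sample mismatches in the assertion message. The cap of 5 comes from the commit message; the `mismatch_message` helper, the exact wording, and the "... and N more" trailer are hypothetical choices for this example.

```rust
/// Maximum number of sample mismatches embedded in the message.
const MAX_SAMPLES: usize = 5;

/// Builds a multi-line assertion message: a summary line followed by up to
/// MAX_SAMPLES sample mismatches, so the TUI shows actionable detail.
/// `kind` is e.g. "metrics" or "spans"; the format itself is illustrative.
fn mismatch_message(kind: &str, mismatches: &[String]) -> String {
    let mut msg = format!(
        "Detected {} mismatched {} between baseline and comparison targets.",
        mismatches.len(),
        kind
    );
    for sample in mismatches.iter().take(MAX_SAMPLES) {
        msg.push_str("\n  - ");
        msg.push_str(sample);
    }
    // Note how many were left out rather than silently truncating.
    if mismatches.len() > MAX_SAMPLES {
        msg.push_str(&format!("\n  ... and {} more", mismatches.len() - MAX_SAMPLES));
    }
    msg
}
```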
…ysis Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
| for line in lines { | ||
| self.add_line(format!(" {}", line)); | ||
| } | ||
| } |
This handles formatting when errors span multiple lines. Without it, alignment was out of whack and there was a lot of extra whitespace at the beginning of each line.
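To illustrate why the split matters: in a raw-mode terminal a bare `\n` moves the cursor down without returning it to column 0, so a multi-line message written as one string drifts to the right. A minimal sketch, with a hypothetical `OutputBuffer` type and indent width standing in for the real TUI code:

```rust
/// Hypothetical indent applied to each detail line.
const INDENT: &str = "    ";

/// Stand-in for the TUI's line buffer.
struct OutputBuffer {
    lines: Vec<String>,
}

impl OutputBuffer {
    fn add_line(&mut self, line: String) {
        self.lines.push(line);
    }

    /// Splits a possibly multi-line message so every line is indented
    /// and emitted on its own, instead of embedding raw '\n' characters.
    fn add_message(&mut self, message: &str) {
        for line in message.lines() {
            self.add_line(format!("{}{}", INDENT, line));
        }
    }

    /// Renders for a raw-mode terminal: each line ends with "\r\n" so the
    /// cursor returns to column 0 before the next line starts.
    fn render_raw(&self) -> String {
        let mut out = String::new();
        for line in &self.lines {
            out.push_str(line);
            out.push_str("\r\n");
        }
        out
    }
}
```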
Summary

- … `ground-truth` correctness test runner into `panoramic`, creating a single unified binary for both integration and correctness tests
- … `DiscoveredTest` enum that auto-detects test type from config schema (integration vs correctness); no `type` field needed since schemas are disjoint
- … `-d` flags to discover tests from multiple directories (e.g. `-d test/integration/cases -d test/correctness`)
- … `buffer_unordered` for free
- … `ground-truth` crate entirely and updates all CI, Makefile targets, env vars (`GROUND_TRUTH_*` → `PANORAMIC_*`), docs, and references

Test plan

- `cargo build --workspace` succeeds
- `cargo clippy --package panoramic` clean
- `cargo +nightly fmt --package panoramic` no changes
- `cargo sort --workspace --check` passes
- … `basic-startup` passes
- … `dsd-plain` (metrics mode) passes
- … `otlp-traces` (traces mode) passes
- … (`dsd-plain` + `otlp-metrics`) pass
- … (`basic-startup` + `dsd-plain`) pass
- `panoramic list -d test/correctness` discovers all 8 correctness tests
- `panoramic list -d test/integration/cases -d test/correctness` discovers all 17 tests
- … `PANORAMIC_*` env vars
- … `PANORAMIC_ALPINE_IMAGE`

🤖 Generated with Claude Code