Replace the entire Go userspace with a Rust implementation using:
- aya for eBPF program loading and management
- tokio for async runtime
- clap (derive) for CLI
- axum for HTTP server (/metrics, /healthz, /debug/pprof/*)
- clickhouse-rs for native TCP ClickHouse protocol
- prometheus crate for metrics (40+ metrics preserved)
- serde_yaml for config parsing (same YAML format)

All external contracts preserved:
- ClickHouse schema (31 tables, 62 DDL definitions)
- Prometheus metric names and labels
- HTTP endpoints and ports
- YAML config format and defaults
- Event constants (types 1-25, clients 0-11)
- CLI behavior (--config, version subcommand)
- BPF C code unchanged

Architecture:
- src/agent/ - orchestrator with 14-step startup sequence
- src/tracer/ - BPF loading, event parsing, stats
- src/sink/aggregated/ - time-windowed aggregation pipeline
- src/export/ - ClickHouse writer, Prometheus health server
- src/beacon/ - beacon node client
- src/clock/ - Ethereum wall clock
- src/pid/ - process discovery (name + cgroup)
- src/migrate/ - embedded SQL migrations

CI/CD updated for Rust toolchain (cargo build, cross for multi-arch, goreleaser with prebuilt binaries).

161 unit tests pass. Clippy clean. Formatted with rustfmt.
- Switch pprof from protobuf-codec to prost-codec, add prost dependency
- Fix TrackedTidInfo type mismatch: re-export from tracer module in pid
- Fix ring buffer access: use get_inner_mut() for AsyncFd guard
- Fix ring_buffer_size u32/usize mismatch with safe conversion
- Fix clippy lints: elide lifetimes, derive Default, simplify patterns
- Remove unused re-exports from sink/aggregated/config.rs
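The "safe conversion" for the `ring_buffer_size` u32/usize mismatch is not shown in the commit message. A minimal sketch of what such a conversion can look like with the standard library follows; the function names `ring_buffer_len` and `ring_buffer_size_as_u32` are hypothetical, and the direction of the actual conversion in the PR is an assumption.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: widen a configured `u32` size to `usize`.
/// `try_from` reports an error instead of silently truncating on
/// targets where `usize` is narrower than 32 bits.
pub fn ring_buffer_len(configured: u32) -> Result<usize, TryFromIntError> {
    usize::try_from(configured)
}

/// Hypothetical helper for the opposite direction: narrow a `usize`
/// to `u32`, failing if the value does not fit (possible on 64-bit hosts).
pub fn ring_buffer_size_as_u32(len: usize) -> Result<u32, TryFromIntError> {
    u32::try_from(len)
}
```

Compared with an `as` cast, `try_from` turns an out-of-range value into an error the caller has to handle rather than a wrapped number.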
pprof with prost-codec uses prost 0.12, so our direct prost dependency must match to share the same Message trait impl.
Adds #[allow(dead_code)] to public API items not yet consumed by application code paths (event structs, trait methods, config fields, health metrics, export helpers). Inlines 17 format arguments in bpf.rs to satisfy the uninlined_format_args lint.
Adds #[allow(dead_code)] to PortInfo, HealthMetrics, and Buffer structs whose fields are not yet consumed by application code paths. Cleans up redundant field-level annotations in health.rs.
Clang-14 (Debian Bookworm) defaults to DWARFv5, which aya-obj's ELF parser cannot handle, causing "error parsing ELF data" at runtime. Run llvm-strip -g after compilation to remove DWARF debug sections while preserving BTF (.BTF, .BTF.ext) for CO-RE relocations. This matches what Go's bpf2go does (cilium/ebpf#429).
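A rough sketch of how a build script could run the strip step described above. It assumes `llvm-strip` is on `PATH`; the helper names and the object path passed in are hypothetical, and the actual `build.rs` in this PR may be structured differently.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build (without running) an `llvm-strip -g <object>` invocation.
/// `-g` removes debug sections; per the commit message above, the
/// BTF sections needed for CO-RE are kept.
pub fn strip_debug_command(object: &Path) -> Command {
    let mut cmd = Command::new("llvm-strip");
    cmd.arg("-g").arg(object);
    cmd
}

/// Run the strip step and turn a non-zero exit status into an error.
pub fn strip_debug(object: &Path) -> io::Result<()> {
    let status = strip_debug_command(object).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("llvm-strip exited with {status}"),
        ))
    }
}
```

A build script would call `strip_debug` on the compiled object right after the clang invocation and before the object is embedded.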
990cebe to 8bca78e
- Log BPF object size and magic bytes before aya loading
- build.rs prints object size after strip
- Add unit test validating ELF structure via `object` crate
- Add `object` dev-dependency for ELF validation testing
8bca78e to 7037e5c
The root cause of the "error parsing ELF data" crash was pointer alignment, not DWARF version. `include_bytes!` provides only 1-byte alignment, and aya-obj's `object` crate (without the `unaligned` feature) requires 8-byte aligned pointers for ELF header parsing. Whether the data happened to be aligned depended on the binary layout, which differed between clang versions. Switch to `aya::include_bytes_aligned!`, which guarantees 32-byte alignment. Remove the diagnostic logging and the `object` dev-dependency that were added for debugging.
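To illustrate the alignment point, here is a small standard-library-only sketch of forcing embedded bytes onto a 32-byte boundary with a wrapper type. It is not aya's macro and does not claim to match its implementation; `Aligned32`, `OBJECT`, and `is_aligned` are hypothetical names, and the four bytes stand in for a real embedded object.

```rust
/// Wrapper whose contents start on a 32-byte boundary.
/// `repr(C)` keeps the single field at offset 0, so the byte array
/// inherits the struct's alignment.
#[repr(C, align(32))]
pub struct Aligned32<const N: usize>(pub [u8; N]);

/// Stand-in for an embedded BPF object: just the ELF magic bytes.
pub static OBJECT: Aligned32<4> = Aligned32(*b"\x7fELF");

/// Check whether a pointer is a multiple of `align`.
pub fn is_aligned(ptr: *const u8, align: usize) -> bool {
    (ptr as usize) % align == 0
}
```

A plain byte array such as the one `include_bytes!` produces only has alignment 1, so a check like `is_aligned(ptr, 8)` can pass or fail depending on where the linker happens to place the data. The wrapper removes that dependence on layout.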
8b2b656 to 5584a92
Summary
Complete rewrite of the Go userspace in Rust while keeping BPF C programs and all external contracts identical.
Key changes:
- `aya` (eBPF), `tokio` (async), `axum` (HTTP), `clickhouse-rs` (native TCP), `prometheus` (metrics)

All external contracts preserved:
- […] `/metrics`, `/healthz`, `/debug/pprof/*` on `:9090` endpoints
- […] style
- […] `bpf/include/observoor.h`)
- `observoor --config <path>` + `observoor version`

Architecture:
- `src/agent/`
- `src/tracer/`
- `src/sink/aggregated/`
- `src/export/`
- `src/beacon/`
- `src/clock/`
- `src/pid/`
- `src/migrate/`

CI/CD:
- `test-build.yaml`: Rust fmt + clippy + test + build + Docker build on PRs
- `master-build.yaml`: Multi-arch Docker push on master (unchanged flow)
- `goreleaser.yaml`: `cross` for multi-arch builds, goreleaser with prebuilt binaries
- `e2e-client-detection.yml`: Removed Go setup, uses `make docker-build` (Rust)

Design decisions:
- […] `Exporter::ClickHouse | Http`) instead of trait objects for zero-cost async
- `parking_lot::Mutex` for interior mutability on shared health metrics
- `Arc<tokio::sync::Mutex<BpfTracer>>` for shared tracer access from PID monitor
- […] `#[cfg(feature = "bpf")]`) for macOS dev builds
- `aya::include_bytes_aligned!` for BPF object embedding (fixes alignment-dependent ELF parse failures with clang-14)

Notable fix: The E2E failures with "error parsing BPF object: error parsing ELF data" were caused by pointer alignment: `include_bytes!` provides no alignment guarantee, and aya-obj's `object` crate requires 8-byte aligned pointers. Different clang versions produce different-sized BPF objects, shifting the data to different memory offsets. Switching to `aya::include_bytes_aligned!` (32-byte alignment) fixed it deterministically.

Test plan
- `cargo fmt --check` passes
- `cargo clippy --no-default-features` passes (zero errors)
- `cargo test --no-default-features` passes (161 tests)
- `cargo build --no-default-features` succeeds
- `test-build` workflow passes (fmt + clippy + test + build + Docker)
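As an illustration of the enum-based dispatch mentioned under Design decisions, here is a simplified sketch. Only the variant names `Exporter::ClickHouse` and `Exporter::Http` come from the summary; the payload structs, their fields, and the `export` method are assumptions, and the methods are synchronous here so the example stays standard-library only (the real ones are presumably `async`).

```rust
/// Hypothetical ClickHouse-backed exporter; the field is illustrative.
pub struct ClickHouseExporter {
    pub rows: Vec<String>,
}

/// Hypothetical HTTP-backed exporter; the field is illustrative.
pub struct HttpExporter {
    pub bodies: Vec<String>,
}

/// Closed set of exporters, dispatched with `match` rather than
/// through a `Box<dyn Trait>` vtable.
pub enum Exporter {
    ClickHouse(ClickHouseExporter),
    Http(HttpExporter),
}

impl Exporter {
    /// Forward a record to the concrete exporter and report which
    /// backend handled it.
    pub fn export(&mut self, record: &str) -> &'static str {
        match self {
            Exporter::ClickHouse(e) => {
                e.rows.push(record.to_owned());
                "clickhouse"
            }
            Exporter::Http(e) => {
                e.bodies.push(record.to_owned());
                "http"
            }
        }
    }
}
```

Because the set of variants is known at compile time, each arm calls a concrete method directly. With `async` methods this also avoids boxing the returned futures, which is the usual motivation for preferring an enum over trait objects.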