feat: rewrite Go userspace in Rust #13

Merged: samcm merged 9 commits into master from feat/rust-rewrite on Feb 8, 2026

samcm (Member) commented on Feb 6, 2026

Summary

Complete rewrite of the Go userspace in Rust while keeping BPF C programs and all external contracts identical.

Key changes:

  • Replace Go with Rust using aya (eBPF), tokio (async), axum (HTTP), clickhouse-rs (native TCP), prometheus (metrics)
  • 94 files changed: 13,423 insertions, 9,599 deletions
  • 161 unit tests passing, clippy clean, rustfmt formatted

All external contracts preserved:

  • ClickHouse schema: 31 logical tables (62 DDL definitions) - same column names/types/order
  • Prometheus metrics: 40+ metric names and labels unchanged
  • HTTP endpoints: /metrics, /healthz, /debug/pprof/* on :9090
  • YAML config format: same field names, same defaults, same endpoint style
  • Event constants: types 1-25, clients 0-11 (matching bpf/include/observoor.h)
  • CLI: observoor --config <path> + observoor version

Architecture:

| Module | Description |
| --- | --- |
| src/agent/ | Orchestrator with 14-step startup sequence |
| src/tracer/ | BPF loading (required vs optional attach), zero-copy event parsing |
| src/sink/aggregated/ | Time-windowed aggregation with histogram, dimension hashing |
| src/export/ | ClickHouse writer (native TCP), Prometheus health server |
| src/beacon/ | Beacon node client (genesis, spec, sync) |
| src/clock/ | Ethereum wall clock (slot timing) |
| src/pid/ | Composite PID discovery (process name + cgroup v2) |
| src/migrate/ | Embedded SQL migrations (golang-migrate compatible) |
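
For readers unfamiliar with slot timing, the arithmetic behind a wall clock like the one in src/clock/ can be sketched as below. This is an illustration only, not the code in this PR: the `WallClock` type and its methods are hypothetical names, and the 12-second slot and 32-slot epoch used in the note after the block are mainnet values assumed for the example (the table suggests the real agent reads genesis and spec from the beacon node).

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical wall clock; not the actual types in src/clock/.
#[derive(Debug, Clone, Copy)]
pub struct WallClock {
    genesis: SystemTime,
    seconds_per_slot: u64, // assumed non-zero
    slots_per_epoch: u64,  // assumed non-zero
}

impl WallClock {
    pub fn new(genesis_unix_secs: u64, seconds_per_slot: u64, slots_per_epoch: u64) -> Self {
        Self {
            genesis: UNIX_EPOCH + Duration::from_secs(genesis_unix_secs),
            seconds_per_slot,
            slots_per_epoch,
        }
    }

    /// Slot that contains `now`, or `None` if `now` is before genesis.
    pub fn slot_at(&self, now: SystemTime) -> Option<u64> {
        let elapsed = now.duration_since(self.genesis).ok()?;
        Some(elapsed.as_secs() / self.seconds_per_slot)
    }

    /// Epoch that contains `slot`.
    pub fn epoch_of(&self, slot: u64) -> u64 {
        slot / self.slots_per_epoch
    }

    /// Time left until the next slot boundary. Exactly on a boundary this
    /// returns a full slot length.
    pub fn until_next_slot(&self, now: SystemTime) -> Option<Duration> {
        let elapsed = now.duration_since(self.genesis).ok()?;
        let slot_len = Duration::from_secs(self.seconds_per_slot);
        let into_slot_nanos = elapsed.as_nanos() % slot_len.as_nanos();
        Some(slot_len - Duration::from_nanos(into_slot_nanos as u64))
    }
}
```

With mainnet-style parameters (`WallClock::new(1_606_824_023, 12, 32)`), a timestamp 25 seconds after genesis falls in slot 2 with 11 seconds left until slot 3.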

CI/CD:

  • test-build.yaml: Rust fmt + clippy + test + build + Docker build on PRs
  • master-build.yaml: Multi-arch Docker push on master (unchanged flow)
  • goreleaser.yaml: cross for multi-arch builds, goreleaser with prebuilt binaries
  • e2e-client-detection.yml: Removed Go setup, uses make docker-build (Rust)

Design decisions:

  • Enum dispatch (Exporter::ClickHouse | Http) instead of trait objects for zero-cost async
  • parking_lot::Mutex for interior mutability on shared health metrics
  • Arc<tokio::sync::Mutex<BpfTracer>> for shared tracer access from PID monitor
  • Feature-gated BPF (#[cfg(feature = "bpf")]) for macOS dev builds
  • Raw SQL INSERT for ClickHouse (DateTime64/Tuple not supported in clickhouse-rs Block API)
  • aya::include_bytes_aligned! for BPF object embedding (fixes alignment-dependent ELF parse failures with clang-14)
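
The enum-dispatch decision above can be illustrated with a minimal sketch. Only the `Exporter::ClickHouse | Http` variant names come from this PR; the inner structs, their fields, and the `export` and `name` methods are hypothetical. The sketch is synchronous to stay self-contained, whereas the real exporters are async, where this pattern avoids boxing futures behind a trait object.

```rust
/// Hypothetical ClickHouse exporter; fields are illustrative only.
pub struct ClickHouseExporter {
    pub endpoint: String,
    pub rows_written: usize,
}

/// Hypothetical HTTP exporter; fields are illustrative only.
pub struct HttpExporter {
    pub url: String,
    pub requests_sent: usize,
}

impl ClickHouseExporter {
    fn export(&mut self, rows: &[String]) -> Result<usize, String> {
        // A real implementation would send the rows over native TCP.
        self.rows_written += rows.len();
        Ok(rows.len())
    }
}

impl HttpExporter {
    fn export(&mut self, rows: &[String]) -> Result<usize, String> {
        // A real implementation would POST one batch per call.
        if rows.is_empty() {
            return Ok(0);
        }
        self.requests_sent += 1;
        Ok(rows.len())
    }
}

/// Closed set of exporters. Each call is resolved with a `match` on the
/// variant instead of a vtable lookup through `Box<dyn Trait>`.
pub enum Exporter {
    ClickHouse(ClickHouseExporter),
    Http(HttpExporter),
}

impl Exporter {
    pub fn export(&mut self, rows: &[String]) -> Result<usize, String> {
        match self {
            Exporter::ClickHouse(e) => e.export(rows),
            Exporter::Http(e) => e.export(rows),
        }
    }

    pub fn name(&self) -> &'static str {
        match self {
            Exporter::ClickHouse(_) => "clickhouse",
            Exporter::Http(_) => "http",
        }
    }
}
```

The trade-off is that adding an exporter means adding a variant and a `match` arm, which is acceptable when the set of backends is small and known at compile time.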

Notable fix: The E2E failures with "error parsing BPF object: error parsing ELF data" were caused by pointer alignment — include_bytes! provides no alignment guarantee, and aya-obj's object crate requires 8-byte aligned pointers. Different clang versions produce different-sized BPF objects, shifting the data to different memory offsets. Switching to aya::include_bytes_aligned! (32-byte alignment) fixed it deterministically.
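
To make the alignment issue concrete, here is a small standalone sketch of how embedded bytes can be forced onto a 32-byte boundary. It is a simplified stand-in for the idea, not the implementation of `aya::include_bytes_aligned!`: the `Aligned32` wrapper, the `OBJECT` static, and the helper functions are hypothetical, and an 8-byte ELF header prefix is used in place of a real BPF object file.

```rust
/// Wrapper that raises the alignment of the contained byte array to 32.
/// Hypothetical type for illustration only.
#[repr(C, align(32))]
pub struct Aligned32<const N: usize>(pub [u8; N]);

/// Stand-in for an embedded BPF object: ELF magic followed by the
/// 64-bit class, little-endian, version 1 and OS ABI bytes.
/// A real build would embed the compiled object file instead.
static OBJECT: Aligned32<8> = Aligned32(*b"\x7fELF\x02\x01\x01\x00");

/// Embedded bytes as a plain slice, as a loader would receive them.
pub fn object_bytes() -> &'static [u8] {
    &OBJECT.0
}

/// True if the slice starts at an address that is a multiple of `align`.
pub fn is_aligned_to(bytes: &[u8], align: usize) -> bool {
    (bytes.as_ptr() as usize) % align == 0
}
```

Because the wrapper is 32-byte aligned, `is_aligned_to(object_bytes(), 8)` holds regardless of where the linker places the static. A plain `include_bytes!` array only guarantees 1-byte alignment, so the same check could pass or fail depending on binary layout.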

Test plan

  • cargo fmt --check passes
  • cargo clippy --no-default-features passes (zero errors)
  • cargo test --no-default-features passes (161 tests)
  • cargo build --no-default-features succeeds
  • CI test-build workflow passes (fmt + clippy + test + build + Docker)
  • Docker image builds successfully on Linux (with BPF + clang-14)
  • E2E Docker smoke test with Kurtosis Ethereum network
  • E2E Kubernetes smoke test with K3s + Kurtosis
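
The `--no-default-features` commands in this plan rely on the BPF code being feature-gated. A minimal sketch of that pattern follows, assuming `bpf` is a default Cargo feature (an inference from the commands above, not something the PR states); `attach_programs` and `bpf_enabled` are hypothetical names.

```rust
/// Compiled only when the `bpf` Cargo feature is enabled (Linux builds).
/// A real implementation would load and attach the programs via aya.
#[cfg(feature = "bpf")]
pub fn attach_programs() -> Result<&'static str, String> {
    Ok("bpf programs attached")
}

/// Fallback compiled when the feature is off, for example on macOS dev
/// machines, so the rest of the crate still builds and tests can run.
#[cfg(not(feature = "bpf"))]
pub fn attach_programs() -> Result<&'static str, String> {
    Err("built without the `bpf` feature; tracing is disabled".to_string())
}

/// Reports at runtime which variant was compiled in.
pub fn bpf_enabled() -> bool {
    cfg!(feature = "bpf")
}
```

Both variants share one signature, so callers need no `cfg` attributes of their own.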

samcm added 6 commits February 6, 2026 20:25
Replace the entire Go userspace with a Rust implementation using:
- aya for eBPF program loading and management
- tokio for async runtime
- clap (derive) for CLI
- axum for HTTP server (/metrics, /healthz, /debug/pprof/*)
- clickhouse-rs for native TCP ClickHouse protocol
- prometheus crate for metrics (40+ metrics preserved)
- serde_yaml for config parsing (same YAML format)

All external contracts preserved:
- ClickHouse schema (31 tables, 62 DDL definitions)
- Prometheus metric names and labels
- HTTP endpoints and ports
- YAML config format and defaults
- Event constants (types 1-25, clients 0-11)
- CLI behavior (--config, version subcommand)
- BPF C code unchanged

Architecture:
- src/agent/ - orchestrator with 14-step startup sequence
- src/tracer/ - BPF loading, event parsing, stats
- src/sink/aggregated/ - time-windowed aggregation pipeline
- src/export/ - ClickHouse writer, Prometheus health server
- src/beacon/ - beacon node client
- src/clock/ - Ethereum wall clock
- src/pid/ - process discovery (name + cgroup)
- src/migrate/ - embedded SQL migrations

CI/CD updated for Rust toolchain (cargo build, cross for
multi-arch, goreleaser with prebuilt binaries).

161 unit tests pass. Clippy clean. Formatted with rustfmt.

- Switch pprof from protobuf-codec to prost-codec, add prost dependency
- Fix TrackedTidInfo type mismatch: re-export from tracer module in pid
- Fix ring buffer access: use get_inner_mut() for AsyncFd guard
- Fix ring_buffer_size u32/usize mismatch with safe conversion
- Fix clippy lints: elide lifetimes, derive Default, simplify patterns
- Remove unused re-exports from sink/aggregated/config.rs

pprof with prost-codec uses prost 0.12, so our direct prost
dependency must match to share the same Message trait impl.

Adds #[allow(dead_code)] to public API items not yet consumed by
application code paths (event structs, trait methods, config fields,
health metrics, export helpers). Inlines 17 format arguments in bpf.rs
to satisfy the uninlined_format_args lint.

Adds #[allow(dead_code)] to PortInfo, HealthMetrics, and Buffer structs
whose fields are not yet consumed by application code paths. Cleans up
redundant field-level annotations in health.rs.

Clang-14 (Debian Bookworm) defaults to DWARFv5, which aya-obj's ELF
parser cannot handle, causing "error parsing ELF data" at runtime.
Run llvm-strip -g after compilation to remove DWARF debug sections
while preserving BTF (.BTF, .BTF.ext) for CO-RE relocations.

This matches what Go's bpf2go does (cilium/ebpf#429).
samcm force-pushed the feat/rust-rewrite branch from 990cebe to 8bca78e on February 6, 2026 11:58
- Log BPF object size and magic bytes before aya loading
- build.rs prints object size after strip
- Add unit test validating ELF structure via `object` crate
- Add `object` dev-dependency for ELF validation testing
samcm force-pushed the feat/rust-rewrite branch from 8bca78e to 7037e5c on February 6, 2026 12:02
The root cause of the "error parsing ELF data" crash was pointer alignment,
not DWARF version. `include_bytes!` provides only 1-byte alignment, and
aya-obj's `object` crate (without the `unaligned` feature) requires 8-byte
aligned pointers for ELF header parsing. Whether the data happened to be
aligned depended on the binary layout, which differed between clang versions.

Switch to `aya::include_bytes_aligned!` which guarantees 32-byte alignment.
Remove the diagnostic logging and `object` dev-dependency added for debugging.
samcm force-pushed the feat/rust-rewrite branch from 8b2b656 to 5584a92 on February 6, 2026 12:26
samcm merged commit d5724dc into master on Feb 8, 2026
4 checks passed