Hi 👋
While evaluating crystal-pg for a Crystal project, I ran a full multi-dimension audit toward a 1.0 tag (correctness, concurrency / fiber-safety, error model, perf/memory/CPU, PostgreSQL protocol completeness, public API / SemVer surface, tests / CI). I want to lead with the honest part: the architecture is genuinely good. A clean two-layer split — the wire protocol in PQ::* cleanly separated from the crystal-db driver in PG::* — correct binary decoders for the large majority of types, SCRAM-SHA-256 + channel binding (rare and well done), logical replication (pgoutput), and a real round-trip spec suite against Postgres 14–18. The bones are solid.
This is meant as a constructive map of what stands between today and a 1.0 that can be promised — and I'm offering to contribute the fixes, starting with the blockers and the most-requested security/recovery gaps.
📄 Full audit (canonical, cited file:line): https://gist.github.com/n-rodriguez/4cbd8a46b0892adc8305156c70273a50
Summary by tier
| Tier |
Theme |
Headline |
| 0 |
Blockers |
PG::Interval#to_spans calls methods that don't exist (won't compile when called); .size instead of .bytesize corrupts the startup/password packet for non-ASCII; a stray pp debug print in replication; CI triggers on main while the default branch is master (push CI never fires); a decode error poisons a pooled connection silently |
| 1 |
Security |
verify-ca/verify-full are accepted but the TLS context is hard-coded VerifyMode::NONE — no cert/hostname verification (silent MITM exposure); an over-permissive .pgpass is warned about but read anyway; PGSSLMODE/PGSSL* env vars are ignored |
| 2 |
Correctness & concurrency |
RowDescription table_oid/column fields swapped + truncated; infinity/-infinity timestamps unhandled; Numeric#to_s fractional truncation is fragile; two array code paths disagree on non-1 lower bounds; the LISTEN/NOTIFY loop shares the socket read path with queries without a lock; @@location_cache is mutated cross-fiber without synchronization; two disjoint exception trees (PQ::PQError is not under PG::Error) |
| 3 |
Protocol completeness |
verify-* unimplemented; NegotiateProtocolVersion unhandled; no CancelRequest; missing types (time/timetz, inet/cidr/macaddr, money, ranges); connect_timeout/options not honored (a dead host hangs forever); replication: no slot creation, proto pinned to v1, ErrorFrame ignored |
| 4 |
Perf / memory / CPU |
decoder resolved by per-value OID hash lookup instead of per-column; no real prepared statements (Parse on every exec — the unnamed statement); Array(Slice?) + per-column copy per row; text param encoding allocates per value |
| 5 |
API / SemVer |
the whole PQ::* namespace is :nodoc: but public in practice (callbacks return PQ::Notification/Notice); PQ::Connection#soc + protocol I/O are public; sslmode is a Symbol; Geo::Path is inconsistent with its sibling records; PG::Error is :nodoc: yet is the root users must rescue; dead pre-Crystal-1.0 monkeypatches |
| 6 |
Tests / CI / packaging |
CI is nix-only with a single Crystal version; pq/frame.cr, pq/query.cr, ext/openssl.cr, pg/record.cr have zero coverage; auth & escape-error specs are disabled by default; depends on crystal-db which is itself pre-1.0 |
All headline compile / wire-format / security findings were verified by hand against the source.
How this maps to existing PRs and issues
Several long-standing issues corroborate these findings — they're not theoretical:
The takeaway: Tier 0 and Tier 1 have no competing PRs but multiple corroborating issues — that's where I'd like to start.
Proposed plan
I'll open one child issue per tier (linked below) so each is independently discussable, and I'd like to start with a PR for Tier 0 (the blockers) — each with a failing spec first, plus the CI branch fix so they can't recur — and then Tier 1 (verify-full, closing #240/#237/#231).
Two questions before I send code:
- Spec framework for new specs — stdlib
spec (what the suite uses today)?
- For the completeness gaps (Tier 3) and security (Tier 1): prefer to implement them (real TLS verification, missing types, query cancel), or to stop advertising them (reject
verify-* and unsupported sslmodes loudly) until they exist? Happy either way.
Thanks for building and maintaining this — it's the reference Postgres driver for Crystal, and I'd be glad to help push it toward 1.0. 🙏
ℹ️ Transparency: this audit was produced with Claude Opus 4.8 [1M] (multi-pass static review), then the headline compile / wire-format / security findings were verified by hand against the source.
Hi 👋
While evaluating
crystal-pgfor a Crystal project, I ran a full multi-dimension audit toward a1.0tag (correctness, concurrency / fiber-safety, error model, perf/memory/CPU, PostgreSQL protocol completeness, public API / SemVer surface, tests / CI). I want to lead with the honest part: the architecture is genuinely good. A clean two-layer split — the wire protocol inPQ::*cleanly separated from thecrystal-dbdriver inPG::*— correct binary decoders for the large majority of types, SCRAM-SHA-256 + channel binding (rare and well done), logical replication (pgoutput), and a real round-trip spec suite against Postgres 14–18. The bones are solid.This is meant as a constructive map of what stands between today and a
1.0that can be promised — and I'm offering to contribute the fixes, starting with the blockers and the most-requested security/recovery gaps.📄 Full audit (canonical, cited
file:line): https://gist.github.com/n-rodriguez/4cbd8a46b0892adc8305156c70273a50Summary by tier
PG::Interval#to_spanscalls methods that don't exist (won't compile when called);.sizeinstead of.bytesizecorrupts the startup/password packet for non-ASCII; a strayppdebug print in replication; CI triggers onmainwhile the default branch ismaster(push CI never fires); a decode error poisons a pooled connection silentlyverify-ca/verify-fullare accepted but the TLS context is hard-codedVerifyMode::NONE— no cert/hostname verification (silent MITM exposure); an over-permissive.pgpassis warned about but read anyway;PGSSLMODE/PGSSL*env vars are ignoredtable_oid/column fields swapped + truncated;infinity/-infinitytimestamps unhandled;Numeric#to_sfractional truncation is fragile; two array code paths disagree on non-1 lower bounds; the LISTEN/NOTIFY loop shares the socket read path with queries without a lock;@@location_cacheis mutated cross-fiber without synchronization; two disjoint exception trees (PQ::PQErroris not underPG::Error)verify-*unimplemented;NegotiateProtocolVersionunhandled; no CancelRequest; missing types (time/timetz,inet/cidr/macaddr,money, ranges);connect_timeout/optionsnot honored (a dead host hangs forever); replication: no slot creation, proto pinned to v1,ErrorFrameignoredArray(Slice?)+ per-column copy per row; text param encoding allocates per valuePQ::*namespace is:nodoc:but public in practice (callbacks returnPQ::Notification/Notice);PQ::Connection#soc+ protocol I/O are public;sslmodeis aSymbol;Geo::Pathis inconsistent with its siblingrecords;PG::Erroris:nodoc:yet is the root users must rescue; dead pre-Crystal-1.0 monkeypatchespq/frame.cr,pq/query.cr,ext/openssl.cr,pg/record.crhave zero coverage; auth & escape-error specs are disabled by default; depends oncrystal-dbwhich is itself pre-1.0All headline compile / wire-format / security findings were verified by hand against the source.
How this maps to existing PRs and issues
Several long-standing issues corroborate these findings — they're not theoretical:
sslmode=verify-full#240 (verify-full, help wanted), cockroach db #237, Unable to connect to Cockroachdb #231 (CockroachDB users blocked onverify-full+ theoptionsparam). No open PR addresses it.reset) are exactly the "decode/IO error doesn't mark the connection lost" finding; Error: can't cast to JSON::Any #263 (can't cast to JSON::Any) and Time seems to drop precision when passed in as an arg using at_end_of_day #252 (Timearg loses precision) are decoder/encoder findings. No open PR.The takeaway: Tier 0 and Tier 1 have no competing PRs but multiple corroborating issues — that's where I'd like to start.
Proposed plan
I'll open one child issue per tier (linked below) so each is independently discussable, and I'd like to start with a PR for Tier 0 (the blockers) — each with a failing spec first, plus the CI branch fix so they can't recur — and then Tier 1 (
verify-full, closing #240/#237/#231).Two questions before I send code:
spec(what the suite uses today)?verify-*and unsupported sslmodes loudly) until they exist? Happy either way.Thanks for building and maintaining this — it's the reference Postgres driver for Crystal, and I'd be glad to help push it toward 1.0. 🙏