Summary
When inserting into a PostgreSQL table that contains any array column (e.g. varchar[], text[], int[]), DuckDB's postgres extension always falls back to FORMAT TEXT for COPY, even when binary COPY should work. This results in ~2-2.5x slower bulk write throughput.
Root Cause
In postgres_table_entry.cpp, GetCopyFormat calls CopyRequiresText for each column. For LIST types, the check is:
case LogicalTypeId::LIST: {
D_ASSERT(pg_type.children.size() == 1);
auto &child_type = ListType::GetChildType(type);
if (child_type.id() != LogicalTypeId::LIST) {
if (!PostgresUtils::SupportedPostgresOid(child_type)) {
return true;
}
if (pg_type.children[0].oid != PostgresUtils::ToPostgresOid(child_type)) {
return true; // <-- always triggers
}
}
...
}
The problem is that PostgresType.oid is initialized to 0 and never populated — neither CreateEmptyPostgresType (used for DDL-created tables) nor TypeToLogicalType (used for catalog-read tables) ever assigns the oid field on child types.
So the check always evaluates as:
0 != PostgresUtils::ToPostgresOid(child_type) → 0 != 1043 → true → FORMAT TEXT
This means every table with any array column silently falls back to TEXT format, regardless of whether binary would work correctly.
Reproduction
-- Attach a PostgreSQL database
ATTACH 'postgresql://...' AS pg (TYPE postgres);
-- Create a table with a varchar[] column
CREATE TABLE pg.public.test_array (
id INTEGER,
name VARCHAR,
tags VARCHAR[]
);
-- Insert data — this will use FORMAT TEXT internally
INSERT INTO pg.public.test_array
SELECT 1, 'test', ['a', 'b']::VARCHAR[];
Observe in PostgreSQL logs:
COPY "public"."test_array" FROM STDIN (FORMAT TEXT, NULL '')
Expected:
COPY "public"."test_array" FROM STDIN (FORMAT BINARY)
Context
This was introduced in the fix for issue #206 (commit 4b227728, April 2024 / v0.10.2), which added the OID comparison to prevent a crash when binary COPY wrote VARCHAROID (1043) into a text[] column (which expects TEXTOID = 25). The fix correctly identified the OID mismatch problem, but the comparison relies on PostgresType.oid which is never set.
Suggested Fix
Populate PostgresType.oid from the PostgreSQL catalog when reading table metadata, and from ToPostgresOid() when creating tables via DDL. For the text[] vs varchar[] case specifically, the binary writer should use the target column's actual element OID rather than always emitting VARCHAROID.
Impact
Bulk writes to PostgreSQL tables with array columns are ~2-2.5x slower than they should be. In our case, ETL throughput dropped from ~300 MB/s to ~130 MB/s when upgrading past v0.10.2.
Environment
- DuckDB: v1.5.1
- postgres_scanner extension: v1.5.1 (commit c0e9256)
- PostgreSQL: Neon (PostgreSQL 17)
- OS: macOS (ARM64)
Summary
When inserting into a PostgreSQL table that contains any array column (e.g.
varchar[],text[],int[]), DuckDB's postgres extension always falls back toFORMAT TEXTfor COPY, even when binary COPY should work. This results in ~2-2.5x slower bulk write throughput.Root Cause
In
postgres_table_entry.cpp,GetCopyFormatcallsCopyRequiresTextfor each column. ForLISTtypes, the check is:The problem is that
PostgresType.oidis initialized to 0 and never populated — neitherCreateEmptyPostgresType(used for DDL-created tables) norTypeToLogicalType(used for catalog-read tables) ever assigns theoidfield on child types.So the check always evaluates as:
This means every table with any array column silently falls back to TEXT format, regardless of whether binary would work correctly.
Reproduction
Observe in PostgreSQL logs:
Expected:
Context
This was introduced in the fix for issue #206 (commit
4b227728, April 2024 / v0.10.2), which added the OID comparison to prevent a crash when binary COPY wroteVARCHAROID(1043) into atext[]column (which expectsTEXTOID= 25). The fix correctly identified the OID mismatch problem, but the comparison relies onPostgresType.oidwhich is never set.Suggested Fix
Populate
PostgresType.oidfrom the PostgreSQL catalog when reading table metadata, and fromToPostgresOid()when creating tables via DDL. For thetext[]vsvarchar[]case specifically, the binary writer should use the target column's actual element OID rather than always emittingVARCHAROID.Impact
Bulk writes to PostgreSQL tables with array columns are ~2-2.5x slower than they should be. In our case, ETL throughput dropped from ~300 MB/s to ~130 MB/s when upgrading past v0.10.2.
Environment