Skip to content

Binary COPY is never used for tables with array columns (PostgresType.oid is never populated) #431

@onhate

Description

@onhate

Summary

When inserting into a PostgreSQL table that contains any array column (e.g. varchar[], text[], int[]), DuckDB's postgres extension always falls back to FORMAT TEXT for COPY, even when binary COPY should work. This results in ~2-2.5x slower bulk write throughput.

Root Cause

In postgres_table_entry.cpp, GetCopyFormat calls CopyRequiresText for each column. For LIST types, the check is:

case LogicalTypeId::LIST: {
    D_ASSERT(pg_type.children.size() == 1);
    auto &child_type = ListType::GetChildType(type);
    if (child_type.id() != LogicalTypeId::LIST) {
        if (!PostgresUtils::SupportedPostgresOid(child_type)) {
            return true;
        }
        if (pg_type.children[0].oid != PostgresUtils::ToPostgresOid(child_type)) {
            return true;  // <-- always triggers
        }
    }
    ...
}

The problem is that PostgresType.oid is initialized to 0 and never populated — neither CreateEmptyPostgresType (used for DDL-created tables) nor TypeToLogicalType (used for catalog-read tables) ever assigns the oid field on child types.

So the check always evaluates as:

0 != PostgresUtils::ToPostgresOid(child_type)  →  0 != 1043  →  true  →  FORMAT TEXT

This means every table with any array column silently falls back to TEXT format, regardless of whether binary would work correctly.

Reproduction

-- Attach a PostgreSQL database
ATTACH 'postgresql://...' AS pg (TYPE postgres);

-- Create a table with a varchar[] column
CREATE TABLE pg.public.test_array (
    id INTEGER,
    name VARCHAR,
    tags VARCHAR[]
);

-- Insert data — this will use FORMAT TEXT internally
INSERT INTO pg.public.test_array
SELECT 1, 'test', ['a', 'b']::VARCHAR[];

Observe in PostgreSQL logs:

COPY "public"."test_array" FROM STDIN (FORMAT TEXT, NULL '')

Expected:

COPY "public"."test_array" FROM STDIN (FORMAT BINARY)

Context

This was introduced in the fix for issue #206 (commit 4b227728, April 2024 / v0.10.2), which added the OID comparison to prevent a crash when binary COPY wrote VARCHAROID (1043) into a text[] column (which expects TEXTOID = 25). The fix correctly identified the OID mismatch problem, but the comparison relies on PostgresType.oid which is never set.

Suggested Fix

Populate PostgresType.oid from the PostgreSQL catalog when reading table metadata, and from ToPostgresOid() when creating tables via DDL. For the text[] vs varchar[] case specifically, the binary writer should use the target column's actual element OID rather than always emitting VARCHAROID.

Impact

Bulk writes to PostgreSQL tables with array columns are ~2-2.5x slower than they should be. In our case, ETL throughput dropped from ~300 MB/s to ~130 MB/s when upgrading past v0.10.2.

Environment

  • DuckDB: v1.5.1
  • postgres_scanner extension: v1.5.1 (commit c0e9256)
  • PostgreSQL: Neon (PostgreSQL 17)
  • OS: macOS (ARM64)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions