Foreign Table: Persistent Postgres table for managing external data queried by DuckDB readers #951

Open

YuweiXiao wants to merge 12 commits into duckdb:main from YuweiXiao:feat_ext_tbl

Conversation

@YuweiXiao (Contributor) commented Oct 11, 2025

This PR introduces foreign table support, allowing users to persist an external file's view queried through DuckDB's readers (read_csv, read_parquet, and read_json).

Previously, users had to embed file locations and options directly in queries, and use r[xx] syntax for column reference. Foreign tables simplify this by defining file paths and reader options once at CREATE time, enabling clean SELECT statements without r[xx] syntax. This also opens room for access control on external files, such as fine-grained permissions like column-level visibility for different users.

CREATE TABLE Syntax

CREATE FOREIGN TABLE external_csv ()
  SERVER duckdb
  OPTIONS (
    location '../../data/iris.csv',
    format 'csv',
    options '{"header": true}'
  );

-- Query like a regular table
SELECT * FROM external_csv;
SELECT "sepal.length" FROM external_csv;

-- Raw SQL way
SELECT r['sepal.length'] FROM read_csv('../../data/iris.csv');

Features

  • DDL Support: CREATE FOREIGN TABLE, DROP FOREIGN TABLE, ALTER TABLE NAME
  • Auto Schema Inference: Column names and types inferred by DuckDB, persisted in Postgres catalog
  • Lazy loading: External tables are dynamically loaded as DuckDB views only when referenced in queries

@YuweiXiao force-pushed the feat_ext_tbl branch 2 times, most recently from 1c1c595 to b5ffdc6 on October 11, 2025 08:34
@visardida commented Oct 11, 2025

This is a much-needed improvement, @YuweiXiao, thank you!

Does this implementation of external table support handle partitioned Parquet datasets, for example when using wildcard paths or recursive directory patterns such as:

read_parquet('/path/to/data/**/*.parquet')

In other words, if I create an external table pointing to a directory of Parquet partitions, will it automatically discover and read all matching files, or does it only support a single file path per table definition?

@YuweiXiao (Contributor, Author) replied:

> Does this implementation of external table support handle partitioned Parquet datasets, for example when using wildcard paths or recursive directory patterns such as:
>
> read_parquet('/path/to/data/**/*.parquet')
>
> In other words, if I create an external table pointing to a directory of Parquet partitions, will it automatically discover and read all matching files, or does it only support a single file path per table definition?

Yes. The external table tracks the path and read options in the Postgres catalog, and the file list is resolved on each query. In theory, all functionality supported by the read_xxxx functions should also be available through external tables.
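
Based on that answer, a partitioned dataset could presumably be registered with a glob path like this (the table name and path are illustrative, not taken from the PR):

```sql
-- Register a whole directory tree of Parquet partitions at once;
-- per the reply above, the matching file list is re-resolved on each query
CREATE FOREIGN TABLE events_archive ()
  SERVER duckdb
  OPTIONS (
    location '/path/to/data/**/*.parquet',
    format 'parquet'
  );

SELECT count(*) FROM events_archive;
```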

@JelteF (Collaborator) commented Oct 13, 2025

Thanks for the work on this! I also had something like this in mind, but I was thinking about using FOREIGN TABLES instead of table access methods for this. So I'm wondering why you went this route instead. (not saying that one is really better than the other, but I'm wondering what tradeoffs you considered)

@YuweiXiao (Contributor, Author) replied:

> Thanks for the work on this! I also had something like this in mind, but I was thinking about using FOREIGN TABLES instead of table access methods for this. So I'm wondering why you went this route instead. (not saying that one is really better than the other, but I'm wondering what tradeoffs you considered)

Yes, FOREIGN TABLE would definitely work too. I didn’t have a strong tradeoff in mind — mainly wanted to reuse the existing codebase as much as possible, e.g., the DuckDB AM that’s already properly hooked and the registered triggers.

I’ll take another look at the FOREIGN TABLE approach — it has a better semantic fit (i.e., a metadata-only table).

@JelteF (Collaborator) commented Oct 16, 2025

Thinking about it more, I do think FOREIGN TABLE is a better fit for this semantically. Because the CREATE TABLE command that you have now isn't actually creating the backing files. It's only registering some already existing external data in postgres.

@YuweiXiao (Contributor, Author) replied:

> Thinking about it more, I do think FOREIGN TABLE is a better fit for this semantically. Because the CREATE TABLE command that you have now isn't actually creating the backing files. It's only registering some already existing external data in postgres.

Yeah. I will start a discussion thread so we can define the SQL interface (usage) before implementation.

@AndrewJackson2020 (Contributor) commented:

The above change (or a similar change using FDW instead) would be great. One of the issues with the current syntax is that it does not play nicely with ORMs, which is a big annoyance for a lot of teams. I could also see a usage pattern with pg_duckdb where you keep "live" data in Postgres tables (or partitions) and "archive" data on S3 as Parquet. It would be great to be able to access both of these through a uniform interface.
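
The live/archive pattern described above could, under this PR, be sketched as a plain view over both sources (the table names, columns, and S3 path are all hypothetical, and the S3 location assumes DuckDB's S3 support is configured):

```sql
-- Hot data stays in a regular Postgres table
CREATE TABLE events_live (id bigint, ts timestamptz, payload jsonb);

-- Cold data lives as Parquet on S3, registered once as a foreign table
CREATE FOREIGN TABLE events_archive ()
  SERVER duckdb
  OPTIONS (
    location 's3://my-bucket/events/**/*.parquet',
    format 'parquet'
  );

-- One uniform interface for ORMs and ad-hoc queries
CREATE VIEW events AS
  SELECT id, ts, payload FROM events_live
  UNION ALL
  SELECT id, ts, payload FROM events_archive;
```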

@YuweiXiao (Contributor, Author) commented:

Hi @JelteF , the PR is ready for review:

  • Switched to the FOREIGN TABLE interface for table management.
  • A pre-created server (ddb_foreign_server) is added in the extension SQL — we can adjust the naming if needed.
  • Added support for table rename DDLs (both ALTER FOREIGN TABLE and ALTER TABLE).
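
With those changes, usage would presumably look like the following (a sketch using the pre-created server name stated above; the file path and table name are illustrative):

```sql
-- The ddb_foreign_server server is pre-created by the extension SQL,
-- so no CREATE SERVER step is needed
CREATE FOREIGN TABLE external_parquet ()
  SERVER ddb_foreign_server
  OPTIONS (
    location '../../data/iris.parquet',
    format 'parquet'
  );

-- Both rename spellings are supported per the bullet above
ALTER FOREIGN TABLE external_parquet RENAME TO iris_parquet;
ALTER TABLE iris_parquet RENAME TO external_parquet;
```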

namespace pgduckdb {

// The name of the foreign server that we use for DuckDB foreign tables.
#define DUCKDB_FOREIGN_SERVER_NAME "ddb_foreign_server"
@JelteF (Collaborator) commented on the diff:
Let's just use duckdb for this. I think that will make the SQL look nicer:

CREATE FOREIGN TABLE external_parquet ()
  SERVER duckdb
  OPTIONS (
    location '../../data/iris.parquet'
  );
Suggested change
#define DUCKDB_FOREIGN_SERVER_NAME "ddb_foreign_server"
#define DUCKDB_FOREIGN_SERVER_NAME "duckdb"

@JelteF JelteF added this to the 1.2.0 milestone Dec 10, 2025
@YuweiXiao (Contributor, Author) commented:
Hi @JelteF, can we target the 1.1.0 release? I will put in effort to address any feedback.

@JelteF (Collaborator) commented Dec 10, 2025

I would love to include this in 1.1.0, but it's primarily my own time that's the bottleneck here (not you). I need to spend some quality time playing with this, and reading the code. Your two PRs (this one and the INSERT one) are definitely the number one features that I'd like to get released. But they're both non-trivial to review.

But the main branch has been accumulating small fixes that I want to release somewhere in the next few days. That's why I moved this to the 1.2.0 milestone.

@JelteF (Collaborator) commented Dec 10, 2025

One thing I noticed now. Can you update the PR description to use the new syntax?

@YuweiXiao YuweiXiao changed the title External Tables: Persistent Postgres table for managing external data queried by DuckDB readers Foreign Table: Persistent Postgres table for managing external data queried by DuckDB readers Dec 11, 2025
@jan-swiatek commented:
I know that's a tough question, but when is version 1.2.0 roughly scheduled for release?

@visardida commented:
I am also very eager to get this feature; I was really hoping to get it in v1.1.0.

@JelteF (Collaborator) commented Dec 19, 2025

Could you fix the merge conflicts? That will make it easier to review this (which I'm planning to do in early January).
