Turn a folder of CSV / Parquet / JSON files into one SQL-queryable source for your AI agent.
Small businesses don't have a data warehouse — they have a folder full of exports: customers.csv, last month's orders.xlsx, a regions.json someone emailed over. tablebridge is an MCP server that points DuckDB at that folder, exposes each file as a SQL table, and lets your agent run read-only SQL — including JOINs across files — to answer questions over all of them at once. Scattered spreadsheets become one queryable source of truth.
It's read-only and sandboxed: files are loaded into an in-memory database, the data directory is the only thing it can see, and queries are validated so an agent can't write, escape to other paths, or call raw file functions.
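
The validation step can be pictured with a minimal sketch. This is not tablebridge's actual validator: the function name `validate_read_only` and both deny-lists are assumptions made for illustration, and the check is a crude keyword filter rather than a SQL parser (it ignores comments and may reject legitimate queries that use a listed word as an identifier).

```python
import re

# Hypothetical deny-list of DuckDB file-reading functions; the real list may differ.
FILE_FUNCTIONS = (
    "read_csv", "read_csv_auto", "read_parquet", "read_json",
    "read_json_auto", "read_ndjson", "read_text", "read_blob", "glob",
)
# Hypothetical deny-list of statement keywords that modify state.
WRITE_KEYWORDS = (
    "insert", "update", "delete", "create", "drop", "alter",
    "copy", "attach", "install", "load", "export",
)

_FILE_FUNC_RE = re.compile(r"\b(" + "|".join(FILE_FUNCTIONS) + r")\s*\(", re.IGNORECASE)
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)
_STRING_RE = re.compile(r"'(?:[^']|'')*'")


def validate_read_only(sql: str) -> str:
    """Return the statement if it looks like one read-only query, else raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    # Blank out string literals so their contents cannot trigger false matches.
    stripped = _STRING_RE.sub("''", statement)
    if ";" in stripped:
        raise ValueError("only a single statement is allowed")
    if not re.match(r"(?i)(select|with)\b", stripped):
        raise ValueError("only SELECT / WITH queries are allowed")
    if _WRITE_RE.search(stripped):
        raise ValueError("write statements are not allowed")
    if _FILE_FUNC_RE.search(stripped):
        raise ValueError("raw file-reader functions are not allowed")
    return statement
```

A filter like this is only one layer. It would not stop a query that names a file path as a string literal, which is why the server also disables filesystem access once the files are loaded.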

- 🔗 One source over many files. JOIN `orders.csv` to `customers.csv` to `regions.json` in a single query: no ETL, no database to stand up.
- 🦆 DuckDB-powered. Fast analytical SQL over CSV, TSV, Parquet, JSON/NDJSON.
- 🔒 Safe by design. Files are materialized into memory; queries are validated read-only; raw file-access functions and out-of-sandbox paths are rejected.
- 🤖 Agent-friendly. `list_sources` → `describe` → `query` is a natural flow the agent can follow on its own.
- 🪶 Two dependencies (`mcp`, `duckdb`), fully typed and tested.

```bash
uvx tablebridge            # run directly
# or
pip install tablebridge    # then run: tablebridge
```

```bash
TABLEBRIDGE_DATA_DIR=/path/to/your/data claude mcp add tablebridge -- uvx tablebridge
```

A Dockerfile is included. The server speaks MCP over stdio. Mount the folder you want to query at `/data` (read-only is fine) and run interactively (`-i`):

```bash
docker build -t tablebridge .
docker run --rm -i -v /path/to/your/data:/data:ro tablebridge
```

| Tool | Description |
|---|---|
| `list_sources` | List the tables (one per data file) with column counts. Start here. |
| `describe` | A table's columns and types |
| `preview` | First N rows of a table |
| `query` | Run read-only SQL (DuckDB dialect) across the tables, JOINs included |
| `refresh` | Re-scan the data directory for added/changed files |
| `server_info` | Effective config (data dir, row cap, supported formats) |

With a folder containing `customers.csv`, `orders.csv`, and `regions.json`:

You: Who are my top 3 customers by total spend, and what region are they in?

Agent: (calls `list_sources`, then `query`)

```sql
SELECT c.name, r.region, SUM(o.total) AS spend
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN regions r ON r.customer_id = c.id
GROUP BY c.name, r.region
ORDER BY spend DESC
LIMIT 3;
```
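
To see the shape of the result, the same query can be tried against a few made-up rows. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies; the customer names, regions, and totals are invented for illustration, and only the column names come from the query above.

```python
import sqlite3

# In-memory database standing in for the tables tablebridge would expose.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, total REAL);
    CREATE TABLE regions (customer_id INTEGER, region TEXT);
""")

# Invented sample rows, not real data.
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Birch"), (3, "Cedar"), (4, "Dune")])
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 300.0), (3, 20.0), (4, 75.0)])
con.executemany("INSERT INTO regions VALUES (?, ?)",
                [(1, "West"), (2, "East"), (3, "North"), (4, "South")])

TOP_CUSTOMERS = """
    SELECT c.name, r.region, SUM(o.total) AS spend
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN regions r ON r.customer_id = c.id
    GROUP BY c.name, r.region
    ORDER BY spend DESC
    LIMIT 3;
"""

rows = con.execute(TOP_CUSTOMERS).fetchall()
for name, region, spend in rows:
    print(name, region, spend)
```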

| Variable | Default | Description |
|---|---|---|
| `TABLEBRIDGE_DATA_DIR` | `.` | Directory of files to expose (the sandbox boundary) |
| `TABLEBRIDGE_MAX_ROWS` | `1000` | Max rows returned per query/preview |
| `TABLEBRIDGE_RECURSIVE` | `1` | Scan subdirectories too |

Supported formats: `.csv`, `.tsv`, `.parquet`, `.json`, `.ndjson`.
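
As a rough illustration of how these settings might drive file discovery, here is a sketch under stated assumptions. `Config`, `load_config`, and `discover` are hypothetical names, not tablebridge's API. The sketch also assumes that a table is named after its file stem, that the first file wins when two stems clash, and that `0`, `false`, `no`, or an empty value turn recursion off.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED = {".csv", ".tsv", ".parquet", ".json", ".ndjson"}


@dataclass(frozen=True)
class Config:
    data_dir: Path
    max_rows: int
    recursive: bool


def load_config(env=None) -> Config:
    """Read the three documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    flag = env.get("TABLEBRIDGE_RECURSIVE", "1").strip().lower()
    return Config(
        data_dir=Path(env.get("TABLEBRIDGE_DATA_DIR", ".")).resolve(),
        max_rows=int(env.get("TABLEBRIDGE_MAX_ROWS", "1000")),
        # Assumption: these values mean "off"; anything else means "on".
        recursive=flag not in {"0", "false", "no", ""},
    )


def discover(config: Config) -> dict[str, Path]:
    """Map a table name to each supported file under the data directory."""
    pattern = "**/*" if config.recursive else "*"
    tables: dict[str, Path] = {}
    for path in sorted(config.data_dir.glob(pattern)):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # Assumption: table name is the file stem; first file wins on a clash.
            tables.setdefault(path.stem, path)
    return tables
```

With a folder holding `customers.csv` and `sub/regions.json`, `discover(load_config())` would return entries for `customers` and `regions`, and only `customers` when `TABLEBRIDGE_RECURSIVE=0`.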

- Sandboxed to `TABLEBRIDGE_DATA_DIR`: only files under it are loaded.
- Materialized into an in-memory DuckDB, then external filesystem access is disabled, so queries can't reach other paths.
- Validated SQL: a single read-only statement only; writes and raw file-reader functions are rejected.

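
The sandbox boundary in the first point comes down to a path-containment test. A minimal sketch, assuming a hypothetical helper named `is_inside` (the server's real check may differ):

```python
from pathlib import Path


def is_inside(data_dir: Path, candidate: Path) -> bool:
    """True if candidate resolves to data_dir itself or to a path beneath it."""
    # resolve() collapses ".." segments and follows symlinks, so a path that
    # only looks like it is under data_dir cannot slip through.
    root = data_dir.resolve()
    target = candidate.resolve()
    return target == root or root in target.parents
```

Resolving both sides before comparing matters: a plain string-prefix check would accept `data/../secrets.csv` and would also treat `/data-old` as being inside `/data`.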
```bash
git clone https://github.com/Michael-WhiteCapData/tablebridge-mcp
cd tablebridge-mcp
uv pip install -e ".[dev]"
ruff check .
pytest    # uses real DuckDB over temp files
```

See CONTRIBUTING.md.

MIT © Michael Tierney

```json
{
  "mcpServers": {
    "tablebridge": {
      "command": "uvx",
      "args": ["tablebridge"],
      "env": {
        "TABLEBRIDGE_DATA_DIR": "/path/to/your/data"
      }
    }
  }
}
```
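
If your MCP client already has a config file with other servers in it, the entry above can be merged in rather than pasted over the file. The sketch below assumes a hypothetical helper, `add_tablebridge`. The location of the config file depends on the client, so it is passed in as a parameter rather than guessed.

```python
import json
from pathlib import Path


def add_tablebridge(config_path: Path, data_dir: str) -> dict:
    """Add or replace the tablebridge entry, keeping any other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["tablebridge"] = {
        "command": "uvx",
        "args": ["tablebridge"],
        "env": {"TABLEBRIDGE_DATA_DIR": data_dir},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```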