Commit 107ba4a

Merge pull request #211 from OreoYang/pg_ai_query
add:pg_ai_query
2 parents 0e327f0 + f125ed9 commit 107ba4a

6 files changed, +646 -1 lines changed

CN/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@
3131
**** xref:master/ecosystem_components/pg_cron.adoc[pg_cron]
3232
**** xref:master/ecosystem_components/pgsql_http.adoc[pgsql-http]
3333
**** xref:master/ecosystem_components/plpgsql_check.adoc[plpgsql_check]
34+
**** xref:master/ecosystem_components/pg_ai_query.adoc[pg_ai_query]
3435
**** xref:master/ecosystem_components/pgroonga.adoc[pgroonga]
3536
**** xref:master/ecosystem_components/pgaudit.adoc[pgaudit]
3637
**** xref:master/ecosystem_components/pgrouting.adoc[pgrouting]

CN/modules/ROOT/pages/master/ecosystem_components/ecosystem_overview.adoc

Lines changed: 2 additions & 1 deletion
@@ -20,7 +20,8 @@ IvorySQL 作为一款兼容 Oracle 且基于 PostgreSQL 的高级开源数据库
2020
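| 7 | xref:master/ecosystem_components/pgroonga.adoc[pgroonga] | 4.0.4 | Provides full-text search for non-English languages to meet the needs of high-performance applications | Full-text search for Chinese, Japanese, Korean, and other languages
2121
| 8 | xref:master/ecosystem_components/pgaudit.adoc[pgaudit] | 18.0 | Provides fine-grained auditing, recording database operation logs to support security auditing and compliance checks | Database security auditing, compliance checks, audit report generation
2222
| 9 | xref:master/ecosystem_components/pgrouting.adoc[pgrouting] | 3.8.0 | Provides routing computation for geospatial data, supporting multiple algorithms and data formats | Geospatial analysis, route planning, logistics optimization
23-
| 10 | xref:master/ecosystem_components/system_stats.adoc[system_stats] | 3.2 | Provides functions for accessing system-level statistics | System monitoring
23+
| 10 | xref:master/ecosystem_components/system_stats.adoc[system_stats] | 3.2 | Provides functions for accessing system-level statistics | System monitoring
24+
| 11 | xref:master/ecosystem_components/pg_ai_query.adoc[pg_ai_query] | 0.1.1 | AI-driven natural-language-to-SQL extension supporting multiple large language models | AI-assisted querying, natural-language database interaction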
2425
|====
2526

2627
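These plugins have all been tested and adapted by the IvorySQL team to ensure stable operation in the IvorySQL environment. Users can select appropriate plugins based on business needs to further enhance the capabilities and flexibility of the database system.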
Lines changed: 320 additions & 0 deletions
@@ -0,0 +1,320 @@
1+
:sectnums:
2+
:sectnumlevels: 5
3+
4+
= pg_ai_query
5+
6+
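== Overview
7+
8+
pg_ai_query is an AI-driven natural-language-to-SQL extension for IvorySQL/PostgreSQL. It uses large language models (LLMs) to convert a user's natural-language description directly into an executable SQL query, and it supports multiple AI models, including OpenAI, Anthropic Claude, and Google Gemini.
9+
10+
Project repository: <https://github.com/benodiwal/pg_ai_query/tree/main>
11+
12+
License: Apache-2.0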
13+
14+
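Key features:
15+
16+
* **Natural language to SQL**: converts plain-text descriptions into valid SQL queries
17+
* **Multi-model support**: supports models such as gpt-4o-mini, gpt-4o, gpt-5, claude-3-haiku-20240307, claude-sonnet-4-5-20250929, and claude-4.5-opus
18+
* **Safety protection**: blocks access to the `information_schema` and `pg_catalog` system schemas
19+
* **Scope restriction**: operates only on user tables
20+
* **Configurable limits**: built-in enforcement of row limits
21+
* **API key security**: handles API credentials securely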
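
The safety protection item in the list above can be pictured with a small shell sketch. It is an illustration only and not the extension's implementation: the helper name `references_system_schema` is hypothetical, and the pattern is a naive text match that would miss quoted identifiers such as `"pg_catalog"`.

```shell
#!/bin/sh
# Hypothetical helper, not part of pg_ai_query.
# Succeeds (exit status 0) when the SQL text contains a schema-qualified
# reference to information_schema or pg_catalog (case-insensitive).
references_system_schema() {
    printf '%s\n' "$1" |
        grep -Eiq '(^|[^[:alnum:]_])(information_schema|pg_catalog)\.'
}

if references_system_schema 'SELECT * FROM pg_catalog.pg_class'; then
    echo "blocked: query touches a system schema"
fi
```

A production check would need to parse the SQL rather than match text, so this sketch only conveys the idea of rejecting a query before it runs.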
22+
23+
== Quick Start
24+
The following walkthrough uses v0.1.1 as an example.
25+
26+
=== Installation
27+
28+
*Requirements*
29+
30+
* PostgreSQL 14+ with development headers
31+
* CMake 3.16+
32+
* C++20 compatible compiler
33+
* API key from OpenAI, Anthropic, or Google (Gemini)
34+
35+
*Install dependencies*
36+
37+
[source,bash]
38+
----
39+
sudo apt-get install libcurl4-openssl-dev
40+
----
41+
42+
*Build and install IvorySQL*
43+
44+
To build IvorySQL from source, the following configuration can be used as a reference:
45+
46+
[source,bash]
47+
----
48+
./configure \
49+
--prefix=$PWD/inst \
50+
--enable-cassert \
51+
--enable-debug \
52+
--enable-tap-tests \
53+
--enable-rpath \
54+
--enable-nls \
55+
--enable-injection-points \
56+
--with-tcl \
57+
--with-python \
58+
--with-gssapi \
59+
--with-pam \
60+
--with-ldap \
61+
--with-openssl \
62+
--with-libedit-preferred \
63+
--with-uuid=e2fs \
64+
--with-ossp-uuid \
65+
--with-libxml \
66+
--with-libxslt \
67+
--with-perl \
68+
--with-icu \
69+
--with-libnuma
70+
----
71+
72+
*Build and install pg_ai_query*
73+
74+
[source,bash]
75+
----
76+
git clone --recurse-submodules https://github.com/benodiwal/pg_ai_query.git
77+
cd pg_ai_query
78+
mkdir build && cd build
79+
export PATH="$HOME/works/repo/ivorysql/IvorySQL/inst/bin:$PATH"
80+
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/works/repo/ivorysql/IvorySQL/inst
81+
make && sudo make install
82+
----
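
The build commands above hard-code one developer's checkout path. As a hedged sketch, the install prefix could instead be derived from the directory that holds the IvorySQL binaries. The helper name `prefix_from_bindir` and the `/opt/ivorysql/inst/bin` default are hypothetical, and the sketch assumes the usual layout in which binaries live in `<prefix>/bin`.

```shell
#!/bin/sh
# Hypothetical helper: derive the install prefix from a bin directory,
# assuming the conventional <prefix>/bin layout.
prefix_from_bindir() {
    dirname "$1"
}

# Placeholder location; on a real system this could come from
# "pg_config --bindir" once the IvorySQL bin directory is on PATH.
IVORYSQL_BINDIR="${IVORYSQL_BINDIR:-/opt/ivorysql/inst/bin}"
IVORYSQL_PREFIX=$(prefix_from_bindir "$IVORYSQL_BINDIR")

# Print the cmake invocation instead of running it.
echo "cmake .. -DCMAKE_INSTALL_PREFIX=$IVORYSQL_PREFIX"
```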
83+
84+
*Create the extension*
85+
86+
[source,sql]
87+
----
88+
CREATE EXTENSION pg_ai_query;
89+
----
90+
91+
=== Configuration
92+
93+
Create the configuration file `~/.pg_ai.config` in your home directory:
94+
95+
[source,ini]
96+
----
97+
[general]
98+
log_level = "INFO"
99+
enable_logging = false
100+
101+
[query]
102+
enforce_limit = true
103+
default_limit = 1000
104+
105+
[response]
106+
show_explanation = true
107+
show_warnings = true
108+
show_suggested_visualization = false
109+
use_formatted_response = false
110+
111+
[anthropic]
112+
# Your Anthropic API key (if using Claude)
113+
api_key = "******"
114+
115+
# Default model to use (options: claude-sonnet-4-5-20250929)
116+
default_model = "claude-sonnet-4-5-20250929"
117+
118+
# Custom API endpoint (optional) - for Anthropic-compatible APIs
119+
api_endpoint = "https://open.bigmodel.cn/api/anthropic"
120+
121+
[prompts]
122+
# Use file paths to read custom prompts
123+
system_prompt = /home/highgo/.pg_ai.prompts
124+
explain_system_prompt = /home/highgo/.pg_ai.explain.prompts
125+
----
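
Because the configuration file stores an API key in plain text, it is reasonable to keep it readable by its owner only. The following is a minimal sketch under that assumption. The helper name `write_pg_ai_config` and the placeholder key are hypothetical, the keys written are a subset of the sample above, and the demo writes to a scratch file rather than to `~/.pg_ai.config`.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal pg_ai_query config to the given path
# and makes it readable and writable by the owner only.
write_pg_ai_config() {
    config_path=$1
    api_key=$2
    (
        umask 077   # files created in this subshell get no group/other access
        cat > "$config_path" <<EOF
[query]
enforce_limit = true
default_limit = 1000

[anthropic]
api_key = "$api_key"
default_model = "claude-sonnet-4-5-20250929"
EOF
    )
    chmod 600 "$config_path"   # also tightens a file that already existed
}

# Demo against a scratch file so no real configuration is overwritten.
demo_dir=$(mktemp -d)
write_pg_ai_config "$demo_dir/pg_ai.config" "placeholder-key"
ls -l "$demo_dir/pg_ai.config"
```

For real use, the first argument would be `"$HOME/.pg_ai.config"` and the second a key read from a secure source.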
126+
127+
For more examples, see: <https://github.com/benodiwal/pg_ai_query/blob/main/docs/src/examples.md>
128+
129+
== Usage Examples
130+
131+
=== Basic Usage
132+
133+
[source,sql]
134+
----
135+
SELECT generate_query('找出所有的用户');
136+
----
137+
138+
Sample output:
139+
140+
----
141+
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260209135507cc16362d5d324ccd
142+
143+
generate_query
144+
--------------------------------------------------------
145+
SELECT * FROM public.users LIMIT 1000;
146+
+
147+
-- Explanation:
148+
-- Retrieves all columns and rows from the users table.
149+
+
150+
-- Warning: INFO: Applied LIMIT 1000 to prevent large result sets. Remove LIMIT if you need all data.
151+
+
152+
-- Note: Row limit was automatically applied to this query for safety
153+
(1 row)
154+
----
155+
156+
Run the generated query:
157+
158+
[source,sql]
159+
----
160+
SELECT * FROM public.users LIMIT 1000;
161+
----
162+
163+
Output:
164+
165+
----
166+
id | name | email | age | created_at | city
167+
----+---------------+-------------------+-----+----------------------------+---------------
168+
1 | Alice Johnson | alice@example.com | 28 | 2026-02-04 15:47:55.208111 | New York
169+
2 | Bob Smith | bob@example.com | 35 | 2026-02-04 15:47:55.208111 | San Francisco
170+
3 | Carol Davis | carol@example.com | 31 | 2026-02-04 15:47:55.208111 | Chicago
171+
4 | David Wilson | david@example.com | 27 | 2026-02-04 15:47:55.208111 | Seattle
172+
5 | Eva Brown | eva@example.com | 33 | 2026-02-04 15:47:55.208111 | Boston
173+
(5 rows)
174+
----
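
The generated text mixes the SQL statement with `--` comment lines (explanation, warnings, notes). When scripting, the statement can be separated from the commentary before it is reviewed and run. The sketch below is one hedged way to do that. The helper name `extract_sql` is hypothetical, and it assumes the text arrives without the `+` line-continuation markers that psql's aligned output adds.

```shell
#!/bin/sh
# Hypothetical helper: reads generate_query output on stdin and keeps only
# the SQL, dropping "--" comment lines and blank lines.
extract_sql() {
    sed -e '/^[[:space:]]*--/d' -e '/^[[:space:]]*$/d'
}

# Demo on text shaped like the sample output above.
printf '%s\n' \
    'SELECT * FROM public.users LIMIT 1000;' \
    '' \
    '-- Explanation:' \
    '-- Retrieves all columns and rows from the users table.' |
    extract_sql
```

With a live database, the input could come from psql in unaligned, tuples-only mode (`psql -At -c "SELECT generate_query('...')"`), which prints multi-line values without the `+` markers. Generated SQL should still be read before it is executed.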
175+
176+
=== generate_query Examples
177+
178+
*Generating test data*
179+
180+
[source,sql]
181+
----
182+
SELECT generate_query('生成100条user数据,插入到users');
183+
----
184+
185+
Output:
186+
187+
----
188+
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_2026021114092101601c5650864a2d
189+
190+
generate_query
191+
--------------------------------------------------------------------------------------------------------
192+
INSERT INTO public.users (name, email, age, city, status)
193+
SELECT 'User_' || generate_series AS name,
194+
'user' || generate_series || '@example.com' AS email,
195+
(18 + (generate_series % 50)) AS age,
196+
(ARRAY['Beijing','Shanghai','Guangzhou','Shenzhen','Hangzhou'])[1 + (generate_series % 5)] AS city,
197+
'active' AS status
198+
FROM generate_series(1, 100);
199+
+
200+
-- Explanation:
201+
-- 生成100条模拟用户数据并插入到users表中。数据包括自动生成的姓名、唯一邮箱、随机年龄(18-67岁)、随机城市和默认状态。
202+
+
203+
-- Warnings:
204+
-- 1. INFO: 依赖users表的id列有DEFAULT自增设置,未手动插入id。
205+
-- 2. INFO: 使用generate_series函数生成序列数据,这是PostgreSQL/IvorySQL的特性。
206+
-- 3. WARN: 确保在运行前users表为空或id序列不冲突,否则可能重复插入。
207+
-- 4. WARN: 邮箱格式为简单模拟,实际环境中可能需要更复杂的逻辑或去重检查。
208+
(1 row)
209+
----
210+
211+
*Case-insensitive query*
212+
213+
[source,sql]
214+
----
215+
SELECT generate_query('show users from beijing, beijing is non-Case insensitive');
216+
----
217+
218+
Output:
219+
220+
----
221+
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260211142845878f5f1a5a2f44a7
222+
223+
generate_query
224+
-----------------------------------------
225+
SELECT id, name, email, age, created_at, city, status
226+
FROM public.users
227+
WHERE LOWER(city) = LOWER('beijing') LIMIT 100;
228+
+
229+
-- Explanation:
230+
-- Selects all user details for users located in Beijing, performing a case-insensitive match on the city column.
231+
+
232+
-- Warnings:
233+
-- 1. INFO: Using LOWER() on both sides ensures case-insensitive matching but may prevent the database from using a standard index on the city column if one exists.
234+
-- 2. INFO: Row limit of 100 applied to prevent large result sets.
235+
+
236+
-- Note: Row limit was automatically applied to this query for safety
237+
(1 row)
238+
----
239+
240+
=== explain_query Example
241+
242+
[source,sql]
243+
----
244+
SELECT explain_query('SELECT * FROM orders WHERE user_id = 12');
245+
----
246+
247+
Output:
248+
249+
----
250+
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260211175909d47a6871bcca4897
251+
252+
explain_query
253+
--------------------------------------------------------------------------------------------------------------
254+
1. 查询概述
255+
+
256+
- 该查询旨在从 orders 表中检索 user_id 等于 12 的所有记录(SELECT *)。
257+
- 这是一个典型的根据特定字段(user_id)筛选数据的查询。
258+
+
259+
2. 性能摘要
260+
+
261+
- 总执行时间: 0.021 毫秒
262+
- 规划时间: 0.430 毫秒
263+
- 总成本: 18.12
264+
- 返回行数: 0 行 (Actual Rows: 0)
265+
- 扫描行数: 0 行 (Rows Removed by Filter: 0)
266+
+
267+
3. 执行计划分析
268+
+
269+
- 关键步骤: 顺序扫描
270+
- 数据库对 orders 表执行了全表扫描操作。
271+
- 计划器预计会找到 3 行数据,但实际执行返回了 0 行。
272+
- 过滤条件: orders.user_id = 12,这意味着数据库必须读取表中的每一行来检查这个条件。
273+
+
274+
4. 性能问题
275+
+
276+
- 全表扫描风险: 虽然目前表的数据量很小(执行时间仅为 0.021ms),但使用了 Seq Scan(顺序扫描)意味着数据库没有使用索引。如果 orders 表随着时间推移增长到包含数百万行数据,这种查询方式将变得极其缓慢(高 I/O 消耗)。
277+
- 缺失索引: 计划显示没有使用任何索引来定位 user_id = 12 的行,这表明在 user_id 列上可能缺少必要的 B-Tree 索引。
278+
+
279+
5. 优化建议
280+
+
281+
- 主要建议: 在 user_id 列上创建索引以避免全表扫描。这将把查询从 O(N)(扫描所有行)转变为 O(log N)(索引查找)。
282+
- SQL 优化示例:
283+
+
284+
CREATE INDEX idx_orders_user_id ON orders(user_id);
285+
+
286+
6. 索引建议
287+
+
288+
- 推荐索引: 在 orders 表的 user_id 列上创建 B-Tree 索引。
289+
- 理由: 查询条件基于 user_id 的等值比较 (=)。创建索引后,IvorySQL (PostgreSQL) 将能够利用索引快速定位数据,显著减少查询时间和资源消耗,特别是在数据量大的情况下。
290+
(1 row)
291+
----
292+
293+
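== Best Practices
294+
295+
=== Prompt Writing Tips
296+
297+
* **Use English**: the AI supports multiple languages, but English gives the best results
298+
* **Know your schema**: the better you understand the database structure, the more accurate the generated queries
299+
* **Iterate**: start with a broad request, then add detail step by step to refine the result
300+
* **Be specific**: if you know the relevant tables or columns, name them in the prompt; this helps the AI generate precise queries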
301+
302+
=== Error Handling Example
303+
304+
When a request refers to a table that does not exist, the system returns an error message:
305+
306+
[source,sql]
307+
----
308+
SELECT generate_query('列出所有的商品和价格');
309+
----
310+
311+
Error output:
312+
313+
----
314+
[INFO] Text generation successful - model: claude-sonnet-4-5-20250929, response_id: msg_20260209135642777cbc5c82ca4a85
315+
316+
ERROR: Query generation failed: Cannot generate query. Referenced table(s) for 'products' or 'goods' do not exist in the database. Available tables: public.orders, public.student_scores, public.users, sys.dual
317+
----
318+
319+
In this case, the AI reports the list of available tables, which helps the user adjust the request.
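
A script that wraps `generate_query` could pull the table list out of such an error and show it to the user. The sketch below assumes the message keeps the `Available tables: a, b, c` shape shown in the sample. That text may vary between models and versions, and the helper name `available_tables` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads an error message on stdin and prints one
# available table per line, or nothing if the message has no table list.
available_tables() {
    sed -n 's/^.*Available tables: *//p' | tr ',' '\n' | sed 's/^ *//'
}

msg="ERROR: Query generation failed: Cannot generate query. Referenced table(s) for 'products' or 'goods' do not exist in the database. Available tables: public.orders, public.student_scores, public.users, sys.dual"
printf '%s\n' "$msg" | available_tables
```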
320+

EN/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@
3030
*** xref:master/ecosystem_components/pg_cron.adoc[pg_cron]
3131
*** xref:master/ecosystem_components/pgsql_http.adoc[pgsql-http]
3232
*** xref:master/ecosystem_components/plpgsql_check.adoc[plpgsql_check]
33+
*** xref:master/ecosystem_components/pg_ai_query.adoc[pg_ai_query]
3334
*** xref:master/ecosystem_components/pgroonga.adoc[pgroonga]
3435
*** xref:master/ecosystem_components/pgaudit.adoc[pgaudit]
3536
*** xref:master/ecosystem_components/pgrouting.adoc[pgrouting]

EN/modules/ROOT/pages/master/ecosystem_components/ecosystem_overview.adoc

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ IvorySQL, as an advanced open-source database compatible with Oracle and based o
2222
|*8*| xref:master/ecosystem_components/pgaudit.adoc[pgaudit] | 18.0 | Provides fine-grained auditing, recording database operation logs to support security auditing and compliance checks | Database security auditing, compliance checks, audit report generation
2323
|*9*| xref:master/ecosystem_components/pgrouting.adoc[pgrouting] | 3.8.0 | Provides routing computation for geospatial data, supporting multiple algorithms and data formats | Geospatial analysis, route planning, logistics optimization
2424
|*10*| xref:master/ecosystem_components/system_stats.adoc[system_stats] | 3.2 | Provide functions for accessing system-level statistics. | system monitor
25+
|*11*| xref:master/ecosystem_components/pg_ai_query.adoc[pg_ai_query] | 0.1.1 | AI-driven natural language to SQL extension supporting multiple LLMs | AI-assisted querying, natural language database interaction
2526
|====
2627

2728
These plugins have all been tested and adapted by the IvorySQL team to ensure stable operation in the IvorySQL environment. Users can select appropriate plugins based on business needs to further enhance the capabilities and flexibility of the database system.
