Commit 64eb931

fix shuffle join for recursive cte (#22055)
### **User description**
## What type of PR is this?

- [ ] API-change
- [ ] BUG
- [ ] Improvement
- [ ] Documentation
- [ ] Feature
- [ ] Test and CI
- [ ] Code Refactoring

## Which issue(s) this PR fixes:

issue #matrixorigin/MO-Cloud#5764

## What this PR does / why we need it:

fix shuffle join for recursive cte

___

### **PR Type**
Bug fix

___

### **Description**
- Fix shuffle join handling for recursive CTE operations
- Add proper batch termination check for `Last()` batches
- Add test case for recursive CTE with large dataset

___

### **Changes walkthrough** 📝

**Bug fix**

- **shuffle.go**: Add Last batch handling in shuffle ([+2/-1](https://github.com/matrixorigin/matrixone/pull/22055/files#diff-dd874410fa94e53ef1b1a5d4d599903dd307785afc25ad7d334874218f8cf73f))
  - File: pkg/sql/colexec/shuffle/shuffle.go
  - Remove the blank line in the import block
  - Handle the `bat.Last()` case by returning the result directly
  - Prevent last batches from being processed by the shuffle operation

**Tests**

- **recursive_cte.result**: Update test results for recursive CTE ([+8/-1](https://github.com/matrixorigin/matrixone/pull/22055/files#diff-3e8dee7ccef2614ba5e9bc5d56dd46d2386b1144c89cdffc45ac3479bb3dcd86))
  - File: test/distributed/cases/recursive_cte/recursive_cte.result
  - Add newline at end of file
  - Add test results for the new recursive CTE test case with a large dataset
  - Show the expected count result of 6 for the new test
- **recursive_cte.sql**: Add recursive CTE test case ([+5/-0](https://github.com/matrixorigin/matrixone/pull/22055/files#diff-9c73dbc8fa464a2a39025aec6058d797dabfb60f413e80d49e0ba667cda50bc2))
  - File: test/distributed/cases/recursive_cte/recursive_cte.sql
  - Add a new test case with table creation and large dataset insertion
  - Add a recursive CTE query with specific ID filtering and tenant conditions
  - Add table cleanup statements

1 parent c55a2c9 commit 64eb931

3 files changed: +15 -2 lines changed

pkg/sql/colexec/shuffle/shuffle.go

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,6 @@ package shuffle
 
 import (
 	"bytes"
-
 	"github.com/matrixorigin/matrixone/pkg/container/batch"
 	"github.com/matrixorigin/matrixone/pkg/container/types"
 	"github.com/matrixorigin/matrixone/pkg/container/vector"
@@ -90,6 +89,8 @@ SENDLAST:
 	if bat == nil {
 		ap.ctr.ending = true
 		goto SENDLAST
+	} else if bat.Last() {
+		return result, nil
 	} else if !bat.IsEmpty() {
 		if ap.ShuffleType == int32(plan.ShuffleType_Hash) {
 			bat, err = hashShuffle(ap, bat, proc)
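
As the hunk above reads, a batch for which `Last()` is true is now returned to the caller as-is instead of falling through to the hash or range shuffle path. The following is a minimal, self-contained sketch of that branching order only. The `Batch` type, its `last` field, the `route` function, and the modulo bucketing are all hypothetical stand-ins chosen for illustration; they are not MatrixOne's actual `batch.Batch` or shuffle implementation.

```go
package main

import "fmt"

// Batch is a hypothetical stand-in for a columnar batch. Only the two
// predicates used in the diff (Last and IsEmpty) are modelled here.
type Batch struct {
	Rows []int64
	last bool // assumed marker for an end-of-iteration batch
}

func (b *Batch) Last() bool    { return b.last }
func (b *Batch) IsEmpty() bool { return len(b.Rows) == 0 }

// route mirrors the order of checks in the patched branch:
// nil input, then Last(), then non-empty data. It returns the action
// taken and, for the shuffle case, the rows assigned to each bucket.
func route(b *Batch, buckets int) (string, [][]int64) {
	switch {
	case b == nil:
		// End of input: the real operator switches to flushing buffered data.
		return "flush", nil
	case b.Last():
		// New branch: forward the marker batch without shuffling it.
		return "forward", nil
	case !b.IsEmpty():
		// Ordinary data: distribute rows across buckets (toy modulo hash).
		out := make([][]int64, buckets)
		for _, r := range b.Rows {
			idx := int(uint64(r) % uint64(buckets))
			out[idx] = append(out[idx], r)
		}
		return "shuffle", out
	default:
		// Empty, non-last batch: nothing to do.
		return "skip", nil
	}
}

func main() {
	inputs := []*Batch{
		{Rows: []int64{1, 2, 3}},
		{last: true},
		{},
		nil,
	}
	for _, b := range inputs {
		action, parts := route(b, 2)
		fmt.Println(action, parts)
	}
}
```

The point of the sketch is the ordering: the `Last()` check sits before the `IsEmpty()` check, so a marker batch is never handed to the bucketing code.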

test/distributed/cases/recursive_cte/recursive_cte.result

Lines changed: 8 additions & 1 deletion
@@ -91,4 +91,11 @@ Bob 1
 Charlie 1
 David 2
 Eve 2
-Frank 2
+Frank 2
+drop table if exists t1;
+create table t1(id bigint primary key, parent_id bigint, tenant_id varchar(50));
+insert into t1 select *,*,* from generate_series(1000000) g;
+WITH recursive tb (id, parent_id) AS (SELECT id,parent_id FROM t1 WHERE id IN ( 1937478033946447874, 1,2,3) AND tenant_id != '000000' UNION ALL SELECT c.id, c.parent_id FROM t1 c JOIN tb t ON c.id = t.parent_id WHERE c.tenant_id != '000000') select count(*) from tb;
+count(*)
+6
+drop table if exists t1;

test/distributed/cases/recursive_cte/recursive_cte.sql

Lines changed: 5 additions & 0 deletions
@@ -28,3 +28,8 @@ CREATE TABLE employees_hierarchy (id INT PRIMARY KEY, name VARCHAR(50),manager_i
 INSERT INTO employees_hierarchy (id, name, manager_id) VALUES(1, 'Alice', NULL), (2, 'Bob', 1),(3, 'Charlie', 1),(4, 'David', 2),(5, 'Eve', 2),(6, 'Frank', 3);
 WITH RECURSIVE employee_hierarchy_cte (id, name, manager_id, level) AS (SELECT id, name, manager_id, 0 FROM employees_hierarchy WHERE name = 'Alice' UNION ALL SELECT e.id, e.name, e.manager_id, eh.level + 1 FROM employees_hierarchy AS e JOIN employee_hierarchy_cte AS eh ON e.manager_id = eh.id) SELECT name, level FROM employee_hierarchy_cte;
 WITH RECURSIVE employee_hierarchy_cte (id, name, manager_id, level) AS (SELECT id, name, manager_id, 0 FROM employees_hierarchy WHERE name = 'Alice' UNION ALL SELECT e.id, e.name, e.manager_id, eh.level + 1 FROM employees_hierarchy AS e JOIN employee_hierarchy_cte AS eh ON e.manager_id = eh.id) SELECT t.name, t.level FROM employee_hierarchy_cte as t;
+drop table if exists t1;
+create table t1(id bigint primary key, parent_id bigint, tenant_id varchar(50));
+insert into t1 select *,*,* from generate_series(1000000) g;
+WITH recursive tb (id, parent_id) AS (SELECT id,parent_id FROM t1 WHERE id IN ( 1937478033946447874, 1,2,3) AND tenant_id != '000000' UNION ALL SELECT c.id, c.parent_id FROM t1 c JOIN tb t ON c.id = t.parent_id WHERE c.tenant_id != '000000') select count(*) from tb;
+drop table if exists t1;
