Skip to content

Conversation

@oerling
Copy link
Contributor

@oerling oerling commented Oct 23, 2025

When we import an already placed table or tables into a build side for cardinality reduction as a semijoin:

  • The colums on the right side of the semijoin must not be added to downstreamColumns. A semijoin (exists) or antijoin (not exists) does not produce columns, except maybe a mark. Also, if the join has filters, the columns in the filter that come from the right side are not downstream columns.

  • When a semijoin right side is a join, that join may itself contain existences. The right sides of these existences are not suitable for joining to the table or dt being reduced, since a semijoin right side does not produce columns. So, do not consider these for joining the reducing existence to the thing being reduced.

Fixes #530

When we import an already placed table or tables into a build side for cardinality reduction as a semijoin:

- The colums on the right side of the semijoin must not be added to
  downstreamColumns. A semijoin (exists) or antijoin (not exists)
  does not produce columns, except maybe a mark. Also, if the join has
  filters, the columns in the filter that come from the right side are
  not downstream columns.

- When a semijoin right side is a join, that join may itself contain
  existences. The right sides of these existences are not suitable for
  joining to the table or dt being reduced, since a semijoin right
  side does not produce columns. So, do not consider these for joining
  the reducing existence to the thing being reduced.
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 23, 2025
@mbasmanova mbasmanova requested a review from MBkkt October 23, 2025 21:35
compositeEdge = JoinEdge::makeInner(nullptr, joined);
if (joined == join->rightTable()) {
for (auto i = 0; i < join->numKeys(); ++i) {
compositeEdge->addEquality(
Copy link
Collaborator

@MBkkt MBkkt Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check we don't make edges with a lot of duplicates?
When I run tpch q9 I noticed there was duplicates in join keys (in velox execution plan)

mbasmanova added a commit to mbasmanova/verax that referenced this pull request Oct 24, 2025
Summary:
Partial fix for facebookincubator#530

Adapted from Orri's facebookincubator#551

When we import an already placed table or tables into a build side for cardinality reduction as a semijoin:

The colums on the right side of the semijoin must not be added to downstreamColumns. A semijoin (exists) or antijoin (not exists) does not produce columns, except maybe a mark. Also, if the join has filters, the columns in the filter that come from the right side are not downstream columns.

Co-authored-by: oerling

Differential Revision: D85464350
meta-codesync bot pushed a commit that referenced this pull request Oct 25, 2025
Summary:
Pull Request resolved: #560

Partial fix for #530

Adapted from Orri's #551

When we import an already placed table or tables into a build side for cardinality reduction as a semijoin:

The colums on the right side of the semijoin must not be added to downstreamColumns. A semijoin (exists) or antijoin (not exists) does not produce columns, except maybe a mark. Also, if the join has filters, the columns in the filter that come from the right side are not downstream columns.

Reviewed By: xiaoxmeng

Differential Revision: D85464350

fbshipit-source-id: 2d8503f51fbe87e551bf3413bd0d607b884681dc

Co-authored-by: oerling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding reducing semi joins may produce invalid plans

2 participants