Collaborator

@MBkkt MBkkt commented Oct 2, 2025

We never read this variable except when converting to a string.
And we never write to it except in one case.

So in most cases the string output will contain incorrect information.

Also, this variable isn't really needed, so I think it's better to remove it.
Even for partitioned writes etc., it isn't used in any ongoing PR.
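
A minimal sketch of the problem described above, not the actual code from the repository. The names `LegacyDistribution`, `makeBucketedDistribution`, and `makeHashDistribution`, the default value of 1, and the choice of a bucketed write as the single writing path are all assumptions made for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in; the real DistributionType has different members.
struct LegacyDistribution {
  bool isGather = false;
  // Assigned on exactly one code path; everywhere else it keeps this default.
  int numPartitions = 1;

  // The only reader of numPartitions: prints whatever happens to be stored.
  std::string toString() const {
    return std::string(isGather ? "gather" : "hash") +
        " numPartitions=" + std::to_string(numPartitions);
  }
};

// The single path that sets the field (assumed here to be a bucketed write).
LegacyDistribution makeBucketedDistribution(int numBuckets) {
  assert(numBuckets > 0);
  LegacyDistribution d;
  d.numPartitions = numBuckets;
  return d;
}

// A typical path that never touches the field, even though the plan really
// uses `actualPartitions` partitions.
LegacyDistribution makeHashDistribution(int actualPartitions) {
  assert(actualPartitions > 0);
  return LegacyDistribution{};
}
```

With these definitions, `makeHashDistribution(8).toString()` yields `hash numPartitions=1`, which does not match the 8 partitions actually planned.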

@MBkkt MBkkt requested a review from mbasmanova October 2, 2025 10:35
@meta-cla meta-cla bot added the CLA Signed label Oct 2, 2025
@MBkkt MBkkt changed the title from "fix: Remove unused code that can confuse reader" to "fix: Remove unused confusing code" Oct 2, 2025
Collaborator Author

MBkkt commented Oct 3, 2025

@mbasmanova will you review, please?

@mbasmanova mbasmanova changed the title from "fix: Remove unused confusing code" to "refactor: Remove unused confusing code" Oct 3, 2025
/// Distribution of data.
/// There is copartitioning if the DistributionType is the same on both sides
/// and both sides have an equal number of 1:1 type matched partitioning keys.
struct DistributionType {
Contributor

DistributionType is incomplete, but removing numPartitions is not going to fix that. It needs more information about partitioning. It looks like @hdikeman is wrapping up the Connector API changes for Table Write, and I should have bandwidth to work on adding TableWrite support to the optimizer. As part of that work I expect to come back and revisit the DistributionType struct.

Collaborator Author

@MBkkt MBkkt Oct 3, 2025

I think I made it complete in a different PR, where partitioning is a function.
But that is a fairly nontrivial PR with several parts, so I decided to start with the simple part.

The number of partitions isn't needed, because the partition count is controlled by optimizer options (numDrivers/numWorkers) when we plan the partitioning.
It also isn't needed for broadcast/gather, because they have specific behavior.
The only case where it is needed is table write, but for that case it should be implemented in a more abstract way, with a partition type/function (which internally holds, for example, the number of buckets for Hive).
Table scan would need it too, but partitioning isn't implemented for it yet.
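
A rough sketch of what "partitioning is a function" could look like, under stated assumptions. This is not the code from the other PR: `PartitionFunction`, `HiveBucketFunction`, `HashFunction`, `Distribution`, and `isCopartitioned` are hypothetical names, and treating two Hive bucketings as compatible only when their bucket counts are equal is a simplification. The copartitioning check follows the doc comment on `DistributionType` (same distribution on both sides, equal number of 1:1 type-matched keys):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: the partitioning scheme owns any sizing details.
struct PartitionFunction {
  virtual ~PartitionFunction() = default;
  virtual std::string toString() const = 0;
  // True if equal keys land in the same partition under both functions.
  virtual bool isCompatibleWith(const PartitionFunction& other) const = 0;
};

// Hive-style bucketing: the bucket count lives here, not in the distribution.
class HiveBucketFunction : public PartitionFunction {
 public:
  explicit HiveBucketFunction(int numBuckets) : numBuckets_(numBuckets) {
    assert(numBuckets > 0);
  }
  std::string toString() const override {
    return "hive(" + std::to_string(numBuckets_) + ")";
  }
  bool isCompatibleWith(const PartitionFunction& other) const override {
    // Simplifying assumption: only identical bucket counts are compatible.
    auto* hive = dynamic_cast<const HiveBucketFunction*>(&other);
    return hive != nullptr && hive->numBuckets_ == numBuckets_;
  }

 private:
  int numBuckets_;
};

// Plain hash repartitioning: the partition count comes from optimizer
// options at planning time, so nothing is stored here.
class HashFunction : public PartitionFunction {
 public:
  std::string toString() const override { return "hash"; }
  bool isCompatibleWith(const PartitionFunction& other) const override {
    return dynamic_cast<const HashFunction*>(&other) != nullptr;
  }
};

struct Distribution {
  std::shared_ptr<const PartitionFunction> function;
  std::vector<std::string> keyTypes;  // Partitioning key types, in order.
};

// Copartitioned: compatible functions and pairwise type-matched keys.
bool isCopartitioned(const Distribution& left, const Distribution& right) {
  return left.function && right.function &&
      left.function->isCompatibleWith(*right.function) &&
      left.keyTypes == right.keyTypes;
}
```

In this shape the distribution itself carries no partition count, so there is no field that can go stale; a bucket count exists only inside the function that actually depends on it.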

About @hdikeman's work: I have a PR that implements TableWrite in the optimizer, and I plan to rebase it after Henry's PR with the connector API changes is merged.
So that PR will contain the implementation of TableWrite in the optimizer.
It will cover TestConnector and LocalHiveConnectorMetadata, but with copartitioning for Hive disabled, because that requires some changes in the runner.

Does that sound OK to you?

@mbasmanova
Contributor

> I have PR that implements TableWrite in optimizer, I plan to rebase it after Henry PR with connector api changes will be merged.

@MBkkt Any chance you could rebase it now? I'd like to start reading it without waiting for Henry's PR to land.

Collaborator Author

MBkkt commented Oct 14, 2025

I'm closing this PR because this change is already included in my other, more complete PR: #498

@MBkkt MBkkt closed this Oct 14, 2025