-
Notifications
You must be signed in to change notification settings - Fork 13
feat: upgrade iceberg sdk #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR upgrades core SDK dependencies including Apache Iceberg, DataFusion, and related libraries. The upgrade adapts the codebase to breaking API changes in these dependencies, primarily focusing on writer builder patterns, error handling, and transaction operations.
Key changes:
- Upgraded iceberg-rust from
1d79e119to2a5201729ca170da906d2f6cc13b6328fc5806adwith associated API adaptations - Updated DataFusion from v45 to v50 and Parquet from v54 to v56.2
- Removed dependency on
iceberg-catalog-memorythroughout the codebase
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| Cargo.toml | Updated workspace dependencies for async-trait, datafusion, iceberg, and parquet; commented out iceberg-catalog-memory |
| core/Cargo.toml | Commented out iceberg-catalog-memory dependency |
| integration-tests/Cargo.toml | Commented out iceberg-catalog-memory dependency |
| core/src/lib.rs | Removed re-export of iceberg_catalog_memory |
| core/src/file_selection/mod.rs | Removed with_delete_file_processing_enabled() call from scan builder |
| core/src/file_selection/strategy.rs | Added unwrap_or_default() for optional equality_ids field |
| core/src/executor/mod.rs | Replaced dir_path with location_generator in RewriteFilesRequest struct |
| core/src/executor/iceberg_writer/rolling_iceberg_writer.rs | Added partition_key support to writer and builder patterns |
| core/src/executor/datafusion/mod.rs | Refactored writer construction to use TaskWriter and RollingFileWriterBuilder |
| core/src/executor/datafusion/iceberg_file_task_scan.rs | Removed .await from synchronous read operation; boxed ArrowError parameters |
| core/src/executor/datafusion/datafusion_processor.rs | Improved handling of equality_ids with explicit error for missing field |
| core/src/compaction/mod.rs | Updated transaction API calls to use new setter methods and apply pattern |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let task_stream = futures::stream::iter(vec![Ok(task)]).boxed(); | ||
| let arrow_reader_builder = ArrowReaderBuilder::new(file_io.clone()).with_batch_size(max_record_batch_rows); | ||
| let mut batch_stream = arrow_reader_builder.build() | ||
| .read(task_stream) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of .await from line 319 appears incorrect. The read() method returns a Future that must be awaited to obtain the batch stream. Without .await, the code attempts to call map_err() on a Future rather than on the Result it produces.
| .read(task_stream) | |
| .read(task_stream) | |
| .await |
| iceberg::spec::DataFileFormat::Parquet, | ||
| ); | ||
|
|
||
| // Noop wrapper for `DataFileWriterBuilder` |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'Noop' to 'No-op' (standard hyphenation for 'no operation').
| // Noop wrapper for `DataFileWriterBuilder` | |
| // no-op wrapper for `DataFileWriterBuilder` |
No description provided.