refactor #35
Merged
vinulw reviewed May 28, 2026
| `OasisDaskReader` | Dask | CSV, Parquet |
| `OasisPyarrowReader` | PyArrow | Parquet |

Format-specific subclasses (`OasisPandasReaderCSV`, `OasisDaskReaderParquet`, etc.) are also available.
Contributor
Minor, but I don't think the config has `OasisPyarrowReaderParquet`. It's redundant, but it might be worth including so that this line in the README is correct.
Contributor
In `oasis_data_manager/df_reader/backends/pandas.py` you have `OasisPandasReader`, `OasisPandasReaderCSV` and `OasisPandasReaderParquet`.
In `oasis_data_manager/df_reader/backends/pyarrow.py` only `OasisPyarrowReader` is there. I think, according to the README, you expect there to also be an `OasisPyarrowReaderParquet`, and it needs to be added to the aliases in the config parsing section.
Contributor
Author
Oooh, complete misread from me, I was looking at the Dask reader. Will update.
Summary
This PR cleans up the OasisDataManager library with ergonomic improvements, bug fixes, and expanded test coverage. Changes span the public API, internal reader logic, storage backends, and exception naming.
Public API improvement
Storage backend module renames
- `filestore/backends/aws_s3.py` → `filestore/backends/aws.py` (canonical path)
- `filestore/backends/azure_abfs.py` → `filestore/backends/azure.py` (canonical path)
- The old module paths emit a `DeprecationWarning` on import, so existing code continues to work.

Exception rename
- `OasisException` renamed to `OasisDataManagerException` to better reflect the library it belongs to and avoid confusion with the same name used elsewhere in the Oasis platform.
- `OasisException` is kept as a backward-compatible alias (same class object) so nothing breaks.
- `MissingInputsException` updated to subclass `OasisDataManagerException`.

Aliases for easier use in config
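As a rough illustration of the exception rename described above, the sketch below shows how an alias bound to the same class object keeps legacy `except OasisException` handlers working. The class bodies are simplified stand-ins, and the constructor argument and message text of `MissingInputsException` are hypothetical, not taken from the library.

```python
class OasisDataManagerException(Exception):
    """Base exception for the library (simplified stand-in)."""


# Backward-compatible alias: the old name is bound to the same class object.
OasisException = OasisDataManagerException


class MissingInputsException(OasisDataManagerException):
    # The constructor argument and message text are hypothetical.
    def __init__(self, input_path):
        super().__init__(f"Input file missing: {input_path}")


def caught_by_old_name():
    """Return True if a legacy `except OasisException` still catches the error."""
    try:
        raise MissingInputsException("keys.csv")
    except OasisException:
        return True
```

Because the alias is the same object rather than a subclass, `isinstance` and `except` checks behave identically under either name.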
Bug fixes
Dask `RecursionError` on parquet reads
- `OasisReader._read()` now sets `has_read = True` before calling `read_parquet()` / `read_csv()`, wrapped in a try/except that resets the flag on failure.
- Before this change, `read_parquet()` could re-enter `_read()` via `self.df` (a property that calls `_read()`), producing infinite recursion.

Dask `copy_with_df` type mismatch
- `OasisDaskReader.copy_with_df()` now converts any incoming pandas DataFrame to a Dask DataFrame before passing it to the base implementation.
- Before this change, `self._df` could end up as a pandas DataFrame, causing an `AttributeError` when `as_pandas()` called `.compute()` on it.

Double `_read()` calls
- `OasisReader.filter()` and `OasisReader.as_pandas()` now access `self._df` directly instead of going through the `self.df` property, eliminating a redundant second `_read()` call.

Code quality
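The `RecursionError` fix listed under Bug fixes can be sketched roughly as follows. This is a minimal stand-in, not the library's `OasisReader`: `SketchReader`, the `loader` callable, and the `read_calls` counter are hypothetical names used only to show the flag being set before the read and reset on failure.

```python
class SketchReader:
    """Simplified stand-in for a reader with a lazily loaded frame."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that loads the data
        self._df = None
        self.has_read = False
        self.read_calls = 0  # counts real loads, for illustration only

    @property
    def df(self):
        self._read()
        return self._df

    def _read(self):
        if self.has_read:
            return
        # Set the flag first so a nested access to `self.df` returns early.
        self.has_read = True
        self.read_calls += 1
        try:
            self._df = self._loader(self)
        except Exception:
            # Reset so a later call can retry after a failed read.
            self.has_read = False
            raise


def reentrant_loader(reader):
    # Touches `reader.df` mid-load, as a backend read method might do.
    _ = reader.df
    return [1, 2, 3]


def failing_loader(reader):
    raise ValueError("simulated read failure")
```

With the flag set only after the load, `reentrant_loader` would call back into `_read()` indefinitely; with it set first, the nested access returns immediately and the load runs once.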
- Replaced a `for`/`else`/`break` loop in `_read()` with a one-line `any()` expression.
- `super()` calls: Updated to `super()` (no arguments) in `AwsS3Storage`, `AzureABFSStorage`, `MissingInputsException`, and `OasisDataManagerException`.
- Replaced the `.format()` call in `MissingInputsException.__init__` with an f-string.
- `delete_file()` and `delete_dir()` in `BaseStorage` now call `self.logger.info()` instead of the bare module-level `logging.info()`, consistent with the rest of the class. Fixed an "Unknwon" typo in the log message.
- `AwsS3Storage.config_options` serialization: AWS and Azure backends now store the original `root_dir` argument (`self._root_dir_arg`) before joining it onto the bucket/container path, and use it in `config_options`. This avoids a fragile `Path.relative_to()` reverse-computation that could fail if the paths didn't align.
- `ComplexData.run()` clarity: Added a comment explaining the `fetch_required` logic: CSV and Parquet files are read directly by the df_reader, so `fetch()` is only needed for formats the reader cannot handle directly.

Test coverage
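For the `config_options` serialization change listed under Code quality, a minimal sketch of the idea might look like this. `SketchStorage` and its `bucket_name` parameter are hypothetical stand-ins, not the real `AwsS3Storage` signature; only the `_root_dir_arg` attribute name comes from the description above.

```python
from pathlib import PurePosixPath


class SketchStorage:
    """Hypothetical stand-in for a bucket-backed storage class."""

    def __init__(self, bucket_name, root_dir=""):
        self.bucket_name = bucket_name
        # Keep the caller's argument exactly as given, before any joining.
        self._root_dir_arg = root_dir
        # Internal root used for I/O: bucket path plus optional sub-directory.
        self.root_dir = str(PurePosixPath(bucket_name) / root_dir)

    @property
    def config_options(self):
        # Serialise the stored argument instead of reverse-computing it
        # from `self.root_dir` with `relative_to()`.
        return {"bucket_name": self.bucket_name, "root_dir": self._root_dir_arg}


storage = SketchStorage("my-bucket", "models/v1")
restored = SketchStorage(**storage.config_options)
```

Storing the argument makes the round trip independent of how the joined path happens to be laid out, whereas `relative_to()` raises `ValueError` whenever the joined path is not under the expected prefix.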
New test files
- `tests/df_reader/test_pyarrow.py`: PyArrow backend tests covering parquet reads, column selection, and filter predicates.
- `tests/filestorage/test_storage_utils.py`: Tests for `BaseStorage.create_traceback()` and `AwsS3Storage._strip_signing_parameters()`.

New tests in existing files
- `test_read_csv.py` / `test_read_parquet.py`: `OasisReader.query()`, `copy_with_df()`, and `OasisDaskReader.read_from_dataframe()`.
- `test_from_dataframe.py`: Passing a pandas DataFrame via `dataframe=` to a Dask reader.
- `test_caching.py`: `OasisDataManagerException` backward-compat alias verification.

Test fixes
- `query()` test: added `.compute()` call for lazy scalar results (e.g. `frame["D"].sum()`).
- `test_complex/test_base.py`: guarded dask import with `pytest.importorskip` so the file is skipped cleanly when Dask is not installed.
- `type: ignore` comments on optional-dependency fallback assignments updated to suppress the correct error codes.

Readme creation