NPI-4453 Framework for DataFrame hashing & test baselining #110

Merged
treefern merged 5 commits into main from NPI-4453-implement-hash-baselining-unit-tests
Feb 18, 2026

Conversation

Collaborator

@treefern treefern commented Feb 5, 2026

Introduces a framework for baselining lists of DataFrames (and in future, other object types) produced by unit tests, then checking for regression against these in subsequent runs.

Workflow

Baselining mode

The exact DataFrame outputs of a unit test can be 'baselined' within unit tests:

df_1 = load_some_data()
df_2 = transform_something()
self.assertEqual(df_1, some_value)
...
df_list = [df_1, df_2]
UnitTestBaseliner.mode = "baseline"
UnitTestBaseliner.create_baseline(df_list)

The baseline consists of two files (a SHA-256 hash and a pickled list[object]), which can be committed along with the relevant changes.
To prevent baselines being created accidentally, UnitTestBaseliner.mode = "baseline" must be set. This turns off verify mode and raises warnings, to reduce the risk of this state being committed.
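
For illustration, the two-file layout described above could be produced roughly as follows. This is a minimal sketch, not the code from this PR: write_baseline and its parameters are hypothetical names, and plain Python objects stand in for DataFrames.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def write_baseline(objects: list, directory: Path, name: str) -> str:
    """Pickle `objects`, then store the payload and its SHA-256 hex digest side by side."""
    # Pin the pickle protocol so the bytes do not change with the interpreter's default.
    payload = pickle.dumps(objects, protocol=4)
    digest = hashlib.sha256(payload).hexdigest()
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{name}.pickledlist").write_bytes(payload)
    (directory / f"{name}.pickledlistsha256").write_text(digest)
    return digest


# Usage: write a baseline into a temporary directory and list the files created.
with tempfile.TemporaryDirectory() as tmp:
    baseline_dir = Path(tmp) / "TestExample"
    digest = write_baseline([[1, 2, 3], {"a": 1.5}], baseline_dir, "test_something")
    written = sorted(p.name for p in baseline_dir.iterdir())
```

The hash file is small enough to review in a diff, while the pickled list is only needed when a mismatch has to be investigated.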

Verification mode

Subsequent unit test runs can call UnitTestBaseliner.verify(), passing a list[object] of the outputs they have produced.

df_1 = load_some_data()
df_2 = transform_something()
self.assertEqual(df_1, some_value)
...
df_list = [df_1, df_2]
UnitTestBaseliner.verify(df_list)

This compares the current unit test output against the baseline on file (using only the hash for detection).

This will allow detection of regressions too subtle to be found by our existing unit tests.
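
A hash-only check of this kind might look like the sketch below. BaselinerSketch is a hypothetical stand-in, not the real UnitTestBaseliner; the boolean return value and the RuntimeError raised in the wrong mode are assumptions made for the example.

```python
import hashlib
import pickle


class BaselinerSketch:
    """Hypothetical stand-in showing a mode-guarded, hash-only verification."""

    mode = "verify"  # assumed default; setting "baseline" disables verification

    @classmethod
    def verify(cls, objects: list, stored_digest: str) -> bool:
        if cls.mode != "verify":
            raise RuntimeError("verify() refused: baselining mode is active")
        # Re-serialise the current outputs and compare digests only.
        current = hashlib.sha256(pickle.dumps(objects, protocol=4)).hexdigest()
        return current == stored_digest


# Usage: identical output matches the stored digest, a one-value change does not.
stored = hashlib.sha256(pickle.dumps([[1, 2, 3]], protocol=4)).hexdigest()
unchanged = BaselinerSketch.verify([[1, 2, 3]], stored)
regressed = BaselinerSketch.verify([[1, 2, 4]], stored)
```

Because any change in the serialised bytes changes the digest, even a single altered value is detected without the pickled baseline being loaded.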

Troubleshooting regressions with DataFrame diffs

When a unit test invokes verify() and it fails (the hash does not match), verify() can load the baselined list[object] from the pickle file and print diffs between it and the current output of the unit test.

  • Currently this is only supported for DataFrame objects, and will fail if other data types are included.

⚠️ NOTE: Due to the security implications of deserialization, unpickling must be explicitly enabled when needed, with UnitTestBaseliner.enable_unpickling = True
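
The opt-in gate can be sketched as below. UnpicklingGate, load_baseline and diff_lists are hypothetical names, and the diff here compares plain Python values; for DataFrames, something like pandas' DataFrame.compare would be a more natural choice, though which approach this PR takes is not shown here.

```python
import pickle


class UnpicklingGate:
    """Hypothetical stand-in: refuses to deserialise unless explicitly enabled."""

    enable_unpickling = False  # off by default, mirroring the flag described above

    @classmethod
    def load_baseline(cls, payload: bytes) -> list:
        if not cls.enable_unpickling:
            raise PermissionError("Unpickling is disabled; set enable_unpickling = True")
        return pickle.loads(payload)


def diff_lists(baseline: list, current: list) -> list:
    """Return (index, baseline_item, current_item) for each position that differs."""
    return [(i, b, c) for i, (b, c) in enumerate(zip(baseline, current)) if b != c]


# Usage: loading is blocked until the flag is set, then the differences are listed.
payload = pickle.dumps([[1, 2], [3, 4]])
try:
    UnpicklingGate.load_baseline(payload)
    blocked = False
except PermissionError:
    blocked = True

UnpicklingGate.enable_unpickling = True
baseline = UnpicklingGate.load_baseline(payload)
differences = diff_lists(baseline, [[1, 2], [3, 5]])
```

Keeping the flag off by default means a pickle file is never deserialised during a normal verification run, where the hash alone is enough.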

Baseline file storage

Baseline files are stored at:
gnssanalysis/tests/unittest_baselines/<class_name>/<unittest_function_name>.{pickledlist,pickledlistsha256}

Note: When create_baseline() or verify() is invoked, the names of the calling class and function are determined automatically using frame inspection.

This means that simply invoking them from within a unit test will cause a corresponding directory and baseline files to be written or read at the path noted above.
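
One way to derive such a path with frame inspection is sketched below. It assumes the caller is a method with a conventional self argument; baseline_paths_for_caller and the "no_class" fallback are hypothetical, and the real implementation may differ.

```python
import inspect
from pathlib import Path

BASELINE_ROOT = Path("gnssanalysis/tests/unittest_baselines")


def baseline_paths_for_caller() -> tuple[Path, Path]:
    """Build the baseline file paths from the calling method's class and function name."""
    caller = inspect.stack()[1]  # index 0 is this function, index 1 is its caller
    function_name = caller.function
    instance = caller.frame.f_locals.get("self")
    # Hypothetical fallback for callers that are not methods.
    class_name = type(instance).__name__ if instance is not None else "no_class"
    directory = BASELINE_ROOT / class_name
    return (
        directory / f"{function_name}.pickledlist",
        directory / f"{function_name}.pickledlistsha256",
    )


class TestExample:
    def test_something(self):
        return baseline_paths_for_caller()


# Usage: the paths are derived from TestExample.test_something without being passed in.
pickled_path, hash_path = TestExample().test_something()
```

Since the path comes from the caller's name, renaming a test or its class would point it at a different (possibly missing) baseline.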

@treefern treefern requested a review from ronaldmaj February 5, 2026 17:20
@treefern treefern self-assigned this Feb 5, 2026
… to support any object type, not just DataFrames
@treefern treefern changed the title NPI-4453 DataFrame hashing & test baselining NPI-4453 Framework for DataFrame hashing & test baselining Feb 9, 2026
Collaborator

@ronaldmaj ronaldmaj left a comment

Did a quick read through the code. At a high level, I understand that this introduces a framework for having data which we can regression test against - if answers change, we need to investigate.

I think this is a nice framework, as it allows us to update the hashes (and pickled versions of the data) in case the new values are actually better - having to do manual overwrite is a feature.

Ran the tests locally as well, and in its current form it works fine.
I peppered in some comments, but it's nothing major and I think it is good to go in its current form 👍

try:
df = DataFrame(["a", "b", "c"])

# Baseline (do not commit uncommented!) Note: every function needs its own baseline, becuase the
Collaborator

typo: becuase

# likely fail).

# We're only testing it with the verify function below, but both verify and baseline functions use the same
# caller check logic, and store the caller record statically in a class variable. ?
Collaborator

Is the question mark here for a reason?

Collaborator Author

I wasn't sure on the specific terminology. Checked and updated with a clarification.

"DF / object list verification should succeed here (unless baseline files are missing, or baselining has been turned on)",
)

# The local variable df still points to the same DF, so now the list contains [a,b,b]. This should be an error.
Collaborator

Not sure where the dataframe [a,b,b] came from? You mean ["b", "c", "d"] which is what the df var now points at?

Collaborator Author

That was intended to be shorthand for the different DataFrame objects, rather than their content.
This check doesn't care about the data the object stores, just the object's memory address.


df = DataFrame(["a", "b", "c"])

# Baseline (every function needs its own baseline, becuase the function name determines the filename,
Collaborator

typo: becuase

)

# The local variable df still points to the same DF, so now the list contains [a,b,b]. This should be an error.
objects_to_hash.extend([df])
Collaborator

You are trying to add the same dataframe here to objects_to_hash and that is what is going to cause the error when verifying right? Because you have a duplicate?

Collaborator Author

Yep. This checks for duplicate references to the same objects at the top level (as a safety check). It's not recursive, but the top-level check is arguably the most important.
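
A top-level identity check like the one described could be sketched as follows. find_duplicate_references is a hypothetical helper, not the function from this PR; it compares id() values, so it looks only at which objects the list holds, not at their contents.

```python
def find_duplicate_references(objects: list) -> list[int]:
    """Return indices of top-level entries that are the same object as an earlier entry."""
    seen_ids = set()
    duplicates = []
    for index, obj in enumerate(objects):
        if id(obj) in seen_ids:
            duplicates.append(index)
        seen_ids.add(id(obj))
    return duplicates


# Usage: equal content is fine, a repeated reference is flagged.
a = ["a", "b", "c"]
b = ["a", "b", "c"]  # same content as `a`, but a distinct object
objects_to_hash = [a, b]
no_duplicates = find_duplicate_references(objects_to_hash)

objects_to_hash.extend([b])  # the list now holds a second reference to `b`
duplicates = find_duplicate_references(objects_to_hash)
```

As noted above, this is not recursive: a shared object nested inside two different top-level entries would not be flagged by a check of this shape.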

self.fail("DF / object list verification should fail on *second*/repeated calls from a function.")

def test_duplicate_object_rejection(self):

Collaborator

No description of the test here, whereas the previous test has an intro to what is being tested


class TestUnitTestBaseliner(unittest.TestCase):

def test_verify_refusal_in_wrong_mode(self):
Collaborator

Short intro to what is being tested would be good here (kinda implied in the name, but could be good to add in)

@treefern treefern merged commit 9bafcae into main Feb 18, 2026
4 checks passed
@treefern treefern deleted the NPI-4453-implement-hash-baselining-unit-tests branch February 18, 2026 06:09
Collaborator Author

treefern commented Feb 19, 2026

The minor changes suggested above are now on: #111
