Log a hash upon vp-setupfit of the fk tables by jekoorn · Pull Request #2476 · NNPDF/nnpdf

jekoorn · 2026-05-21T09:25:55Z

No description provided.

jekoorn · 2026-05-21T12:56:50Z

I have one question and one issue:

It is somewhat non-trivial to get the metadata from pineappl. I tried:
fk = fk_table.FkTable.read(str(fk_tablepath))
meta = fk.metadata
But somehow this gives me TypeError: object supporting the buffer API required. Do we really want the metadata to be hashes, or is the full fk table hashed ok?

Hashing the full fk table does work, but then I have no idea to access this file_content from within the run. Somehow, config_class doesn't have it? This I don't fully understand. Any ideas? I'm still getting used to working with this structuring.

scarlehoff · 2026-05-21T13:01:48Z

 # To change the configuration of the NNPDF code framework

-nnpdf_share: ~/.local/share/NNPDF
+nnpdf_share: /data/theorie/jkoorn/hep/share/NNPDF


You can have your own nnprofile.yaml in ~/.config/NNPDF/nnprofile.yaml with only some of the keys and those will overwrite the NNPDF standard profile.

scarlehoff · 2026-05-21T13:02:05Z

+    @property
+    def theories_path(self):
+        return self._theories_path
+


When you load a theory you have access to the path, you don't need that here.

scarlehoff · 2026-05-21T13:14:54Z


 SETUPFIT_FIXED_CONFIG = dict(
    actions_=['datacuts check_t0pdfset', 'theory evolven3fit_checks_action']
 )


The elegant way of doing this is creating a validphys action that takes the theory and the data and then you add them here.

I think it makes sense to make it part of the checks of n3fit (n3fit/checks.py) since you will be checking that you can load all the theories. For instance look at check_consistent_basis. You can have something like

def fktable_hasher(theoryid, dataset_inputs):

The variable theoryid here will already be parsed by validphys and will contain the path to the theory if you need to (theoryid.path).

But in this case, you don't even need that, because the data already contains a reference to the fktables if you let validphys do its thing:

def fktable_hasher(data): for dataset in data.datasets: fkspecs = dataset.fkspecs for fk in fkspecs: print(fk.fkpath)

You can test this with the validphys API:

from validphys.api import API dataset_inputs = [{"dataset": "ATLAS_Z0_8TEV_HIMASS_M-Y"}] # you can put as many as you want, I'm just testing with one # The API basically receives the same thing that the runcard will give and it is very useful for testing data = API.data(dataset_inputs=dataset_inputs, theoryid = 40_000_000, use_cuts="internal") for dataset in data.datasets: fkspecs = dataset.fkspecs for fk in fkspecs: print(fk.fkpath)

Note that fk.fkpath is still a tuple of paths since many fktables are separated into different bins.

I think what we want to have at the end of the day is a log file like

theory_40000000/fastkernel/ATLASDY2D8TEV-aMCfast_obs_0.pineappl.lz4 - hash from the metadata

So this means we write the hash in this n3fit/checks.py? That feels like a weird place to write it from. I don't fully understand why we would want to make it a check as opposed to something we do in vp-setupfit. I understand the argument for 'outsourcing' the work of loading the dataset names and fk table to validphys, but why in checks?

You can also create a new module, for instance under io, that contains the function that does the hash.

It's not outsourcing, vp-setupfit calls validphys actions that do a few dufferent things, among which the hash

scarlehoff · 2026-05-25T09:44:07Z

 # To change the configuration of the NNPDF code framework

-nnpdf_share: ~/.local/share/NNPDF
+nnpdf_share: ~/.local/share/NNPDF/theories


Suggested change

nnpdf_share: ~/.local/share/NNPDF/theories

nnpdf_share: ~/.local/share/NNPDF

scarlehoff · 2026-05-25T09:45:25Z



 @n3fit.checks.check_consistent_basis
+@n3fit.checks.fktable_hasher


I didn't mean to add it as a check but rather to add it in that file (because in my mind, for organizational purposes it made sense)
but given the confusion perhaps it was a bad idea, and it is better to add a new module to n3fit/src/n3fit/io/ something like hash_creator.py or whatever.

scarlehoff · 2026-05-25T09:47:20Z

+    for dataset in data.datasets:
+        fkspecs = dataset.fkspecs
+        for fk in fkspecs:
+            fkpath = fk.fkpath[0]


Why fk.fkpath[0]?

With respect to the comment below, note that you can get, as argument to this function, also output_path which is a path to the runfolder, so your file can be

mdf5fk_path = output_path / "md5fk"

becuase fk.fkpath is a tuple
if the goal is to hash all fk tables where there are multiple for 1 dataset (e.g. WP/WM, or multiple bins), I will have to change this logic. But this is a shortcut. My logic was that if an fk table is recomputed, all are recomputed for that dataset. Afaik this is how pineko works, or is it not?

never mind, i am working around it now and hashing each fk table individually.

It is how pineko works (by default) but I think we want this feature precisely for when weird things happen so better to have all of them!

exaclty! Fully agreed

jekoorn · 2026-05-26T13:55:36Z

My most recent push works and provides a nice file with hashes of all fk tables, bin by bin:

LHCB_WP_8TEV 298b18d4a89533f706647adab34c14c5
LHCB_WM_8TEV 10c3715775919c22aaa31e92a0d5d685
LHCB_WP_8TEV 298b18d4a89533f706647adab34c14c5
LHCB_WM_8TEV 10c3715775919c22aaa31e92a0d5d685
LHCB_DY_8TEV 1a23d3ed5dbd3306caf6cbb4dc96cc31
LHCB_DY_13TEV_DIMUON f562e5f1987fcfd356d0c8fc6dbe5305
LHCB_DY_13TEV_DIELECTRON 18f346386f6af426aa0d4f162caa8b94
ATLAS_TTBAR_13P6TEV_TOT_X-SEC e0ece1da0bf96a99bdfe89b8aad0606c
ATLAS_TTBAR_5TEV_TOT_X-SEC 90b7981fec44cfe03909997ce9a2abf2
ATLAS_TTBAR_13TEV_TOT_X-SEC c42b007dd6fe15c1c842330ce74d498b
CMS_TTBAR_13TEV_TOT_X-SEC c42b007dd6fe15c1c842330ce74d498b
CMS_TTBAR_13TEV_TOT_X-SEC c42b007dd6fe15c1c842330ce74d498b
CMS_TTBAR_13P6TEV_TOT_X-SEC 11d0210f4b045884104c560ad3f17f9a
H1_1JET_319GEV_290PB-1_DIF_PTQ2_BIN1 21282a3bd61323ed59a43d63a12e9860
H1_1JET_319GEV_290PB-1_DIF_PTQ2_BIN2 d48533eb8bc8815bbc67d08bffd09c91
H1_1JET_319GEV_290PB-1_DIF_PTQ2_BIN3 f52c386c8d68153e62fc7fb09afd7637

scarlehoff

Just a few more organizational points about actions, checks and so.

scarlehoff · 2026-05-26T14:12:23Z

+
+            # and save the fk hashes
+#            fk_hashes = self.config_class["_fk_hashes"]
+#            self.environment.save_fk_md5(fk_hashes)


Please dont' forget to remove the comments and the dead code (the old save_fk_md5)

scarlehoff · 2026-05-26T14:13:23Z

You can add fktable_hasher as an action here

scarlehoff · 2026-05-26T14:14:49Z

        log.error(f"No eko found for {theoryid}")
+
+
+@make_argcheck


You can remove make_argcheck from here and add this function to the n3fit_checks_providers. Then it will be understood as an action by reportengine and will be run after the other vp-setupfit actions.

If I do this, it tells me that fktable_hasher is missing output_path as a positional argument. I have removed the make_argcheck and added it to the n3fit_checks_providers. So concretely, what do you mean by "add this function to the n3fit_checks_providers"?

Sure, don't add it as a check anywhere. The function fktable_hasher is an action by itself. You can put this function in n3fit_checks_providers.

Then, since you added it as an action to vp_setupfit (btw, you need to tell it about the theory, just like with evolven3fit_checks_action next to it) reportengine will call fktable_hasher and will find out which arguments does it need.

Basically, remove all @n3fit.checks.fktable_hasher

I am now telling it about the theory in vp_setupfit, and I put the function in n3fit_checks_provider, and it doesn't error anymore like it did earlier, but nothing happens, i.e. it doesn't run the function for some reason. I genuinely have no idea what I am then missing.

The function that you left in n3fit_checks_provider is not the function that was in checks.py. It is a function that does nothing.

ahh, so the idea was to put the whole thing in n3fit_checks_provider? I don't fully understand this.

Anyway, works now.

The file n3fit_checks_provider is registered as a provider of actions in vp-setupfit. Therefore, any function there can be called as an action. The file n3fit_checks_provider is nothing magic, it just happen to be already registered as a provider. You could also put it in a separate module and register that.

When you try to call a "reportengine action" what reportengine does is it tries to build a graph from your desired action (and its dependences) to the input (your runcard). If it manages to (for instance, you ask for the covmat, the runcard contains datasets, but from that reportengine knows how to do the covmat) then all ok and continues running. Otherwise it fails and complains.

scarlehoff

I don't think this version works (the failing tests give some hints why).

Btw, make sure to run pre-commit on the files you changed:

git add <file>
pre-commit

(also, please rebase on top of master, being a rebase much preferred with respect to a merge)

scarlehoff · 2026-05-27T11:59:26Z

+    theory and dataset are numerically identical.
+    """
+    MD5FK_FILENAME = "md5fk"
+    md5fk_path = output_path / MD5FK_FILENAME


I guess it makes sense to define this constant as a module level variable instead of here.

scarlehoff · 2026-05-27T12:20:28Z

+                for fkpath, table_name in zip(fk.fkpath, table_names):
+                    for fkpath, table_name in zip(fk.fkpath, table_names):
+                        fkhash = hashlib.md5(fkpath.read_bytes()).hexdigest()
+                        f.write(f"{table_name} {fkhash}\n")


I don't think these loops are correct.

First there's the double loop at the end that looks repeated to me.but most importantly, the loop above,

table_names = [name for group in fk.metadata.FK_tables for name in group]

is using the metadata for the names but I think this is "rewinding" part of the work you are already doing by doing fk?

I think you can simplify this a lost by doing something like:

for fk in fkspecs: # Make a list of the FK tables for fkpath in fk.fkpath: fkhash = hashlib.md5(fkpath.read_bytes()).hexdigest() f.write(f"{fkpath.name} {fkhash}\n")

True, the name will include ".pineappl.lz4" but that's fine. You can remove that if you want. This version should be correct (but please check) for both compound fktables and also things with operations in the middle (I don't think your loop would deal with them correctly?)

The only thing I'm not sure is old fktables because those were slightly different, to be safe you can do:

for fk in fkspecs: # Make a list of the FK tables fkpaths = fk.fkpath if not isinstance(fkpaths, tuple): fkpaths = (fk.fkpath,) for fkpath in fk.fkpath: fkhash = hashlib.md5(fkpath.read_bytes()).hexdigest() f.write(f"{fkpath.name} {fkhash}\n")

I think with this it will work also for the tests that are now failing (the old fktables), but please check.

jekoorn · 2026-05-28T12:41:29Z

I rebased, used pre-commit, and implemented these changes. Then I checked that for both 4.1 and 4.0 theories, all fk tables are hashed. Let's see what the tests do.

scarlehoff · 2026-05-28T12:46:42Z

Thanks!

I think you rebased perhaps on an old copy of master? Somehow you took many changes from other branches that should already be merged.

With

git rebase -i

you can select which commits you actually want (other wise you can always cherry-pick)

jekoorn · 2026-05-28T12:49:08Z

Yeah I am a github noob, sorry. I'll try to tidy it up

jekoorn · 2026-05-28T13:58:59Z

fixed now

jekoorn · 2026-05-28T16:01:14Z

Why is it failing the test?
The version of nnpdf I rebased on, https://github.com/NNPDF/nnpdf/actions/runs/26447954188/job/77879523594?pr=2411, does pass al tests (obviously), so I wonder whether the fact that I rebased on a version with new data, makes the fit test go crazy. Since this must change the fit somehow.

scarlehoff · 2026-05-28T17:29:05Z

In both cases due to a certain randomness on where the fits are running. There are PRs and issues deañing with the issue already but I guess it might still take somr weeks.
For now the only you can do (short of taking over said issues) is to rerun the tests. There's a button for that in the interface.

jekoorn added 2 commits May 21, 2026 11:25

initial commit

75e8ce3

implemented fk hash

893f4dd

removed pdb.set_trace

67a26b5

scarlehoff reviewed May 21, 2026

View reviewed changes

reset nnprofile_example

237958d

scarlehoff reviewed May 21, 2026

View reviewed changes

jekoorn added 4 commits May 21, 2026 15:17

theory path using built in TheoryIDSpec

77817c7

add jcm suggestion

f4f8ec8

hash functionality

2b81d95

compute hash

e54d83b

scarlehoff reviewed May 25, 2026

View reviewed changes

jekoorn added 2 commits May 26, 2026 15:50

write hash

d7eb505

fixed loop that flattened one time too much

cd55a2a

scarlehoff reviewed May 26, 2026

View reviewed changes

jekoorn added 6 commits May 26, 2026 16:25

clean up code

0689c34

remove make_argcheck

3f8c2b5

remove pdb

ee081de

changes

5485eb3

moved fktable_hasher to n3fit_checks_provider

2acc738

removed redundant line

9bdc949

scarlehoff requested changes May 27, 2026

View reviewed changes

corrected loop

fa4e290

corrected loop

0a515c9

jekoorn force-pushed the vpsetupfit_fkhash branch from a789325 to 0a515c9 Compare May 28, 2026 13:58

	nnpdf_share: ~/.local/share/NNPDF/theories
	nnpdf_share: ~/.local/share/NNPDF



		@n3fit.checks.check_consistent_basis
		@n3fit.checks.fktable_hasher

Conversation

jekoorn commented May 21, 2026

Uh oh!

jekoorn commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jekoorn commented May 26, 2026

Uh oh!

scarlehoff left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scarlehoff left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jekoorn commented May 28, 2026

Uh oh!

scarlehoff commented May 28, 2026

Uh oh!

jekoorn commented May 28, 2026

Uh oh!

jekoorn commented May 28, 2026

Uh oh!

jekoorn commented May 28, 2026

Uh oh!

scarlehoff commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jekoorn commented May 21, 2026 •

edited

Loading