perf[next-dace]: Enhance LoopBlocking pass by iomaganaris · Pull Request #2578 · GridTools/gt4py

iomaganaris · 2026-04-17T17:50:43Z

Added the following features to the LoopBlocking pass:

Memlets that carry independent data are promoted into the outer Map so that they only have to be read once.
Added the option blocking_independent_node_threshold. LoopBlocking is only applied to Maps that have more than the threshold number of independent variables.
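
As a rough illustration of the first point, the following pure-Python sketch shows what promoting an independent read buys. It is not the DaCe transformation itself: `CountingList`, `scaled_naive`, `scaled_promoted` and the block size are hypothetical names chosen for this example, and the k loop stands in for the blocked dimension.

```python
class CountingList:
    """Hypothetical list wrapper that counts element reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __getitem__(self, index):
        self.reads += 1
        return self._values[index]


def scaled_naive(scale, field, num_i, num_k):
    """Read scale[i] for every (i, k) pair, although it does not depend on k."""
    return [[scale[i] * field[i][k] for k in range(num_k)] for i in range(num_i)]


def scaled_promoted(scale, field, num_i, num_k, block_size):
    """Block the k loop and read scale[i] once per block, outside the inner loop."""
    out = [[0.0] * num_k for _ in range(num_i)]
    for i in range(num_i):
        for k_start in range(0, num_k, block_size):
            scale_i = scale[i]  # promoted read: independent of the inner k loop
            for k in range(k_start, min(k_start + block_size, num_k)):
                out[i][k] = scale_i * field[i][k]
    return out
```

With `num_k` iterations and a block size `B`, the independent value is read roughly `num_k / B` times per `i` instead of `num_k` times.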

philip-paul-mueller

I like the changes, but there are some things that need to be addressed.

philip-paul-mueller · 2026-04-19T07:31:44Z

        if self.map_entry.map.gpu_block_size is not None:
            return False
+        if self.map_entry.map.gpu_maxnreg > 0:
+            return False


I am actually not sure if this is needed, as maxnreg is only set if it is not set.
However, I assume this change avoids some other problem that I don't fully get.

With this check I want to make sure that, if gpu_maxnreg has already been set for the map by some other transformation, we don't change it here. However, the check in can_be_applied is wrong, so I'm going to change it.

philip-paul-mueller · 2026-04-19T07:32:53Z

        default=True,
        desc="If 'True' then blocking is only applied if there are independent nodes.",
    )
+    promote_independent_memlets = dace_properties.Property(


You also need to add them to the docstring of the class. There I would also explain what promotion means, probably creating a scalar.

philip-paul-mueller · 2026-04-19T08:13:39Z

        block_var_idx = map_params.index(block_var)
        map_range_size = map_range.size()
+
+        if all(map_range_size_i == 1 for map_range_size_i in map_range_size):


There could be a Sympy issue here, so you have to use:

Suggested change

if all(map_range_size_i == 1 for map_range_size_i in map_range_size):

if all((map_range_size_i == 1) == True for map_range_size_i in map_range_size):
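
A minimal sketch of why the extra `== True` can help, assuming that a symbolic comparison may return an object whose truth value is undecided. `Sym` and `Undecided` are hypothetical stand-ins written for this example. They are not SymPy or DaCe classes, and whether the real types behave this way for `==` depends on the expressions involved.

```python
class Undecided:
    """Hypothetical result of a symbolic comparison with unknown truth value."""

    def __bool__(self):
        raise TypeError("cannot determine truth value of a symbolic comparison")

    def __eq__(self, other):
        # Comparing the undecided result itself with a bool is decidable.
        return False


class Sym:
    """Hypothetical symbolic size, e.g. an unknown Map extent."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, Sym):
            return self.name == other.name
        return Undecided()  # "N == 1" cannot be decided without knowing N


def all_sizes_one_fragile(sizes):
    # all() calls bool() on every result and fails on an undecided one.
    return all(size == 1 for size in sizes)


def all_sizes_one_robust(sizes):
    # Only results that are definitely True count as True.
    return all((size == 1) == True for size in sizes)  # noqa: E712
```

With these stand-ins, `all_sizes_one_robust([1, Sym("N")])` returns `False`, while the fragile variant raises a `TypeError`.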

philip-paul-mueller · 2026-04-19T08:17:35Z

    # Set of nodes that are independent of the blocking parameter.
    _independent_nodes: Optional[set[dace_nodes.AccessNode]]
    _dependent_nodes: Optional[set[dace_nodes.AccessNode]]
+    _memlet_to_promote: Optional[set[dace.Memlet]]


This is confusing (and also very strange), but the class Memlet is only the payload of the edge in the dataflow graph, although everyone is pretending that it is the actual edge.
Thus the correct type signature is:

Suggested change

_memlet_to_promote: Optional[set[dace.Memlet]]

_memlet_to_promote: Optional[set[dace_graph.MultiConnectorEdge[dace.Memlet]]]

However, as far as I understand it should be possible to change the set into a list.

Since we iterate over _memlet_to_promote, it should indeed be a list. And you're also right about the type of _memlet_to_promote. Not sure how the syntax checks didn't catch this issue.

DaCe is not typechecked, so every type from it is apparently handled as Any.
But you have to ask Enrique if you want more information.

philip-paul-mueller · 2026-04-19T08:54:14Z

        if not self.partition_map_output(graph, sdfg):
            return False
        self._independent_nodes = None
        self._dependent_nodes = None


Currently you couple can_be_applied() with apply() because you do not set self._memlet_to_promote to None.
Thus you require that can_be_applied() has run immediately before apply() is called.

To solve this you have to reset _memlet_to_promote here and in apply() populate it again.
You can also do it in self._prepare_independent_memlets(), see my other note.
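
The decoupling described above could look roughly like the following toy class. This is only a sketch under simplifying assumptions: `ToyBlocking` is hypothetical, edges are modelled as plain strings, and the prefix check stands in for the real independence analysis.

```python
from typing import Optional


class ToyBlocking:
    """Hypothetical transformation that caches an analysis result."""

    def __init__(self) -> None:
        self._memlet_to_promote: Optional[list] = None

    def _prepare_independent_memlets(self, edges) -> bool:
        # Always recompute from scratch so stale results never leak in.
        self._memlet_to_promote = [e for e in edges if e.startswith("independent")]
        return len(self._memlet_to_promote) > 0

    def can_be_applied(self, edges) -> bool:
        applicable = self._prepare_independent_memlets(edges)
        # Reset the cache: apply() must not rely on this call having happened.
        self._memlet_to_promote = None
        return applicable

    def apply(self, edges) -> list:
        # Populate the cache again instead of trusting can_be_applied().
        self._prepare_independent_memlets(edges)
        assert self._memlet_to_promote is not None
        promoted = list(self._memlet_to_promote)
        self._memlet_to_promote = None
        return promoted
```

In this form `apply()` gives the same result whether or not `can_be_applied()` ran directly before it.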

philip-paul-mueller · 2026-04-19T10:00:09Z

+            ]
+            independent_outer_map_as_range = dace_subsets.Range(
+                ranges=[
+                    in_edge.data.subset.ranges[outer_map_entry.map.params.index(idx)]


Since this mapping is fixed, you could create a dict outside the loop.
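
A small sketch of that suggestion, using plain lists instead of DaCe objects. The function names and the `wanted` argument are hypothetical, and the sketch assumes that the Map parameters are unique, so that `list.index()` and the dict lookup agree.

```python
def select_ranges_with_index(params, ranges, wanted):
    """Look up every parameter with list.index(), a linear scan per lookup."""
    return [ranges[params.index(param)] for param in wanted]


def select_ranges_with_dict(params, ranges, wanted):
    """Build the parameter-to-position mapping once, then reuse it."""
    param_to_index = {param: i for i, param in enumerate(params)}
    return [ranges[param_to_index[param]] for param in wanted]
```

Since the parameters of the outer Map do not change while the edges are processed, the dict can be created once before the loop over the edges.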

philip-paul-mueller · 2026-04-19T10:25:55Z

+            original_dst_of_in_edge = in_edge.dst
+            original_dst_conn_of_in_edge = in_edge.dst_conn
+            original_dst_other_subset_of_in_edge = in_edge.data.other_subset
+            # Redirect the memlet to the temporary AccessNode
+            dace_helpers.redirect_edge(
+                state=state,
+                edge=in_edge,
+                new_dst=promoted_anode,
+                new_dst_conn=None,
+                new_memlet=dace.Memlet(
+                    data=in_edge.data.data,
+                    subset=in_edge.data.subset,
+                    other_subset=dace_subsets.Range.from_array(sdfg.arrays[promoted_name]),
+                ),
+            )


This might fail in very strange ways if the consumer is an AccessNode.
The following code should handle that case too:

    assert in_edge.data.wcr is None
    simple_direction = not (
        isinstance(in_edge.dst, dace_nodes.AccessNode)
        and in_edge.data.data != in_edge.dst.data
    )

    # Memlet from the original source into the promoted AccessNode.
    first_new_memlet = deepcopy(in_edge.data)
    if simple_direction:
        first_new_memlet = dace.Memlet(
            data=in_edge.data.data,
            subset=deepcopy(in_edge.data.subset),
            other_subset=deepcopy(new_subset_as_range),
        )
    else:
        first_new_memlet = dace.Memlet(
            data=promoted_name,
            subset=deepcopy(new_subset_as_range),
            other_subset=deepcopy(in_edge.data.other_subset),
        )

    # Memlet from the promoted AccessNode to the original consumer.
    second_new_memlet = deepcopy(in_edge.data)
    if simple_direction:
        second_new_memlet.data = promoted_name
        second_new_memlet.subset = deepcopy(new_subset_as_range)
    else:
        second_new_memlet.other_subset = deepcopy(new_subset_as_range)

    state.add_edge(
        in_edge.src,
        in_edge.src_conn,
        promoted_anode,
        None,
        first_new_memlet,
    )
    state.add_edge(
        promoted_anode,
        None,
        in_edge.dst,
        in_edge.dst_conn,
        second_new_memlet,
    )
    state.remove_edge(in_edge)

This might fail in very strange ways if the consumer is an AccessNode.

If it's an AccessNode, then it would be an independent one, so it shouldn't be handled here. Maybe we should add an assert instead.

You are right, I did not consider that.
However, instead of an assert I would raise a NotImplementedError.
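
One way such a guard could look, as a sketch only. `AccessNode` and `MapEntry` are empty placeholder classes here, not the DaCe node types, and `check_consumer` is a hypothetical helper. A raised exception also stays active when Python runs with `-O`, which strips `assert` statements.

```python
class AccessNode:
    """Placeholder for the consumer type that is not handled."""


class MapEntry:
    """Placeholder for a consumer type that is handled."""


def check_consumer(dst):
    """Reject consumers for which promotion is not implemented."""
    if isinstance(dst, AccessNode):
        # Unsupported case rather than a broken internal invariant.
        raise NotImplementedError(
            "Promotion of a Memlet that ends in an AccessNode is not implemented."
        )
    return dst
```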

philip-paul-mueller · 2026-04-19T10:37:53Z

+                            "Independent memlets should only be inputs to maps that have a single parameter. "
+                            "Those should always be neighbor reductions."
+                        )
+                        edge.data.subset = next(iter(original_dst_of_in_edge.params))


It is correct that you have to update subset here, but it does not make much sense to me.

Not sure I understood your comment. Should I change something?

I mean original_dst_of_in_edge is a MapEntry and it does not have an attribute called params. The Map has this, i.e. original_dst_of_in_edge.map.params would exist.
Map::params stores the iteration variables (ordered according to this function).
However, now you take the first iteration variable, which at this point should be the horizontal dimension.
So I do not understand the logic that is applied here.

philip-paul-mueller · 2026-04-19T10:39:10Z

+                            "Those should always be neighbor reductions."
+                        )
+                        edge.data.subset = next(iter(original_dst_of_in_edge.params))
+                        edge.data.other_subset = None


You cannot set this to None, you must keep it.
Instead, there is an else branch missing. There you need to update other_subset (you need to set it to the same value you used for .subset in the then branch).

I thought that if the edge's memlet data is not the one that I have touched, then I don't have to update anything, because the edge data will then be the output data and the subset will be correct. I don't know whether there could be a case where other_subset refers to the data of an outside node for an edge that stems from a MapEntry.

If there is an AccessNode inside a Map and the Memlet inside the Map scope refers to that inner AccessNode, then other_subset stores what is read from the data outside.
Note that the data attribute of the Memlet outside the Map will refer to the data that is outside.
This case is unlikely to appear, but it is not impossible.

Here you have to keep other_subset in case the Memlet ends in an AccessNode.

philip-paul-mueller · 2026-04-19T10:40:36Z

+            elif isinstance(original_dst_of_in_edge, dace_nodes.NestedSDFG):
+                raise NotImplementedError("Promotion of memlets to NestedSDFG not implemented yet.")
+            elif isinstance(original_dst_of_in_edge, dace_nodes.LibraryNode):
+                raise NotImplementedError(
+                    "Promotion of memlets to LibraryNode not implemented yet."
+                )


I think you can remove them, since original_dst_of_in_edge is always classified as a dependent node.

Not sure why I could remove these checks. Couldn't we have a dependent NestedSDFG or LibraryNode?

philip-paul-mueller · 2026-04-21T08:22:26Z

+            for subset_range in in_edge.data.subset.ranges:
+                if subset_range not in independent_outer_map_as_range.ranges:
+                    new_subset.append(subset_range)


This looks a bit brittle, because you look at sizes.
However, I currently do not have a better idea, besides looking at the subset and checking whether it contains the blocking parameter, and I am not sure that this is better.

philip-paul-mueller · 2026-04-21T08:22:59Z

+            assert len(new_subset) > 0, (
+                "After removing the independent dimensions there should be at least one dimension left to promote."
+            )


I think the number of dimensions to promote should be one less than before?

philip-paul-mueller · 2026-04-22T07:07:48Z

+            )
+
+            if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry):
+                for edge in state.out_edges(original_dst_of_in_edge):


I did not see this before, but iterating over the out edges is not enough, as there could be nested Maps.
I think there is something in utils, the reroute or so, that can help you do it or at least give you some hints on how to do it.

philip-paul-mueller · 2026-04-22T07:09:01Z

+
+            if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry):
+                for edge in state.out_edges(original_dst_of_in_edge):
+                    if edge.data.data == in_edge.data.data:


The else branch is missing.
There you have to update other_subset.

iomaganaris and others added 25 commits November 21, 2025 16:18

Fix k blocking for non independent option

32476d7

PoC for moving temporary outside kloop

8547aba

Working version

e265480

Enable maxnreg setting

0cb5f5b

Make sure that the inner loop doesn't get unrolled

dcc74d4

Enable promote_independent_memlets option

d0d567f

Enable loop blocking if there's any independent memlet to promote

ec11475

Added initial test and fixed pass

0846ba8

Fixing test

8c7ce09

If we don't require_independent_nodes always apply

2ac4aa7

Fix most of the tests

b8686f9

Fix tests

21f053b

Fix memlet promotion number

b80dc12

Added option for independent node promotion threshold

56e390f

Extended the tests and fixes

93e84ab

Skip maps with single sizes

0305aaa

Make formatting happy

da784a2

Set better block size and gpu_maxnreg for kblocking

329803d

Make formatting happy

cf87306

Improve some hacks for subsets

b1cce65

Merge remote-tracking branch 'origin/main' into extend_loopblocking

0a12898

Merge remote-tracking branch 'origin/main' into extend_loopblocking

729dca7

Fix unique_name call

180d777

Merge remote-tracking branch 'origin/main' into extend_loopblocking

5391431

Remove NVIDIA related options for the loop in kblocking

73a8ac0

iomaganaris requested review from edopao and philip-paul-mueller April 17, 2026 17:50

philip-paul-mueller reviewed Apr 19, 2026


iomaganaris added 2 commits April 19, 2026 18:40

Don't change the maxnreg of a map if it's already set

87fc4bf

Handling comments from Philip

eabbbb1

philip-paul-mueller reviewed Apr 21, 2026


philip-paul-mueller reviewed Apr 22, 2026

