Skip to content

perf[next-dace]: Enhance LoopBlocking pass#2578

Open
iomaganaris wants to merge 27 commits intoGridTools:mainfrom
iomaganaris:extend_loopblocking
Open

perf[next-dace]: Enhance LoopBlocking pass#2578
iomaganaris wants to merge 27 commits intoGridTools:mainfrom
iomaganaris:extend_loopblocking

Conversation

@iomaganaris
Copy link
Copy Markdown
Contributor

Added the following features to the LoopBlocking pass:

  • Memlets that have independent data are promoted in the outer Map so that they only have to be read once
  • Added option blocking_independent_node_threshold. LoopBlocking is only going to be applied to Maps that have more than the threshold number independent variables

Copy link
Copy Markdown
Contributor

@philip-paul-mueller philip-paul-mueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes but there some things that needs to be addressed.

if self.map_entry.map.gpu_block_size is not None:
return False
if self.map_entry.map.gpu_maxnreg > 0:
return False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure if this is needed, as maxnreg is only set if it is not set.
However, I assume this changes avoids some other problem, that I don't fully get.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this check I want to make sure that if the gpu_maxnreg is set for the map from some other transformation we don't change it here but the check in the can_be_applied is wrong so I'm going to change it

default=True,
desc="If 'True' then blocking is only applied if there are independent nodes.",
)
promote_independent_memlets = dace_properties.Property(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to add them to the doc string of the class, there I would also add what promotion means, probably creating a scalar.

block_var_idx = map_params.index(block_var)
map_range_size = map_range.size()

if all(map_range_size_i == 1 for map_range_size_i in map_range_size):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a Sympy issue here, so you have to use:

Suggested change
if all(map_range_size_i == 1 for map_range_size_i in map_range_size):
if all((map_range_size_i == 1) == True for map_range_size_i in map_range_size):

# Set of nodes that are independent of the blocking parameter.
_independent_nodes: Optional[set[dace_nodes.AccessNode]]
_dependent_nodes: Optional[set[dace_nodes.AccessNode]]
_memlet_to_promote: Optional[set[dace.Memlet]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing (and also very strange), but the class Memlet only the payload of the edge in the dataflow graph, although everyone is pretending that it is the actual edge.
Thus the correct type signature is:

Suggested change
_memlet_to_promote: Optional[set[dace.Memlet]]
_memlet_to_promote: Optional[set[dace_graph.MultiConnectorEdge[dace.Memlet]]]

However, as far as I understand it should be possible to change the set into a list.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since iterate over _memlet_to_promote it should be a list indeed. And you're also right for the type of _memlet_to_promote. Not sure how the syntax checks didn't catch this issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DaCe is not typechecked, so every type from it is apparently handled as Any.
But you have to ask Enrique if you want more information.

if not self.partition_map_output(graph, sdfg):
return False
self._independent_nodes = None
self._dependent_nodes = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently you couple can_be_applied() with apply() because you do not set self._memlet_to_promote to None.
Thus you require that can_be_applied() has run immediate before apply() has been called.

To solve this you have to reset _memlet_to_promote here and in apply() populate it again.
You can also do it in self._prepare_independent_memlets(), see my other note.

]
independent_outer_map_as_range = dace_subsets.Range(
ranges=[
in_edge.data.subset.ranges[outer_map_entry.map.params.index(idx)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this mapping is fix, you could create a dict outside the loop.

Comment on lines +737 to +751
original_dst_of_in_edge = in_edge.dst
original_dst_conn_of_in_edge = in_edge.dst_conn
original_dst_other_subset_of_in_edge = in_edge.data.other_subset
# Redirect the memlet to the temporary AccessNode
dace_helpers.redirect_edge(
state=state,
edge=in_edge,
new_dst=promoted_anode,
new_dst_conn=None,
new_memlet=dace.Memlet(
data=in_edge.data.data,
subset=in_edge.data.subset,
other_subset=dace_subsets.Range.from_array(sdfg.arrays[promoted_name]),
),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might fail in very strange ways if the consumer is an AccessNode.
The following code should handle that case too:

assert in_edge.data.wcr is None
simple_direction = not (isinstance(in_edge.dst, dace_nodes.AccessNode) and in_edge.data.data != in_edge.dst.data)
first_new_memlet = deepcopy(in_edge.data)
if simple_direction:
    first_new_memlet = dace.Memlet(
        data=in_edge.data.data,
        subset=deepcopy(in_edge.data.subset),
        other_subset=deepcopy(new_subset_as_range),
    )
else:
    first_new_memlet = dace.Memlet(
        data=promoted_name,
        subset=deepcopy(new_subset_as_range),
        other_subset=deepcopy(in_edge.data.other_subset),
    )

second_new_memlet = deepcopy(in_edge.data)
if simple_direction:
    second_new_memlet.data = promoted_name
    second_name_memlet.subset = deepcopy(new_subset_as_range)
else:
    second_name_memlet.other_subset = deepcopy(new_subset_as_range)

state.add_edge(
    in_edge.src,
    in_edge.src_conn,
    promoted_anode,
    None,
    first_new_memlet,
)
state.add_edge(
    promoted_anode,
    None,
    in_edge.dst,
    in_edge.dst_conn,
    second_new_memlet,
)
state.remove_edge(in_edge)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might fail in very strange ways if the consumer is an AccessNode.

If it's an AccessNode then it would be an independent one, so it shouldn't be handled here. Maybe we should add an assert instead

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right I did not consider that.
However, instead of an assert I would raise a NotImplementedErrror.

"Independent memlets should only be inputs to maps that have a single parameter. "
"Those should always be neighbor reductions."
)
edge.data.subset = next(iter(original_dst_of_in_edge.params))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct that you have to update subset here, but does does not make much sense to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood your comment. Should I change something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean original_dst_of_in_edge is a MapEntry and it does not have an attribute called params, the Map has this, i.e. original_dst_of_in_edge.map.params would exists.
Map::params stores the iteration variables (ordered accordingly to this function).
However, now you take the first iteration variable, which at this point should be horizontal dimension.
So I do not understand the logic that is applied here.

"Those should always be neighbor reductions."
)
edge.data.subset = next(iter(original_dst_of_in_edge.params))
edge.data.other_subset = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can not set this to None you must keep it.
Instead there is an else branch missing, there you need to update other_subset (you need to set it to the same value you used for .subset in the then branch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that if the edge memlet data is not the one that I have touched then I won't have to update anything because then the edge data will be the output data so the subset would be correct. I don't know if there could be a case that the other_subset refers to an outside node data for an edge that stems from a MapEntry

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an AccessNode inside a Map and the Memlet inside the Map scope refers to the AccessNode inside, then outher_subset stores what is read from the data outside.
Note that the data attribute of the Memlet outside the Map will refer to the data that is outside.
However, this is a case that is unlikely but not impossible to appear.

Here you have to keep other_subset in case the Memlet ends in an AccessNode.

Comment on lines +776 to +781
elif isinstance(original_dst_of_in_edge, dace_nodes.NestedSDFG):
raise NotImplementedError("Promotion of memlets to NestedSDFG not implemented yet.")
elif isinstance(original_dst_of_in_edge, dace_nodes.LibraryNode):
raise NotImplementedError(
"Promotion of memlets to LibraryNode not implemented yet."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove them, since original_dst_of_in_edge is always classified as a dependent node.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I could remove these checks. Couldn't we have a dependent NestedSDFG or LibraryNode?

Comment on lines +725 to +727
for subset_range in in_edge.data.subset.ranges:
if subset_range not in independent_outer_map_as_range.ranges:
new_subset.append(subset_range)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit brittle, because you look at sizes.
However, currently I do not have a better idea, beside looking at the subset and checking if it contains the blocking parameter, but I am not sure if this is better.

Comment on lines +728 to +730
assert len(new_subset) > 0, (
"After removing the independent dimensions there should be at least one dimension left to promote."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the number of dimensions to promote should be one less than before?

)

if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry):
for edge in state.out_edges(original_dst_of_in_edge):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not saw this before, but iterating over the out edges is not enough, as there could be nested Maps.
I think there is something in utils, the reroute or so that can help you do it or at least give you some hints on how to do it.


if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry):
for edge in state.out_edges(original_dst_of_in_edge):
if edge.data.data == in_edge.data.data:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else branch is missing.
There you have to update other_subset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants