perf[next-dace]: Enhance LoopBlocking pass#2578
perf[next-dace]: Enhance LoopBlocking pass#2578iomaganaris wants to merge 27 commits intoGridTools:mainfrom
Conversation
philip-paul-mueller
left a comment
There was a problem hiding this comment.
I like the changes but there some things that needs to be addressed.
| if self.map_entry.map.gpu_block_size is not None: | ||
| return False | ||
| if self.map_entry.map.gpu_maxnreg > 0: | ||
| return False |
There was a problem hiding this comment.
I am actually not sure if this is needed, as maxnreg is only set if it is not set.
However, I assume this changes avoids some other problem, that I don't fully get.
There was a problem hiding this comment.
With this check I want to make sure that if the gpu_maxnreg is set for the map from some other transformation we don't change it here but the check in the can_be_applied is wrong so I'm going to change it
| default=True, | ||
| desc="If 'True' then blocking is only applied if there are independent nodes.", | ||
| ) | ||
| promote_independent_memlets = dace_properties.Property( |
There was a problem hiding this comment.
You also need to add them to the doc string of the class, there I would also add what promotion means, probably creating a scalar.
| block_var_idx = map_params.index(block_var) | ||
| map_range_size = map_range.size() | ||
|
|
||
| if all(map_range_size_i == 1 for map_range_size_i in map_range_size): |
There was a problem hiding this comment.
There could be a Sympy issue here, so you have to use:
| if all(map_range_size_i == 1 for map_range_size_i in map_range_size): | |
| if all((map_range_size_i == 1) == True for map_range_size_i in map_range_size): |
| # Set of nodes that are independent of the blocking parameter. | ||
| _independent_nodes: Optional[set[dace_nodes.AccessNode]] | ||
| _dependent_nodes: Optional[set[dace_nodes.AccessNode]] | ||
| _memlet_to_promote: Optional[set[dace.Memlet]] |
There was a problem hiding this comment.
This is confusing (and also very strange), but the class Memlet only the payload of the edge in the dataflow graph, although everyone is pretending that it is the actual edge.
Thus the correct type signature is:
| _memlet_to_promote: Optional[set[dace.Memlet]] | |
| _memlet_to_promote: Optional[set[dace_graph.MultiConnectorEdge[dace.Memlet]]] |
However, as far as I understand it should be possible to change the set into a list.
There was a problem hiding this comment.
Since iterate over _memlet_to_promote it should be a list indeed. And you're also right for the type of _memlet_to_promote. Not sure how the syntax checks didn't catch this issue
There was a problem hiding this comment.
DaCe is not typechecked, so every type from it is apparently handled as Any.
But you have to ask Enrique if you want more information.
| if not self.partition_map_output(graph, sdfg): | ||
| return False | ||
| self._independent_nodes = None | ||
| self._dependent_nodes = None |
There was a problem hiding this comment.
Currently you couple can_be_applied() with apply() because you do not set self._memlet_to_promote to None.
Thus you require that can_be_applied() has run immediate before apply() has been called.
To solve this you have to reset _memlet_to_promote here and in apply() populate it again.
You can also do it in self._prepare_independent_memlets(), see my other note.
| ] | ||
| independent_outer_map_as_range = dace_subsets.Range( | ||
| ranges=[ | ||
| in_edge.data.subset.ranges[outer_map_entry.map.params.index(idx)] |
There was a problem hiding this comment.
Since this mapping is fix, you could create a dict outside the loop.
| original_dst_of_in_edge = in_edge.dst | ||
| original_dst_conn_of_in_edge = in_edge.dst_conn | ||
| original_dst_other_subset_of_in_edge = in_edge.data.other_subset | ||
| # Redirect the memlet to the temporary AccessNode | ||
| dace_helpers.redirect_edge( | ||
| state=state, | ||
| edge=in_edge, | ||
| new_dst=promoted_anode, | ||
| new_dst_conn=None, | ||
| new_memlet=dace.Memlet( | ||
| data=in_edge.data.data, | ||
| subset=in_edge.data.subset, | ||
| other_subset=dace_subsets.Range.from_array(sdfg.arrays[promoted_name]), | ||
| ), | ||
| ) |
There was a problem hiding this comment.
This might fail in very strange ways if the consumer is an AccessNode.
The following code should handle that case too:
assert in_edge.data.wcr is None
simple_direction = not (isinstance(in_edge.dst, dace_nodes.AccessNode) and in_edge.data.data != in_edge.dst.data)
first_new_memlet = deepcopy(in_edge.data)
if simple_direction:
first_new_memlet = dace.Memlet(
data=in_edge.data.data,
subset=deepcopy(in_edge.data.subset),
other_subset=deepcopy(new_subset_as_range),
)
else:
first_new_memlet = dace.Memlet(
data=promoted_name,
subset=deepcopy(new_subset_as_range),
other_subset=deepcopy(in_edge.data.other_subset),
)
second_new_memlet = deepcopy(in_edge.data)
if simple_direction:
second_new_memlet.data = promoted_name
second_name_memlet.subset = deepcopy(new_subset_as_range)
else:
second_name_memlet.other_subset = deepcopy(new_subset_as_range)
state.add_edge(
in_edge.src,
in_edge.src_conn,
promoted_anode,
None,
first_new_memlet,
)
state.add_edge(
promoted_anode,
None,
in_edge.dst,
in_edge.dst_conn,
second_new_memlet,
)
state.remove_edge(in_edge)
There was a problem hiding this comment.
This might fail in very strange ways if the consumer is an AccessNode.
If it's an AccessNode then it would be an independent one, so it shouldn't be handled here. Maybe we should add an assert instead
There was a problem hiding this comment.
You are right I did not consider that.
However, instead of an assert I would raise a NotImplementedErrror.
| "Independent memlets should only be inputs to maps that have a single parameter. " | ||
| "Those should always be neighbor reductions." | ||
| ) | ||
| edge.data.subset = next(iter(original_dst_of_in_edge.params)) |
There was a problem hiding this comment.
It is correct that you have to update subset here, but does does not make much sense to me.
There was a problem hiding this comment.
Not sure I understood your comment. Should I change something?
There was a problem hiding this comment.
I mean original_dst_of_in_edge is a MapEntry and it does not have an attribute called params, the Map has this, i.e. original_dst_of_in_edge.map.params would exists.
Map::params stores the iteration variables (ordered accordingly to this function).
However, now you take the first iteration variable, which at this point should be horizontal dimension.
So I do not understand the logic that is applied here.
| "Those should always be neighbor reductions." | ||
| ) | ||
| edge.data.subset = next(iter(original_dst_of_in_edge.params)) | ||
| edge.data.other_subset = None |
There was a problem hiding this comment.
You can not set this to None you must keep it.
Instead there is an else branch missing, there you need to update other_subset (you need to set it to the same value you used for .subset in the then branch.
There was a problem hiding this comment.
I thought that if the edge memlet data is not the one that I have touched then I won't have to update anything because then the edge data will be the output data so the subset would be correct. I don't know if there could be a case that the other_subset refers to an outside node data for an edge that stems from a MapEntry
There was a problem hiding this comment.
If there is an AccessNode inside a Map and the Memlet inside the Map scope refers to the AccessNode inside, then outher_subset stores what is read from the data outside.
Note that the data attribute of the Memlet outside the Map will refer to the data that is outside.
However, this is a case that is unlikely but not impossible to appear.
Here you have to keep other_subset in case the Memlet ends in an AccessNode.
| elif isinstance(original_dst_of_in_edge, dace_nodes.NestedSDFG): | ||
| raise NotImplementedError("Promotion of memlets to NestedSDFG not implemented yet.") | ||
| elif isinstance(original_dst_of_in_edge, dace_nodes.LibraryNode): | ||
| raise NotImplementedError( | ||
| "Promotion of memlets to LibraryNode not implemented yet." | ||
| ) |
There was a problem hiding this comment.
I think you can remove them, since original_dst_of_in_edge is always classified as a dependent node.
There was a problem hiding this comment.
Not sure why I could remove these checks. Couldn't we have a dependent NestedSDFG or LibraryNode?
| for subset_range in in_edge.data.subset.ranges: | ||
| if subset_range not in independent_outer_map_as_range.ranges: | ||
| new_subset.append(subset_range) |
There was a problem hiding this comment.
This looks a bit brittle, because you look at sizes.
However, currently I do not have a better idea, beside looking at the subset and checking if it contains the blocking parameter, but I am not sure if this is better.
| assert len(new_subset) > 0, ( | ||
| "After removing the independent dimensions there should be at least one dimension left to promote." | ||
| ) |
There was a problem hiding this comment.
I think the number of dimensions to promote should be one less than before?
| ) | ||
|
|
||
| if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry): | ||
| for edge in state.out_edges(original_dst_of_in_edge): |
There was a problem hiding this comment.
I did not saw this before, but iterating over the out edges is not enough, as there could be nested Maps.
I think there is something in utils, the reroute or so that can help you do it or at least give you some hints on how to do it.
|
|
||
| if isinstance(original_dst_of_in_edge, dace_nodes.MapEntry): | ||
| for edge in state.out_edges(original_dst_of_in_edge): | ||
| if edge.data.data == in_edge.data.data: |
There was a problem hiding this comment.
The else branch is missing.
There you have to update other_subset.
Added the following features to the
LoopBlockingpass:Memlets that have independent data are promoted in the outerMapso that they only have to be read onceblocking_independent_node_threshold.LoopBlockingis only going to be applied toMaps that have more than the threshold number independent variables