Conversation


@fzi-hielscher fzi-hielscher commented Nov 9, 2025

Spill hw.array values onto the stack at their definition sites instead of their use sites to reduce the number of redundant alloca + store ops created. This should mitigate the negative effects of #9172, fixing #9211.

This PR adds the HWToLLVMArraySpillCache helper to the HWToLLVM conversion patterns. Certain operations on !hw.array-typed values need to "spill" their input arrays (i.e., copy them into a buffer, usually stack allocated) in order to access dynamically indexed elements. The cache is used to avoid spilling the array for each such user separately. Instead, operations that create an array value that needs spilling allocate a buffer that can be shared by all of the value's users. HW dialect operations that inherently allocate a buffer for their output arrays (AggregateConstantOp, ArrayInjectOp, ArraySliceOp) use the cache to associate their output value with a pointer to that buffer, so the value does not need to be spilled again. Producers of HW array values outside the HW dialect (e.g., arc.state_load) need to be handled before conversion. This is done by invoking spillNonHWOps.
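
Conceptually, the cache memoizes one stack buffer per spilled array value. Below is a minimal sketch of the idea; the struct, member, and method names are illustrative only and not the actual HWToLLVMArraySpillCache API, and the real helper places the buffer at the value's definition site rather than at an arbitrary insertion point.

```cpp
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Transforms/DialectConversion.h"
#include "llvm/ADT/DenseMap.h"

using namespace mlir;

// Illustrative sketch: map each converted !hw.array value to the stack buffer
// holding its contents, so every dynamically indexing user reuses the same
// pointer instead of emitting its own alloca + store.
struct ArraySpillCacheSketch {
  llvm::DenseMap<Value, Value> spilledPtrs;

  Value getOrCreateSpill(ConversionPatternRewriter &rewriter, Location loc,
                         Value convertedArray) {
    // Reuse the existing buffer if this value has already been spilled.
    if (auto it = spilledPtrs.find(convertedArray); it != spilledPtrs.end())
      return it->second;
    // Otherwise spill once: allocate a stack slot of the array type and store
    // the value into it (ideally right after the value's definition).
    auto oneC = LLVM::ConstantOp::create(
        rewriter, loc, IntegerType::get(rewriter.getContext(), 32),
        rewriter.getI32IntegerAttr(1));
    Value ptr = LLVM::AllocaOp::create(
        rewriter, loc, LLVM::LLVMPointerType::get(rewriter.getContext()),
        convertedArray.getType(), oneC, /*alignment=*/4);
    LLVM::StoreOp::create(rewriter, loc, convertedArray, ptr);
    spilledPtrs[convertedArray] = ptr;
    return ptr;
  }
};
```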

Use of the cache is optional and can be disabled by passing spill-arrays-early=false to convert-hw-to-llvm or by passing a nullopt value for the cache argument of populateHWToLLVMConversionPatterns. In fact, conversion passes that do not disable allowPatternRollback must not use the cache, since a buffer, once added to the cache, cannot be removed again if a pattern is rolled back.
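
For a downstream conversion pipeline, the gating might look roughly like the sketch below. This assumes MLIR's ConversionConfig::allowPatternRollback flag; whether the cache is default-constructible, its namespace, and how exactly it is threaded into populateHWToLLVMConversionPatterns are assumptions made here for illustration.

```cpp
// Sketch only: create a spill cache only when pattern rollback is disabled,
// since a cached buffer cannot be undone if a pattern is rolled back.
mlir::ConversionConfig config;
config.allowPatternRollback = false; // prerequisite for using the cache

std::optional<circt::HWToLLVMArraySpillCache> spillCache; // namespace assumed
if (!config.allowPatternRollback)
  spillCache.emplace();

// Hand `spillCache` (or std::nullopt) to populateHWToLLVMConversionPatterns
// as its cache argument when collecting the conversion patterns.
```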

@fzi-hielscher fzi-hielscher force-pushed the hwtollvm-early-array-spill branch from 3c40229 to 1e486ed on November 12, 2025 23:56
@fzi-hielscher fzi-hielscher changed the title from [WIP][HWToLLVM][ArcToLLVM] Spill array values early to [HWToLLVM][ArcToLLVM] Spill array values early on Nov 14, 2025
@fzi-hielscher fzi-hielscher marked this pull request as ready for review November 14, 2025 11:40
@fzi-hielscher
Contributor Author

@SimonEbner, could you check if this actually fixes the compile time regression in your example?

@SimonEbner
Contributor

Thank you. I tested 120dd53 vs your head:
120dd53: 12.59s, HEAD: 1.86s

@fzi-hielscher
Contributor Author

Thanks for checking.
I suspect the remaining overhead is due to arc.state_load ops now being spilled instead of accessing the heap buffer directly. We could probably implement a heuristic that allows array accesses without spilling when we can be certain that the underlying memory remains unchanged between the arc.state_load and the hw.array_get op. But at the moment I'd rather not add that complexity if it only serves to reduce the load on LLVM's optimizations.

@SimonEbner
Contributor

Sure, I think this patch is great, more than good enough. Thanks for looking into this!

Member

@maerhart maerhart left a comment

That's great! Thanks!

Comment on lines 344 to 352
auto oneC = LLVM::ConstantOp::create(
rewriter, op->getLoc(), IntegerType::get(rewriter.getContext(), 32),
rewriter.getI32IntegerAttr(1));
arrPtr = LLVM::AllocaOp::create(
rewriter, op->getLoc(),
LLVM::LLVMPointerType::get(rewriter.getContext()),
adaptor.getInput().getType(), oneC,
/*alignment=*/4);
LLVM::StoreOp::create(rewriter, op->getLoc(), adaptor.getInput(), arrPtr);
Member

Nit: as this occurs a few times, we could factor it out into a helper function.
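
For instance, a helper along these lines could work (the name and exact signature here are just illustrative; it reuses the calls from the snippet above):

```cpp
// Sketch of the suggested helper: spill an already-converted array value into
// a fresh stack slot and return the pointer to it.
static Value spillToStack(ConversionPatternRewriter &rewriter, Location loc,
                          Value convertedArray) {
  auto oneC = LLVM::ConstantOp::create(
      rewriter, loc, IntegerType::get(rewriter.getContext(), 32),
      rewriter.getI32IntegerAttr(1));
  Value arrPtr = LLVM::AllocaOp::create(
      rewriter, loc, LLVM::LLVMPointerType::get(rewriter.getContext()),
      convertedArray.getType(), oneC,
      /*alignment=*/4);
  LLVM::StoreOp::create(rewriter, loc, convertedArray, arrPtr);
  return arrPtr;
}
```

Each call site would then reduce to something like `arrPtr = spillToStack(rewriter, op->getLoc(), adaptor.getInput());`.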

@fzi-hielscher fzi-hielscher force-pushed the hwtollvm-early-array-spill branch from 89a8174 to dceae1b on November 20, 2025 23:19
@fzi-hielscher fzi-hielscher merged commit d33142e into llvm:main Nov 21, 2025
7 checks passed