Skip to content

Conversation

@mn416
Copy link
Collaborator

@mn416 mn416 commented Nov 7, 2025

This PR introduces an ArrayIndexAnalysis class to determine whether or not distinct iterations of a given loop can generate conflicting array accesses (if not, the loop can potentially be parallelised). It formulates the problem as a set of SMT constraints over array indices which are then are passed to the third-party Z3 solver.

The PR also introduces a new option use_smt_array_index_analysis to DependencyTools and ParallelLoopTrans which enables use of the new analysis in place of DepdencyTools's current array conflict analysis.

To install the Z3 solver:

$ pip install z3-solver

The analysis is broadly explained in source code comments.

There is also a set of examples showing what the analysis is capable of (and not capable of), available here. None of these examples are solved by PSyclone's existing loop analysis, and all can be solved by the SMT-based analysis.

Matthew Naylor added 4 commits November 5, 2025 13:59
@arporter
Copy link
Member

arporter commented Nov 7, 2025

This feels like one for @hiker to look at as he did DependencyTools - without looking at the code my feeling is that it would make sense for extra sophistication to go in there rather than into a specific PSyIR node.

EDIT: my mistake - I see you say you've added a new class to encapsulate this functionality. I'd still like to see whether it makes sense to merge it/include it with DependencyTools though.

@mn416
Copy link
Collaborator Author

mn416 commented Nov 10, 2025

Thanks @arporter. I could call the new analysis from DependencyTools rather than from ParallelLoopTrans to make it more widely accessible. That might be simpler/clearer too. I will have a think about this. It probably makes sense to keep the analysis as a separate class/module though as the problem it solves can be understood in isolation, but maybe I am missing something.

@codecov
Copy link

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.90%. Comparing base (761be74) to head (9eef149).
⚠️ Report is 187 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3213      +/-   ##
==========================================
- Coverage   99.90%   99.90%   -0.01%     
==========================================
  Files         376      375       -1     
  Lines       53156    53460     +304     
==========================================
+ Hits        53104    53407     +303     
- Misses         52       53       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hiker
Copy link
Collaborator

hiker commented Nov 11, 2025

Thanks a lot for this contribution. TBH, ever since I used an SAT (elsewhere), I was thinking to try using an SMT for this - so thanks a lot. And indeed (besides others), the current implementation of the dependency analysis does not use the loop indices (which was on my todo list ... but given that this code is hardly ever seen in reality, I never even started :) ), and I wanted to improve its handling cause by tiling loops :)

Now, this PR is rather 'difficult'. Let me try to put it into context:

  • I would prefer not having two different implementations of code that basically does the same thing. So, I would want to check if we could replace the current implementation entirely (sympy would still stick around for other uses I'd guess).
  • I had a look at your code, and you basically have a conversion from PSyIR expression to a Z3 expression. I wonder if it might be clearer/simpler to use the 'standard' approach of creating a PSyIR backend similar to the sympy writer (I don't think we need a frontend to go the way back to PSyIR, so quite a bit of the state stored in the sympy writer will not be required for you). The advantage would be that it makes the Z3 solver more accessible in other parts (i.e. future use cases, e.g. better verification of loop fusion etc). But admittedly, I am not sure if this would actually work for you, since you need to create new variables while running? You would know that better than me :)
  • Also, I think there are some special Fortran expressions that your solver would not handle. Following the sympy writer approach should give you some examples (precision syntax like 3.0_5 or 3.1_wp), which is also allowed for integer variables. Admittedly, probably not important for index expressions, but just to be complete. You should be able to just drop the _... part.
  • What about structure accesses (I admit I didn't check in full detail or try it out, but I didn't notice it being handled)? For sympy, we have a rather complicated handling of expressions using a%b. The UM uses structures to provide loop indices quite frequently to determine lower and upper bound (do j = tdims%j_start, tdims%j_end), and I believe atm the code would map both to just the first symbol (tdims).

And I agree with Andy that it should go into the dependency tools (making it somehow configurable which version to use.

Two additional concerns/questions I have:

  1. I noticed that you added a time out. I see a potential risk in this that it might depend on the power of the hardware that PSyclone is running out to decide it a loop is parallelisable or not :( On a slow (or highly loaded) machine, the time out is reached and a loop is not parallelised, but no a faster (not highly loaded) machine, a loop would be parallelised. That's ... really not something we should be doing.
  2. You also had to tweak additional Z3 settings (e.g. the example where Z3 would assume integer overflow). Again, this makes it hard to use Z3 generically if these tweaks are required depending on case. Do you think you could identify common settings (e.g. which kind of integer variables we have) that work in all cases? E.g. I am pretty certain that we can ignore any case of integer overflows for deciding if a loop can be parallelised.

Now the bad news: I am pretty busy with updating my PSyclone training (I need to give it in 3 weeks time, and there are quite a few changes in PSyclone), and we are planning a release soon. You might also be affected by Sergi's changes (that will soon go onto master) that replaced the component indices (I think you are probably not affected by that), so it will be a bit before I can really dig into that.

Till I get back to that, could you check if your Z3 solver would be able to handle all other cases we support atm (i.e. could replace the existing code)? And some indication of the timing would be useful (atm you are using a timeout of 5 seconds. What is the typical runtime for a 'normal' case? E.g. if it's 2 seconds, then imho we would be affected too much by load of the machine it's running on and the performance of that machine. If it's 0.1, then it's probably all good).

@arporter , @sergisiso , Z3 takes up around 45MB on my laptop, so less than sympy (66MB - plus other dependencies sympy has). And strangely, it contains the same library twice, so could be reduced to 23MB.

I also did a bit of search if there is a SMT solver that integrates better with SymPy (i.e. if we could use sympy expressions to feed them into the solver, instead of converting psyir to z3), but that appears to be not the case.

@mn416
Copy link
Collaborator Author

mn416 commented Nov 12, 2025

@hiker thank you very much for these comments and suggestions, and your time. Below, I respond to specific points and then summarise next steps.

I would prefer not having two different implementations of code that basically does the same thing. So, I would want to check if we could replace the current implementation entirely

I envisaged that we'd want both. The new analysis is only useful when combined with the existing one. It only considers array index conflicts and assumes the existing analysis has already established that there are no scalar conflicts, for example. (I currently combine the two in ParallelLoopTrans but, as mentioned above, I think it would be better for DependencyTools to call ArrayIndexAnalysis directly when a flag is provided. I've noted a TODO for this.)

In addition, the solving time of the SMT-based analysis is quite unpredictable. In simple cases, it's probably overkill and might make PSyclone noticably slower when processing large codebases. I think it's best used as a fall-back option when the default (more efficient) analysis fails and you want PSyclone to try harder in specific files or routines.

I wonder if it might be clearer/simpler to use the 'standard' approach of creating a PSyIR backend similar to the sympy writer

Yes, this should be possible. I will have a look.

I think there are some special Fortran expressions that your solver would not handle. Following the sympy writer approach should give you some examples (precision syntax like 3.0_5 or 3.1_wp)

Thanks, I'll take a TODO to handle these. At the moment, the analysis only translates logical variables and integer variables of unspecified kind to SMT. If this is insufficient to prove absence of conflicts, then the analysis will conservatively report that a conflict exists.

What about structure accesses

Good point! This crossed my mind but I forgot about it. I'll take another TODO for this (to handle both scalar members and array members).

I noticed that you added a time out. I see a potential risk in this that it might depend on the power of the hardware that PSyclone is running out to decide it a loop is parallelisable or not

That's correct. Unfortunately, I don't know of a good solution. I think it is just a drawback of using powerful, general-purpose solvers.

Do you think you could identify common settings (e.g. which kind of integer variables we have) that work in all cases? E.g. I am pretty certain that we can ignore any case of integer overflows for deciding if a loop can be parallelised.

Yes, I'll change the default settings so that integer overflow is ignored. That still leaves the choice between using the integer solver and the bit vector solver. A simple solution here would be to try one then the other (or both in parallel).

Now the bad news: I am pretty busy with updating my PSyclone training

I appreciate that, which makes me all the more thankful for the feedback you've already given. I'm also unsure how much time I will have for this work, with the NG-ARCH project coming to an end in a few weeks (I hope I will get time for it, and other PSyclone ideas, but that's unclear at the moment). In any case, I hope the work serves as an interesting proof-of-concept, even if it doesn't get merged in.

Here is a summary of things I'll aim for in the short term:

  1. Call ArrayIndexAnalysis from DependencyTools if the user requests to do so. (Essentially, move my existing changes of ParallelLoopTrans into DependencyTools.)

  2. Deal with precision syntax, such as 3.0_5 or 3.1_wp, and check sympy_writer for other unusual constructs being handled.

  3. Handle both scalar members and array members of structs correctly.

  4. Try to move the examples from my personal repo into PSyclone's examples directory. I may need to tweak some of these to avoid copying unlicensed code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants