-
Notifications
You must be signed in to change notification settings - Fork 350
feat(GridIntersect): add array support, fast point locating and support for points with z-coordinates #2657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
- deprecate method kwarg - remove keepzerolengths from intersect() - clean up deprecated methods
…ts() - allow arrays of numpy arrays in intersects(), returns all candidate cells for intersection per geometry - add predicate to query_grid() - simplify names ofintersection methods, deprecate old internal methods - move logic to translate node number to rowcol to separate method
- flip arguments in filter_query_result to be consistent with other class methods - remove get_gridshapes method() - move imports to top of file
- add points_to_cellids() for fast 1:1 mapping between points and grid cells, return nan when outside of grid
- rename return_cellids to return_nodenumbers for clarity (node numbers always one number, cellid is somewhat ambiguous) - use shapely plotting for linestrings and points
- add handle_z options: "ignore", "drop" and "return" to intersect() and points_to_cellids() - add method to compute layer position for z coordinates - improve _intersect_point() method by using intersects as starting point.
- remove tests for deprecated methods - add array tests for intersects() and points_to_cellids() - activate+improve tests for 3D points
- make staticmethods accept shapely geometries as input instead of rec-arrays - update notebook
|
Tests failing because MF6 examples still use method kwarg. Getting tests to pass, will require a change in modflow6-examples repo: |
- replace 3 instances where new nodenumber to rowcol method should be used
|
@dbrakenhoff First off, apologies for overlapping with your work... searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR. The key difference from shapely: shapely handles non-convex cell geometries, whereas our approach assumes convex hulls. This assumption is valid for typical MODFLOW grids (rectangles, Voronoi cells, triangles) and enables significant simplifications and speedups. More importantly, the convex hull approach generalizes cleanly to higher-dimensional shapes - the same hyperplane equations can work for lines, polygons, and polyhedrons without needing shapely at all. I can work with you in the future once I'm done with the point to cell PR for higher-dimensional shapes. |
No worries, there were already 2 places in flopy with grid intersection operations, so our simultaneous work hasn't changed that, but they have become slightly more similar now that they both support arrays of points.
I realize there is some overlap with your work, especially between the new So the questions now are, do we continue to let the two live side-by-side? Or do we integrate them and implement |
I think this is fine. 4.x is a natural time for consolidation. at some point we'll need to make the decision which to delegate to for xarray spatial queries e.g. stuff like
Other comments
Clear to me, I like it
Yes, but re:
while I like the cellid/nn toggle, I wonder if it should return the elements of cellid as separate columns. While people may be used to seeing cell ID tuples as a single recarray column we have had a lot of internal discussion about how awkward it is and are thinking of always flattening them into separate columns in 4.x. as this parameter is new I'm tempted to say we are free to shed old baggage here Relatedly, maybe singular instead of plural naming for the columns i.e.
Just wondering if there is a motivation for "ignore" besides backwards-compatibility? I guess that is sufficient cause but it seems nicer if this could be a bool switch instead of a string. maybe something for 4.x |
|
maybe also consider returning |
|
Thanks for the comments @wpbonelli! I'll prepare some fixes based on your comments:
And yea, it makes sense to defer the integration to 4.x. I think @aacovski's work more naturally fits with the xarray implementation, whereas As for this code: |
Yeah since dataframe syntax is basically a superset of recarray syntax I think this is fine. Sounds good to me though I will admit I am not a frequent user of these APIs so am not personally much affected.
If you think it will be a common case to want to drop NaNs I can see the case for the string, no strong feelings on this. But still a bool feels cleaner and it is easy for the caller to drop them.
Ah, ok- yeah, then that's for 4.x.
Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard? |
|
Ahh. thanks. nearest cell center. I was confusing that with some notion of "nearest cell" which I recognize now is ill-defined absent a projection scheme I guess.. |
- Add cellid to result and remove return_nodenumbers option. cellid now always includes cellid. - use cellid (not cellids) result internally for all operation. - Add row and col to structured result - maintain cellids column for backward compatibility
- add warning about future default - for new method points_to_cellids, default to dataframe.
- with some copilot help
- add df_toggle to silence warnings - switch handle_z to bool
|
I think this is ready for review? Some brief comments on final changes:
|
I don't know what the conventional way to handle this is, but I assume we aren't the first to run into the problem. I guess the pure-numpy alternatives are sentinal values and masked arrays. Sentinels are awkward and feel wrong. Masked arrays seem like a decent solution but maybe those have limitations I'm not aware of? Once we start returning pandas by default, It looks like the elements of the legacy |
Good catch, fix incoming. On that note, it's probably also a good idea to suppress dataframe warnings everywhere by passing
I think that is an acceptable solution. The only place we allow NaNs in the result is in On a related note, for the "layer" field, I also use floats to allow input of +inf and -inf to indicate points lie above/below the model grid. Ideally this column would be |
- fix legacy cellids to return int - use -1 as inernal invalid index number - set invalid indices as pd.NA when returning as dataframes - return masked recarray in points_to_cellids when indices contain NaNs - modify tests to use pd.isna() - update example notebook
Seems like a good compromise, pass
Is this bit of information valuable for typical use cases? Or is it usually enough to know the point is outside the grid? I don't feel like I have the user perspective to have strong opinions either way |
Yeah, 4.x makes sense to me.
I think you're right. Returning floats is just awkward. So I think I'll make it an int dtype and mask invalid values and leave the further investigations to the user. |
- mask result when recarray, return NA when dataframe - update tests/nb
|
Modflow 6 examples update is here: MODFLOW-ORG/modflow6-examples#318 |
wpbonelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all looks good to me! I know @jlarsen-usgs was planning on having a look too though.

Major update of GridIntersect module.
@Reviewer(s), I have a few things I wonder about, especially the naming of things:
points_to_cellids()for locating points in the grid. Is this name clear? And iscellidthe correct name for(row, column)or(cellid,)output?return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids.Support for numpy arrays of shapely geometries
Introduces support for arrays of shapely geometries in
GridIntersect.intersects()method.Returns rec-array with a
shp_idfield containing the index of the original shape inshapesand the cellids of the candidate cells the polygons intersect with, for example:So in that example output polygon 0 intersects with 2 grid cells: (1, 0) and (1, 1), and polygon 1 intersects with 2 grid cells: (0, 0), and (0, 1).
Fast point locating
GridIntersect was missing a method to obtain one cellid for each input point. The implementation of
GridIntersect.intersects()drops all points outside the model grid, and potentially returns all intersection results for each point, meaning there is no 1:1 output between input points and output cellids. TheGridIntersect.intersect()method allows a MultiPoint as input, but is inefficient, and groups points in the same cell as MultiPoints, also losing the 1:1 relation.So a new method is introduced
GridIntersect.points_to_cellids(). This method guaranteeslen(input) == len(output). Cellids are returned for each point and NaN is returned if the point is outside the model grid. For points on boundaries the lowest grid cellid is returned.Support for points with z-coordinates
Adds support for points with z-coordinates in
GridIntersect.intersect()andGridIntersect.points_to_cellids()with thehandle_z-kwarg."ignore"which ignores the z-coordinates, similar to what GridIntersect did before."drop"which removes the intersection results if the point lies above or below the model grid. This option is only included inintersect()."return"which returns the computed layer position for each point. This returns NaN if the point is outside the model domain, +inf if the point is above the model and -inf the point is below the model.Deprecations
Deprecates everything related to
method="structured"Other stuff