feat(GridIntersect): add array support, fast point locating and support for points with z-coordinates #2657

dbrakenhoff · 2025-11-27T12:02:13Z

Major update of GridIntersect module.

@Reviewer(s), I have a few things I wonder about, especially the naming of things:

A new method called points_to_cellids() for locating points in the grid. Is this name clear? And is cellid the correct name for (row, column) or (cellid,) output?
I use "cellids" as a field name in the result but when return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids.
Anyway, a critical look at the naming of stuff in the module would be greatly appreciated!

Support for numpy arrays of shapely geometries

Introduces support for arrays of shapely geometries in GridIntersect.intersects() method.

gi = GridIntersect(grid)
shapes = shapely.polygons(...)
gi.intersects(shapes)

Returns rec-array with a shp_id field containing the index of the original shape in shapes and the cellids of the candidate cells the polygons intersect with, for example:

rec.array([(0, (np.int64(1), np.int64(0))),
           (0, (np.int64(1), np.int64(1))),
           (1, (np.int64(0), np.int64(0))),
           (1, (np.int64(0), np.int64(1)))],
          dtype=[('shp_ids', '<i8'), ('cellids', 'O')])

So in that example output polygon 0 intersects with 2 grid cells: (1, 0) and (1, 1), and polygon 1 intersects with 2 grid cells: (0, 0), and (0, 1).

Fast point locating

GridIntersect was missing a method to obtain one cellid for each input point. The implementation of GridIntersect.intersects() drops all points outside the model grid, and potentially returns all intersection results for each point, meaning there is no 1:1 output between input points and output cellids. The GridIntersect.intersect() method allows a MultiPoint as input, but is inefficient, and groups points in the same cell as MultiPoints, also losing the 1:1 relation.

So a new method is introduced GridIntersect.points_to_cellids(). This method guarantees len(input) == len(output). Cellids are returned for each point and NaN is returned if the point is outside the model grid. For points on boundaries the lowest grid cellid is returned.

Support for points with z-coordinates

Adds support for points with z-coordinates in GridIntersect.intersect() and GridIntersect.points_to_cellids() with the handle_z-kwarg.

The default is "ignore" which ignores the z-coordinates, similar to what GridIntersect did before.
The second option is "drop" which removes the intersection results if the point lies above or below the model grid. This option is only included in intersect().
The final option is "return" which returns the computed layer position for each point. This returns NaN if the point is outside the model domain, +inf if the point is above the model and -inf the point is below the model.

Deprecations

Deprecates everything related to method="structured"

Other stuff

Cleaned up the module somewhat with some renaming since the structured methods dropped out
Used shapely plotting routines for linestrings and points to simplify plotting calls, make static plot methods accept shapely geometries instead of rec-array results so they can also be used to plot input geometries.
Update tests to test new features and clean up old tests
Update example notebook.

- deprecate method kwarg - remove keepzerolengths from intersect() - clean up deprecated methods

…ts() - allow arrays of numpy arrays in intersects(), returns all candidate cells for intersection per geometry - add predicate to query_grid() - simplify names ofintersection methods, deprecate old internal methods - move logic to translate node number to rowcol to separate method

- flip arguments in filter_query_result to be consistent with other class methods - remove get_gridshapes method() - move imports to top of file

- add points_to_cellids() for fast 1:1 mapping between points and grid cells, return nan when outside of grid

- rename return_cellids to return_nodenumbers for clarity (node numbers always one number, cellid is somewhat ambiguous) - use shapely plotting for linestrings and points

- add handle_z options: "ignore", "drop" and "return" to intersect() and points_to_cellids() - add method to compute layer position for z coordinates - improve _intersect_point() method by using intersects as starting point.

- remove tests for deprecated methods - add array tests for intersects() and points_to_cellids() - activate+improve tests for 3D points

- make staticmethods accept shapely geometries as input instead of rec-arrays - update notebook

dbrakenhoff · 2025-11-27T16:21:09Z

Tests failing because MF6 examples still use method kwarg. Getting tests to pass, will require a change in modflow6-examples repo:

MODFLOW-ORG/modflow6-examples#315

- replace 3 instances where new nodenumber to rowcol method should be used

aacovski · 2025-11-27T20:13:20Z

@dbrakenhoff First off, apologies for overlapping with your work...

searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR.

The key difference from shapely: shapely handles non-convex cell geometries, whereas our approach assumes convex hulls. This assumption is valid for typical MODFLOW grids (rectangles, Voronoi cells, triangles) and enables significant simplifications and speedups. More importantly, the convex hull approach generalizes cleanly to higher-dimensional shapes - the same hyperplane equations can work for lines, polygons, and polyhedrons without needing shapely at all. I can work with you in the future once I'm done with the point to cell PR for higher-dimensional shapes.

dbrakenhoff · 2025-11-28T08:34:53Z

@dbrakenhoff First off, apologies for overlapping with your work...

No worries, there were already 2 places in flopy with grid intersection operations, so our simultaneous work hasn't changed that, but they have become slightly more similar now that they both support arrays of points.

searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR.

I realize there is some overlap with your work, especially between the new points_to_cellids() method and the array-based grid.intersect() methods you built for grids. The inefficient method for intersecting points in GridIntersect has bugged me for quite some time, plus as you can see in the list of issues and the TODOs that were fixed in the code itself, there was a bunch of work we still wanted to do on the module, which is all included in this PR. Also I felt it was weird for GridIntersect to have good support for linestrings and polygons but not have a decent implementation for points. Yes I could have just used self.mfgrid.intersect() but since the whole GridIntersect module was written around shapely, I wanted to stay with shapely for now, and see what I could achieve with those tools. Shapely's approach might have some limitations, but it seems very fast, and very flexible for the most common intersection operations.

So the questions now are, do we continue to let the two live side-by-side? Or do we integrate them and implement grid.intersect() in GridIntersect for point intersection calls (I would be curious to see how the performance compares?) ? But this might also be more of a question for flopy4, where a significant redesign of flopy will take place. And in the meantime we can enjoy having two intersection methods :).

wpbonelli · 2025-12-01T14:34:15Z

do we continue to let the two live side-by-side?

I think this is fine. 4.x is a natural time for consolidation. at some point we'll need to make the decision which to delegate to for xarray spatial queries e.g. stuff like model.grid.head.sel(x=250, y=650, z=50, method='nearest') right? but unless you guys want to try the integration sooner I agree with @dbrakenhoff that

we can enjoy having two intersection methods :)

Other comments

A new method called points_to_cellids() for locating points in the grid. Is this name clear?

Clear to me, I like it

And is cellid the correct name for (row, column) or (cellid,) output?

Yes, but re:

"cellids" as a field name in the result but when return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids

while I like the cellid/nn toggle, I wonder if it should return the elements of cellid as separate columns. While people may be used to seeing cell ID tuples as a single recarray column we have had a lot of internal discussion about how awkward it is and are thinking of always flattening them into separate columns in 4.x. as this parameter is new I'm tempted to say we are free to shed old baggage here

Relatedly, maybe singular instead of plural naming for the columns i.e. shp_ids -> shp_id etc?

Adds support for points with z-coordinates in GridIntersect.intersect() and GridIntersect.points_to_cellids() with the handle_z-kwarg. The default is "ignore" which ignores the z-coordinates, similar to what GridIntersect did before.

Just wondering if there is a motivation for "ignore" besides backwards-compatibility? I guess that is sufficient cause but it seems nicer if this could be a bool switch instead of a string. maybe something for 4.x

wpbonelli · 2025-12-01T14:42:17Z

maybe also consider returning pd.DataFrame instead of np.recarray as pandas is now a core dependency?

dbrakenhoff · 2025-12-02T07:55:09Z

Thanks for the comments @wpbonelli! I'll prepare some fixes based on your comments:

return structured result with additional "row" and "col" columns, vertex result with "cellid" column. I will continue to return the original result in "cellids" so old code doesn't break...
return pd.DataFrame instead of recarray. This is something of a breaking change since the return type will change, but since the syntax for accessing the columns is still sort of the same, i think it is okay to default to pandas DataFrames? So basically we flip the dataframe. Perhaps I can set the kwarg to None and give a DeprecationWarning and return a DataFrame, allow silencing through dataframe=True and old behavior through dataframe=False.
I can switch handle_z to be boolean and effectively only support "ignore" and "return". I added "drop" to automatically drop the NaNs for points above/below the model grid, but we can leave that to the user. Which should be easy when the return object is a pandas.DataFrame.
I will convert "shp_id" to singular naming, but the others were already plural, so changing them will break peoples scripts... So I guess those others: "cellids"--> "cellid", "ixshapes" --> "geometry" are something for 4.x?

And yea, it makes sense to defer the integration to 4.x. I think @aacovski's work more naturally fits with the xarray implementation, whereas GridIntersect() might continue as a utility with a vector based approach for linestrings and polygons. For point intersections the two are very similar, so it would make sense to maybe defer those operations to the new modelgrid object. If there are performance gains when computing grid cell intersection candidates using the new modelgrid object, we can use the grid to filter the candidates, and then run the intersection computations. But we'll figure that out when we get there :).

As for this code: model.grid.head.sel(x=250, y=650, z=50, method='nearest') is great for getting the nearest cell center, but doesn't guarantee you're selecting the cell the point (x,y) is in, e.g. in quadtree grids at the refinement boundaries, so that might be something to think about.

wpbonelli · 2025-12-02T11:05:44Z

return pd.DataFrame instead of recarray. This is something of a breaking change since the return type will change, but since the syntax for accessing the columns is still sort of the same, i think it is okay to default to pandas DataFrames? So basically we flip the dataframe. Perhaps I can set the kwarg to None and give a DeprecationWarning and return a DataFrame, allow silencing through dataframe=True and old behavior through dataframe=False.

Yeah since dataframe syntax is basically a superset of recarray syntax I think this is fine. Sounds good to me though I will admit I am not a frequent user of these APIs so am not personally much affected.

I can switch handle_z to be boolean and effectively only support "ignore" and "return". I added "drop" to automatically drop the NaNs for points above/below the model grid, but we can leave that to the user. Which should be easy when the return object is a pandas.DataFrame.

If you think it will be a common case to want to drop NaNs I can see the case for the string, no strong feelings on this. But still a bool feels cleaner and it is easy for the caller to drop them.

I will convert "shp_id" to singular naming, but the others were already plural, so changing them will break peoples scripts... So I guess those others: "cellids"--> "cellid", "ixshapes" --> "geometry" are something for 4.x?

Ah, ok- yeah, then that's for 4.x.

As for this code: model.grid.head.sel(x=250, y=650, z=50, method='nearest') is great for getting the nearest cell center, but doesn't guarantee you're selecting the cell the point (x,y) is in, e.g. in quadtree grids at the refinement boundaries, so that might be something to think about.

Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard?

dbrakenhoff · 2025-12-02T11:21:50Z

Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard?

I meant this situation. Not necessarily a problem but getting the nearest value and getting the exact cellid your point lies in can be different.

wpbonelli · 2025-12-02T11:27:47Z

Ahh. thanks. nearest cell center. I was confusing that with some notion of "nearest cell" which I recognize now is ill-defined absent a projection scheme I guess..

- Add cellid to result and remove return_nodenumbers option. cellid now always includes cellid. - use cellid (not cellids) result internally for all operation. - Add row and col to structured result - maintain cellids column for backward compatibility

- add warning about future default - for new method points_to_cellids, default to dataframe.

- with some copilot help

- add df_toggle to silence warnings - switch handle_z to bool

dbrakenhoff · 2025-12-17T12:55:22Z

I think this is ready for review? Some brief comments on final changes:

Results now contain cellid column for vertex grids, and cellid, row and col for structured grids. This means the earlier return_nodenumbers option became unnecessary, since the cellid column always contains the cellid/node number.
The legacy cellids column is kept for backwards compatibility and contains (row, col) tuples for structured grids
handle_z is now a boolean
I chose to set the dataframe return options to None, and raise a warning that defaults will change. For now it maintains the old behavior, to ensure no code is broken. Setting the dataframe keyword to True/False silences the warning. This may cause people to see quite a few warnings with the next version of flopy if they're using GridIntersect. Let me know if this is undesirable and maybe we can think of slightly less noisy way of doing this?
Columns with indices are set to dtype int where possible. For points_to_cellids the dtype remains float to allow for NaNs (point outside grid) in the result.

wpbonelli · 2025-12-17T13:38:10Z

Columns with indices are set to dtype int where possible. For points_to_cellids the dtype remains float to allow for NaNs (point outside grid) in the result.

I don't know what the conventional way to handle this is, but I assume we aren't the first to run into the problem. I guess the pure-numpy alternatives are sentinal values and masked arrays. Sentinels are awkward and feel wrong. Masked arrays seem like a decent solution but maybe those have limitations I'm not aware of?

Once we start returning pandas by default, pd.NA (which works for integer arrays) becomes an option?

It looks like the elements of the legacy cellids column are now floats too. This is why the model splitter test is failing.

dbrakenhoff · 2025-12-17T15:39:21Z

It looks like the elements of the legacy cellids column are now floats too. This is why the model splitter test is failing.

Good catch, fix incoming.

On that note, it's probably also a good idea to suppress dataframe warnings everywhere by passing dataframe=False everywhere gridintersect is being used.

Masked arrays seem like a decent solution but maybe those have limitations I'm not aware of?

I think that is an acceptable solution. The only place we allow NaNs in the result is in points_to_cellids(), which is a new method, where we default to returning a dataframe. If someone really wants a recarray, they now get a masked recarray.

On a related note, for the "layer" field, I also use floats to allow input of +inf and -inf to indicate points lie above/below the model grid. Ideally this column would be int as well, but using a similar approach to above would mean we would lose information about points lying above/below the model. What are your thoughts there?

- fix legacy cellids to return int - use -1 as inernal invalid index number - set invalid indices as pd.NA when returning as dataframes - return masked recarray in points_to_cellids when indices contain NaNs - modify tests to use pd.isna() - update example notebook

wpbonelli · 2025-12-17T16:01:56Z

On that note, it's probably also a good idea to suppress dataframe warnings everywhere by passing dataframe=False everywhere gridintersect is being used.

Seems like a good compromise, pass False internally but leave the default None so directly using the APIs shows the warning.. A question is, when do we start returning dataframe over recarray. If this is a breaking change, maybe better to wait til 4.x?

On a related note, for the "layer" field, I also use floats to allow input of +inf and -inf to indicate points lie above/below the model grid. Ideally this column would be int as well, but using a similar approach to above would mean we would lose information about points lying above/below the model. What are your thoughts there?

Is this bit of information valuable for typical use cases? Or is it usually enough to know the point is outside the grid? I don't feel like I have the user perspective to have strong opinions either way

dbrakenhoff · 2025-12-17T16:22:03Z

Seems like a good compromise, pass False internally but leave the default None so directly using the APIs shows the warning.. A question is, when do we start returning dataframe over recarray. If this is a breaking change, maybe better to wait til 4.x?

Yeah, 4.x makes sense to me.

Is this bit of information valuable for typical use cases? Or is it usually enough to know the point is outside the grid? I don't feel like I have the user perspective to have strong opinions either way

I think you're right. Returning floats is just awkward. So I think I'll make it an int dtype and mask invalid values and leave the further investigations to the user.

- mask result when recarray, return NA when dataframe - update tests/nb

dbrakenhoff · 2025-12-17T17:23:08Z

Modflow 6 examples update is here: MODFLOW-ORG/modflow6-examples#318

wpbonelli

all looks good to me! I know @jlarsen-usgs was planning on having a look too though.

dbrakenhoff added 11 commits November 27, 2025 09:54

refactor(GridIntersect): deprecations

af7b92d

- deprecate method kwarg - remove keepzerolengths from intersect() - clean up deprecated methods

refactor(GridIntersect): consistency and cleanup

3dc27c4

- flip arguments in filter_query_result to be consistent with other class methods - remove get_gridshapes method() - move imports to top of file

feat(GridIntersect): fast point location method

79faf9f

- add points_to_cellids() for fast 1:1 mapping between points and grid cells, return nan when outside of grid

refactor(GridIntersect): improve naming and plotting

4107bff

- rename return_cellids to return_nodenumbers for clarity (node numbers always one number, cellid is somewhat ambiguous) - use shapely plotting for linestrings and points

remove deprecated class from util init

cbcf9c7

test(GridIntersect): update tests

bbc5ac3

- remove tests for deprecated methods - add array tests for intersects() and points_to_cellids() - activate+improve tests for 3D points

remove option handle_z="drop" from points_to_cellids()

6cddfc8

feat(GridIntersect): improve plotting and update example nb

407378f

- make staticmethods accept shapely geometries as input instead of rec-arrays - update notebook

ruff

3968ce3

dbrakenhoff self-assigned this Nov 27, 2025

dbrakenhoff added the enhancement label Nov 27, 2025

dbrakenhoff mentioned this pull request Nov 27, 2025

feature: support array with shapely geometries in GeospatialUtils #2649

Closed

This was linked to issues Nov 27, 2025

feature: faster and better point intersection in GridIntersect #2650

Open

bug: intersect method of gridintersect class not working #2603

Open

bug: GridIntersect improvements #2344

Open

dbrakenhoff mentioned this pull request Nov 27, 2025

feature: faster and better point intersection in GridIntersect #2650

Open

dbrakenhoff added 4 commits November 27, 2025 13:34

fix tests

ae92d24

ruff

2a8070e

test(GridIntersect): remove all references to deprecated method kwarg

3df8f4b

forgot one

a2bc699

refactor(GridIntersect): more cleanup

6953bfb

- replace 3 instances where new nodenumber to rowcol method should be used

wpbonelli requested a review from jlarsen-usgs December 1, 2025 14:34

dbrakenhoff added 6 commits December 17, 2025 13:36

Improve intersect output results:

d967150

- Add cellid to result and remove return_nodenumbers option. cellid now always includes cellid. - use cellid (not cellids) result internally for all operation. - Add row and col to structured result - maintain cellids column for backward compatibility

Switch output as dataframe flag to None

a20dd2d

- add warning about future default - for new method points_to_cellids, default to dataframe.

switch handle_z to boolean

78a48a1

improve gridintersect docstrings

2c751e5

- with some copilot help

improve gridintersect tests:

2efa446

- add df_toggle to silence warnings - switch handle_z to bool

Update grid intersect notebook

9b0f36b

Merge branch 'develop' into update_gridintersect

d3ad9a1

dbrakenhoff added 2 commits December 17, 2025 16:43

set dataframe kwarg in gridintersect to suppress warnings

a3db6cb

make layer column dtype int

7db052d

- mask result when recarray, return NA when dataframe - update tests/nb

wpbonelli approved these changes Dec 18, 2025

View reviewed changes

feat(GridIntersect): add array support, fast point locating and support for points with z-coordinates #2657

Are you sure you want to change the base?

feat(GridIntersect): add array support, fast point locating and support for points with z-coordinates #2657

Uh oh!

Conversation

dbrakenhoff commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Support for numpy arrays of shapely geometries

Fast point locating

Support for points with z-coordinates

Deprecations

Other stuff

Uh oh!

dbrakenhoff commented Nov 27, 2025

Uh oh!

aacovski commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dbrakenhoff commented Nov 28, 2025

Uh oh!

wpbonelli commented Dec 1, 2025

Uh oh!

wpbonelli commented Dec 1, 2025

Uh oh!

dbrakenhoff commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wpbonelli commented Dec 2, 2025

Uh oh!

dbrakenhoff commented Dec 2, 2025

Uh oh!

wpbonelli commented Dec 2, 2025

Uh oh!

dbrakenhoff commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wpbonelli commented Dec 17, 2025

Uh oh!

dbrakenhoff commented Dec 17, 2025

Uh oh!

wpbonelli commented Dec 17, 2025

Uh oh!

dbrakenhoff commented Dec 17, 2025

Uh oh!

dbrakenhoff commented Dec 17, 2025

Uh oh!

wpbonelli left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dbrakenhoff commented Nov 27, 2025 •

edited

Loading

aacovski commented Nov 27, 2025 •

edited

Loading

dbrakenhoff commented Dec 2, 2025 •

edited

Loading

dbrakenhoff commented Dec 17, 2025 •

edited

Loading