Skip to content

Conversation

@dbrakenhoff
Copy link
Contributor

@dbrakenhoff dbrakenhoff commented Nov 27, 2025

Major update of GridIntersect module.

@Reviewer(s), I have a few things I wonder about, especially the naming of things:

  • A new method called points_to_cellids() for locating points in the grid. Is this name clear? And is cellid the correct name for (row, column) or (cellid,) output?
  • I use "cellids" as a field name in the result but when return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids.
  • Anyway, a critical look at the naming of stuff in the module would be greatly appreciated!

Support for numpy arrays of shapely geometries

Introduces support for arrays of shapely geometries in GridIntersect.intersects() method.

gi = GridIntersect(grid)
shapes = shapely.polygons(...)
gi.intersects(shapes)

Returns rec-array with a shp_id field containing the index of the original shape in shapes and the cellids of the candidate cells the polygons intersect with, for example:

rec.array([(0, (np.int64(1), np.int64(0))),
           (0, (np.int64(1), np.int64(1))),
           (1, (np.int64(0), np.int64(0))),
           (1, (np.int64(0), np.int64(1)))],
          dtype=[('shp_ids', '<i8'), ('cellids', 'O')])

So in that example output polygon 0 intersects with 2 grid cells: (1, 0) and (1, 1), and polygon 1 intersects with 2 grid cells: (0, 0), and (0, 1).

Fast point locating

GridIntersect was missing a method to obtain one cellid for each input point. The implementation of GridIntersect.intersects() drops all points outside the model grid, and potentially returns all intersection results for each point, meaning there is no 1:1 output between input points and output cellids. The GridIntersect.intersect() method allows a MultiPoint as input, but is inefficient, and groups points in the same cell as MultiPoints, also losing the 1:1 relation.

So a new method is introduced GridIntersect.points_to_cellids(). This method guarantees len(input) == len(output). Cellids are returned for each point and NaN is returned if the point is outside the model grid. For points on boundaries the lowest grid cellid is returned.

Support for points with z-coordinates

Adds support for points with z-coordinates in GridIntersect.intersect() and GridIntersect.points_to_cellids() with the handle_z-kwarg.

  • The default is "ignore" which ignores the z-coordinates, similar to what GridIntersect did before.
  • The second option is "drop" which removes the intersection results if the point lies above or below the model grid. This option is only included in intersect().
  • The final option is "return" which returns the computed layer position for each point. This returns NaN if the point is outside the model domain, +inf if the point is above the model and -inf the point is below the model.

Deprecations

Deprecates everything related to method="structured"

Other stuff

  • Cleaned up the module somewhat with some renaming since the structured methods dropped out
  • Used shapely plotting routines for linestrings and points to simplify plotting calls, make static plot methods accept shapely geometries instead of rec-array results so they can also be used to plot input geometries.
  • Update tests to test new features and clean up old tests
  • Update example notebook.

- deprecate method kwarg
- remove keepzerolengths from intersect()
- clean up deprecated methods
…ts()

- allow arrays of numpy arrays in intersects(), returns all candidate cells for intersection per geometry
- add predicate to query_grid()
- simplify names ofintersection methods, deprecate old internal methods
- move logic to translate node number to rowcol to separate method
- flip arguments in filter_query_result to be consistent with other class methods
- remove get_gridshapes method()
- move imports to top of file
- add points_to_cellids() for fast 1:1 mapping between points and grid cells, return nan when outside of grid
- rename return_cellids to return_nodenumbers for clarity (node numbers always one number, cellid is somewhat ambiguous)
- use shapely plotting for linestrings and points
- add handle_z options: "ignore", "drop" and "return" to intersect() and points_to_cellids()
- add method to compute layer position for z coordinates
- improve _intersect_point() method by using intersects as starting point.
- remove tests for deprecated methods
- add array tests for intersects() and points_to_cellids()
- activate+improve tests for 3D points
- make staticmethods accept shapely geometries as input instead of rec-arrays
- update notebook
@dbrakenhoff
Copy link
Contributor Author

Tests failing because MF6 examples still use method kwarg. Getting tests to pass, will require a change in modflow6-examples repo:

MODFLOW-ORG/modflow6-examples#315

- replace 3 instances where new nodenumber to rowcol method should be used
@aacovski
Copy link
Contributor

aacovski commented Nov 27, 2025

@dbrakenhoff First off, apologies for overlapping with your work...

searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR.

The key difference from shapely: shapely handles non-convex cell geometries, whereas our approach assumes convex hulls. This assumption is valid for typical MODFLOW grids (rectangles, Voronoi cells, triangles) and enables significant simplifications and speedups. More importantly, the convex hull approach generalizes cleanly to higher-dimensional shapes - the same hyperplane equations can work for lines, polygons, and polyhedrons without needing shapely at all. I can work with you in the future once I'm done with the point to cell PR for higher-dimensional shapes.

@dbrakenhoff
Copy link
Contributor Author

@dbrakenhoff First off, apologies for overlapping with your work...

No worries, there were already 2 places in flopy with grid intersection operations, so our simultaneous work hasn't changed that, but they have become slightly more similar now that they both support arrays of points.

searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR.

I realize there is some overlap with your work, especially between the new points_to_cellids() method and the array-based grid.intersect() methods you built for grids. The inefficient method for intersecting points in GridIntersect has bugged me for quite some time, plus as you can see in the list of issues and the TODOs that were fixed in the code itself, there was a bunch of work we still wanted to do on the module, which is all included in this PR. Also I felt it was weird for GridIntersect to have good support for linestrings and polygons but not have a decent implementation for points. Yes I could have just used self.mfgrid.intersect() but since the whole GridIntersect module was written around shapely, I wanted to stay with shapely for now, and see what I could achieve with those tools. Shapely's approach might have some limitations, but it seems very fast, and very flexible for the most common intersection operations.

So the questions now are, do we continue to let the two live side-by-side? Or do we integrate them and implement grid.intersect() in GridIntersect for point intersection calls (I would be curious to see how the performance compares?) ? But this might also be more of a question for flopy4, where a significant redesign of flopy will take place. And in the meantime we can enjoy having two intersection methods :).

@wpbonelli
Copy link
Member

do we continue to let the two live side-by-side?

I think this is fine. 4.x is a natural time for consolidation. at some point we'll need to make the decision which to delegate to for xarray spatial queries e.g. stuff like model.grid.head.sel(x=250, y=650, z=50, method='nearest') right? but unless you guys want to try the integration sooner I agree with @dbrakenhoff that

we can enjoy having two intersection methods :)

Other comments

A new method called points_to_cellids() for locating points in the grid. Is this name clear?

Clear to me, I like it

And is cellid the correct name for (row, column) or (cellid,) output?

Yes, but re:

"cellids" as a field name in the result but when return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids

while I like the cellid/nn toggle, I wonder if it should return the elements of cellid as separate columns. While people may be used to seeing cell ID tuples as a single recarray column we have had a lot of internal discussion about how awkward it is and are thinking of always flattening them into separate columns in 4.x. as this parameter is new I'm tempted to say we are free to shed old baggage here

Relatedly, maybe singular instead of plural naming for the columns i.e. shp_ids -> shp_id etc?

Adds support for points with z-coordinates in GridIntersect.intersect() and GridIntersect.points_to_cellids() with the handle_z-kwarg. The default is "ignore" which ignores the z-coordinates, similar to what GridIntersect did before.

Just wondering if there is a motivation for "ignore" besides backwards-compatibility? I guess that is sufficient cause but it seems nicer if this could be a bool switch instead of a string. maybe something for 4.x

@wpbonelli
Copy link
Member

maybe also consider returning pd.DataFrame instead of np.recarray as pandas is now a core dependency?

@dbrakenhoff
Copy link
Contributor Author

dbrakenhoff commented Dec 2, 2025

Thanks for the comments @wpbonelli! I'll prepare some fixes based on your comments:

  • return structured result with additional "row" and "col" columns, vertex result with "cellid" column. I will continue to return the original result in "cellids" so old code doesn't break...
  • return pd.DataFrame instead of recarray. This is something of a breaking change since the return type will change, but since the syntax for accessing the columns is still sort of the same, i think it is okay to default to pandas DataFrames? So basically we flip the dataframe. Perhaps I can set the kwarg to None and give a DeprecationWarning and return a DataFrame, allow silencing through dataframe=True and old behavior through dataframe=False.
  • I can switch handle_z to be boolean and effectively only support "ignore" and "return". I added "drop" to automatically drop the NaNs for points above/below the model grid, but we can leave that to the user. Which should be easy when the return object is a pandas.DataFrame.
  • I will convert "shp_id" to singular naming, but the others were already plural, so changing them will break peoples scripts... So I guess those others: "cellids"--> "cellid", "ixshapes" --> "geometry" are something for 4.x?

And yea, it makes sense to defer the integration to 4.x. I think @aacovski's work more naturally fits with the xarray implementation, whereas GridIntersect() might continue as a utility with a vector based approach for linestrings and polygons. For point intersections the two are very similar, so it would make sense to maybe defer those operations to the new modelgrid object. If there are performance gains when computing grid cell intersection candidates using the new modelgrid object, we can use the grid to filter the candidates, and then run the intersection computations. But we'll figure that out when we get there :).

As for this code: model.grid.head.sel(x=250, y=650, z=50, method='nearest') is great for getting the nearest cell center, but doesn't guarantee you're selecting the cell the point (x,y) is in, e.g. in quadtree grids at the refinement boundaries, so that might be something to think about.

@wpbonelli
Copy link
Member

return pd.DataFrame instead of recarray. This is something of a breaking change since the return type will change, but since the syntax for accessing the columns is still sort of the same, i think it is okay to default to pandas DataFrames? So basically we flip the dataframe. Perhaps I can set the kwarg to None and give a DeprecationWarning and return a DataFrame, allow silencing through dataframe=True and old behavior through dataframe=False.

Yeah since dataframe syntax is basically a superset of recarray syntax I think this is fine. Sounds good to me though I will admit I am not a frequent user of these APIs so am not personally much affected.

I can switch handle_z to be boolean and effectively only support "ignore" and "return". I added "drop" to automatically drop the NaNs for points above/below the model grid, but we can leave that to the user. Which should be easy when the return object is a pandas.DataFrame.

If you think it will be a common case to want to drop NaNs I can see the case for the string, no strong feelings on this. But still a bool feels cleaner and it is easy for the caller to drop them.

I will convert "shp_id" to singular naming, but the others were already plural, so changing them will break peoples scripts... So I guess those others: "cellids"--> "cellid", "ixshapes" --> "geometry" are something for 4.x?

Ah, ok- yeah, then that's for 4.x.

As for this code: model.grid.head.sel(x=250, y=650, z=50, method='nearest') is great for getting the nearest cell center, but doesn't guarantee you're selecting the cell the point (x,y) is in, e.g. in quadtree grids at the refinement boundaries, so that might be something to think about.

Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard?

@dbrakenhoff
Copy link
Contributor Author

Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard?

I meant this situation. Not necessarily a problem but getting the nearest value and getting the exact cellid your point lies in can be different.

image

@wpbonelli
Copy link
Member

Ahh. thanks. nearest cell center. I was confusing that with some notion of "nearest cell" which I recognize now is ill-defined absent a projection scheme I guess..

- Add cellid to result and remove return_nodenumbers option. cellid now always includes cellid.
- use cellid (not cellids) result internally for all operation.
- Add row and col to structured result
- maintain cellids column for backward compatibility
- add warning about future default
- for new method points_to_cellids, default to dataframe.
- with some copilot help
- add df_toggle to silence warnings
- switch handle_z to bool
@dbrakenhoff
Copy link
Contributor Author

dbrakenhoff commented Dec 17, 2025

I think this is ready for review? Some brief comments on final changes:

  • Results now contain cellid column for vertex grids, and cellid, row and col for structured grids. This means the earlier return_nodenumbers option became unnecessary, since the cellid column always contains the cellid/node number.
  • The legacy cellids column is kept for backwards compatibility and contains (row, col) tuples for structured grids
  • handle_z is now a boolean
  • I chose to set the dataframe return options to None, and raise a warning that defaults will change. For now it maintains the old behavior, to ensure no code is broken. Setting the dataframe keyword to True/False silences the warning. This may cause people to see quite a few warnings with the next version of flopy if they're using GridIntersect. Let me know if this is undesirable and maybe we can think of slightly less noisy way of doing this?
  • Columns with indices are set to dtype int where possible. For points_to_cellids the dtype remains float to allow for NaNs (point outside grid) in the result.

@wpbonelli
Copy link
Member

Columns with indices are set to dtype int where possible. For points_to_cellids the dtype remains float to allow for NaNs (point outside grid) in the result.

I don't know what the conventional way to handle this is, but I assume we aren't the first to run into the problem. I guess the pure-numpy alternatives are sentinal values and masked arrays. Sentinels are awkward and feel wrong. Masked arrays seem like a decent solution but maybe those have limitations I'm not aware of?

Once we start returning pandas by default, pd.NA (which works for integer arrays) becomes an option?

It looks like the elements of the legacy cellids column are now floats too. This is why the model splitter test is failing.

@dbrakenhoff
Copy link
Contributor Author

It looks like the elements of the legacy cellids column are now floats too. This is why the model splitter test is failing.

Good catch, fix incoming.

On that note, it's probably also a good idea to suppress dataframe warnings everywhere by passing dataframe=False everywhere gridintersect is being used.

Masked arrays seem like a decent solution but maybe those have limitations I'm not aware of?

I think that is an acceptable solution. The only place we allow NaNs in the result is in points_to_cellids(), which is a new method, where we default to returning a dataframe. If someone really wants a recarray, they now get a masked recarray.

On a related note, for the "layer" field, I also use floats to allow input of +inf and -inf to indicate points lie above/below the model grid. Ideally this column would be int as well, but using a similar approach to above would mean we would lose information about points lying above/below the model. What are your thoughts there?

- fix legacy cellids to return int
- use -1 as inernal invalid index number
- set invalid indices as pd.NA when returning as dataframes
- return masked recarray in points_to_cellids when indices contain NaNs
- modify tests to use pd.isna()
- update example notebook
@wpbonelli
Copy link
Member

On that note, it's probably also a good idea to suppress dataframe warnings everywhere by passing dataframe=False everywhere gridintersect is being used.

Seems like a good compromise, pass False internally but leave the default None so directly using the APIs shows the warning.. A question is, when do we start returning dataframe over recarray. If this is a breaking change, maybe better to wait til 4.x?

On a related note, for the "layer" field, I also use floats to allow input of +inf and -inf to indicate points lie above/below the model grid. Ideally this column would be int as well, but using a similar approach to above would mean we would lose information about points lying above/below the model. What are your thoughts there?

Is this bit of information valuable for typical use cases? Or is it usually enough to know the point is outside the grid? I don't feel like I have the user perspective to have strong opinions either way

@dbrakenhoff
Copy link
Contributor Author

Seems like a good compromise, pass False internally but leave the default None so directly using the APIs shows the warning.. A question is, when do we start returning dataframe over recarray. If this is a breaking change, maybe better to wait til 4.x?

Yeah, 4.x makes sense to me.

Is this bit of information valuable for typical use cases? Or is it usually enough to know the point is outside the grid? I don't feel like I have the user perspective to have strong opinions either way

I think you're right. Returning floats is just awkward. So I think I'll make it an int dtype and mask invalid values and leave the further investigations to the user.

- mask result when recarray, return NA when dataframe
- update tests/nb
@dbrakenhoff
Copy link
Contributor Author

Modflow 6 examples update is here: MODFLOW-ORG/modflow6-examples#318

Copy link
Member

@wpbonelli wpbonelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all looks good to me! I know @jlarsen-usgs was planning on having a look too though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

4 participants