Skip to content

#328 - Remove annotations in a specified range#329

Merged
reckart merged 14 commits intodkpro:mainfrom
dmnc-grdnr:feature/328-remove-annotations
Mar 17, 2026
Merged

#328 - Remove annotations in a specified range#329
reckart merged 14 commits intodkpro:mainfrom
dmnc-grdnr:feature/328-remove-annotations

Conversation

@dmnc-grdnr
Copy link
Copy Markdown
Contributor

@dmnc-grdnr dmnc-grdnr commented Aug 10, 2025

This PR adds the following convenience functions to the cas object:

  • crop_sofa_string -> replaces the sofa string with a cutout of the original as specified by the index range and removes all annotations in the cas that are outside of that specified range
  • remove_annotations_in_range -> removes all annoations (of one specified type) inside of the specified index range of the sofa string

The methods can be called on any cas object directly.

Tests for each function was added to test_cas.py

- added functions for removing annoations from cas
- cut_sofa_string_to_range
- remove_annotations_in_range
- added tests
@reckart
Copy link
Copy Markdown
Member

reckart commented Aug 10, 2025

Thanks for the PR.

"Normally" (i.e. in particular in Java UIMA), the sofa is considered to be immutable once set.

Have you considered generating a new CAS that contains only the cut-out section instead of doing an in-place modification?

@dmnc-grdnr
Copy link
Copy Markdown
Contributor Author

I didn't expect such a quick response, thanks :)
Maybe I have overlooked something, but how can you only "partially" read a CAS file or only the section of interest?
Because I don't generate the CAS files I use myself, I am looking for a way to cutout a specific part of it without having to manually modify the original CAS file.

@reckart reckart added this to the 0.11.0 milestone Nov 10, 2025
@reckart reckart requested a review from Copilot November 10, 2025 19:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds two convenience methods to the Cas class for managing annotations based on their position in the sofa string: cut_sofa_string_to_range and remove_in_range.

Key changes:

  • cut_sofa_string_to_range: Cuts the sofa string to a specified range and removes/adjusts annotations accordingly
  • remove_in_range: Removes annotations within a specified range, optionally filtered by type
  • Comprehensive test coverage for both methods with different scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
cassis/cas.py Adds two new methods: cut_sofa_string_to_range for cutting sofa string with annotation adjustment, and remove_in_range for removing annotations in a range
tests/test_cas.py Adds four test functions covering different scenarios for both new methods

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dmnc-grdnr
Copy link
Copy Markdown
Contributor Author

I agree with most suggestions, with the exception of the issue iteration/remove (384-398). But I am, again, unsure how to proceed from here on :)

@reckart
Copy link
Copy Markdown
Member

reckart commented Nov 28, 2025

Thanks again :) Same as with the other PR:

  • make changes
  • make regular commit (no amend, no force push)

This change is a tricky one because you can have annotations that are transitively referenced through a feature of an indexed feature structure, not that themselves are not on the index. That means if you use only select_all(), you would not find them. You would have to use _find_all_fs like in the other PR to actually find those. And then the question is: if you have an annotation that is transitively referenced that would end up being outside of the document, should it be removed entirely and unlinked from its referent - or should it remain but its position be collapsed to the start or end of the sofa? Or possibly something else entirely?

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@reckart
Copy link
Copy Markdown
Member

reckart commented Jan 27, 2026

@zesch Do you have an opinion on this?

if you have an annotation that is transitively referenced that would end up being outside of the document, should it be removed entirely and unlinked from its referent - or should it remain but its position be collapsed to the start or end of the sofa? Or possibly something else entirely?

@zesch
Copy link
Copy Markdown
Member

zesch commented Jan 27, 2026

@zesch Do you have an opinion on this?

if you have an annotation that is transitively referenced that would end up being outside of the document, should it be removed entirely and unlinked from its referent - or should it remain but its position be collapsed to the start or end of the sofa? Or possibly something else entirely?

I would vote for removing it entirely.

reckart added 8 commits March 17, 2026 15:23
- Defensive copy of list before removal
- Remove debug printf
- Improve test assertion
- Rename parameters for consistency
- Improve test assertion
- Rename parameters for consistency
- Use overlap predicate in test
- Fixed docstring
- Additional testing
- Added warning to the cut_sofa_string_to_range function.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

reckart added 2 commits March 17, 2026 19:16
- Generate proper error when there is no sofa string
- Fix signature
- Remove duplicate variable declaration
- Expanded test and fixed bug
- Another guard against unset sofa string
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

- Improve tests
- Improve handling of non-annotation types
@reckart reckart merged commit 1e5b9af into dkpro:main Mar 17, 2026
@reckart
Copy link
Copy Markdown
Member

reckart commented Mar 17, 2026

@dmnc-grdnr ok, I have taken some executive decisions to wrap this up now - otherwise we'll ping pong this forever. Probably I should generally be more proactive of taking over and finishing up PRs.

Thanks for your contribution!

@dmnc-grdnr dmnc-grdnr deleted the feature/328-remove-annotations branch April 3, 2026 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants