Add GLiNER PII redactor to codebase #1210

@sarahyurick

In #1208, we added a basic tutorial showing how to wrap a ProcessingStage around a GLiNER PII model. We should consider making it a model that Curator fully supports. That would mean implementing it as a CompositeStage made up of a CPU tokenization stage, a GPU model inference stage, a postprocessing stage, and so on.
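To make the proposed shape concrete, here is a minimal, library-free sketch of a three-stage composite (tokenize, infer, postprocess). It does not use Curator's actual ProcessingStage or CompositeStage classes: `Stage`, `Composite`, `tokenize`, `fake_inference`, and `redact` are hypothetical names, and the "model" is a toy rule standing in for GLiNER inference.

```python
import re
from dataclasses import dataclass
from typing import Callable

# NOTE: every name here is hypothetical and only illustrates the stage layout.
# Nothing below is Curator's or GLiNER's real API.

Batch = list[dict]


@dataclass
class Stage:
    name: str
    resource: str  # "cpu" or "gpu"; informational only in this sketch
    fn: Callable[[Batch], Batch]

    def process(self, batch: Batch) -> Batch:
        return self.fn(batch)


@dataclass
class Composite:
    stages: list[Stage]

    def process(self, batch: Batch) -> Batch:
        # Run the sub-stages in order, feeding each output to the next stage.
        for stage in self.stages:
            batch = stage.process(batch)
        return batch


def tokenize(batch: Batch) -> Batch:
    # Whitespace tokenization with character offsets (stand-in for a real tokenizer).
    return [
        {**row, "token_spans": [(m.start(), m.end()) for m in re.finditer(r"\S+", row["text"])]}
        for row in batch
    ]


def fake_inference(batch: Batch) -> Batch:
    # Toy stand-in for the GPU model: label any token containing "@" as an email.
    out = []
    for row in batch:
        entities = [
            {"start": s, "end": e, "label": "email"}
            for s, e in row["token_spans"]
            if "@" in row["text"][s:e]
        ]
        out.append({**row, "entities": entities})
    return out


def redact(batch: Batch) -> Batch:
    # Replace entities right to left so earlier character offsets stay valid.
    out = []
    for row in batch:
        text = row["text"]
        for ent in sorted(row["entities"], key=lambda e: e["start"], reverse=True):
            text = text[: ent["start"]] + f"[{ent['label'].upper()}]" + text[ent["end"]:]
        out.append({"text": text})
    return out


pipeline = Composite(
    [
        Stage("tokenize", "cpu", tokenize),
        Stage("inference", "gpu", fake_inference),
        Stage("postprocess", "cpu", redact),
    ]
)

result = pipeline.process([{"text": "Contact jane@example.com today"}])
print(result)
```

In this sketch the intermediate rows carry `token_spans` and `entities` between stages, and only the postprocessing stage drops them, which is where the size of the hand-off between stages becomes a design concern.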

For more context: we did not do this in the initial tutorial because GLiNER tokenization computes more fields than just the token IDs and attention masks (the extra fields relate to span indices and similar entity information). This can be a problem because the data passed from the tokenizer stage to the model stage can become quite large. We need more exploration and planning to determine the best way to fully support GLiNER PII redaction.
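As a rough back-of-the-envelope illustration of the data-flow concern, the sketch below compares element counts for a plain tokenizer output against one that also carries span fields. The layout is a simplifying assumption, not GLiNER's confirmed format: one `(start, end)` pair plus one mask entry per candidate span, with candidate spans enumerated at every position up to a maximum width. The function name and the `max_width=12` default are illustrative only.

```python
def tokenizer_output_elements(seq_len: int, max_width: int = 12) -> dict:
    """Estimate per-document element counts passed from tokenizer to model.

    Assumed layout (hypothetical, for illustration):
      - input_ids and attention_mask: seq_len elements each
      - span indices: (seq_len * max_width) candidate spans, 2 ints each
      - span mask: one entry per candidate span
    """
    base = 2 * seq_len
    num_spans = seq_len * max_width
    span_fields = num_spans * 2 + num_spans
    return {
        "base": base,
        "span_fields": span_fields,
        "total": base + span_fields,
        "ratio": span_fields / base,
    }


est = tokenizer_output_elements(seq_len=512, max_width=12)
print(est)
```

Under these assumptions the span fields outweigh the token IDs and attention masks by a factor of `1.5 * max_width`, independent of sequence length, which is one way to see why the tokenizer-to-model hand-off needs planning.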

Labels

enhancement (New feature or request)