Description
In #1208, we added a basic tutorial showing how to wrap a ProcessingStage around a GLiNER PII model. We should consider making it a fully supported model in Curator. This would mean implementing it as a CompositeStage made up of a CPU tokenization stage, a GPU model inference stage, a postprocessing stage, etc.
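To make the proposed decomposition concrete, here is a minimal pure-Python sketch of the three-step flow. Everything in it is a stand-in: the `Stage` and `CompositeStage` classes below are hypothetical and are not Curator's actual `ProcessingStage` / `CompositeStage` API, the tokenizer is a whitespace splitter, and the "model" is a toy rule that flags tokens containing `@`. It only illustrates how the stages would hand data to each other.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Batch = Dict[str, list]


@dataclass
class Stage:
    """Hypothetical stage: a named function plus the resource it would run on."""
    name: str
    fn: Callable[[Batch], Batch]
    resource: str = "cpu"


@dataclass
class CompositeStage:
    """Hypothetical composite: runs its sub-stages in order."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, batch: Batch) -> Batch:
        for stage in self.stages:
            batch = stage.fn(batch)
        return batch


def tokenize(batch: Batch) -> Batch:
    # Stand-in for GLiNER tokenization: whitespace tokens with character offsets.
    spans = [[(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
             for text in batch["text"]]
    return {**batch, "token_spans": spans}


def infer(batch: Batch) -> Batch:
    # Stand-in for GPU inference: label any token containing "@" as an email.
    entities = []
    for text, spans in zip(batch["text"], batch["token_spans"]):
        entities.append([(s, e, "email") for s, e in spans if "@" in text[s:e]])
    return {**batch, "entities": entities}


def redact(batch: Batch) -> Batch:
    # Postprocessing: replace entity spans right to left so offsets stay valid.
    redacted = []
    for text, ents in zip(batch["text"], batch["entities"]):
        for start, end, label in sorted(ents, reverse=True):
            text = text[:start] + f"[{label.upper()}]" + text[end:]
        redacted.append(text)
    return {"text": redacted}


pipeline = CompositeStage([
    Stage("tokenize", tokenize, resource="cpu"),
    Stage("infer", infer, resource="gpu"),
    Stage("redact", redact, resource="cpu"),
])

result = pipeline.run({"text": ["mail bob@example.com now"]})
print(result["text"][0])
```

Note that the intermediate batch carries every field the tokenizer produced (`token_spans` here) across the CPU/GPU boundary, which is where the real design work would be.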
For more context: we did not do this in the initial tutorial because GLiNER tokenization computes additional fields beyond the token IDs and attention masks (the extra fields relate to span indices, etc., for entities). This can be problematic because the amount of data passed between the tokenizer stage and the model stage can be quite large. We will need more exploration and planning to determine the best way to fully support GLiNER PII redaction.
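A rough back-of-the-envelope sketch of why the extra fields matter, under stated assumptions: it assumes a span-based layout in which every word position enumerates candidate spans up to some maximum width, with one (start, end) pair and one mask entry per candidate span, plus a per-token word mask. The field names in the comments and the numbers in the example are illustrative assumptions, not measurements from GLiNER or Curator.

```python
def payload_elements(num_tokens: int, num_words: int, max_width: int) -> tuple[int, int]:
    """Count integer elements per sequence passed from tokenizer to model.

    Returns (base, extra), where base covers token IDs and attention mask,
    and extra covers the assumed span-related fields.
    """
    base = 2 * num_tokens                 # input_ids + attention_mask
    num_spans = num_words * max_width     # candidate spans (assumed layout)
    extra = (
        num_tokens                        # words_mask (assumed field)
        + 2 * num_spans                   # span_idx: (start, end) per span
        + num_spans                       # span_mask: validity flag per span
    )
    return base, extra


# Illustrative numbers only: 512 subword tokens, 384 words, max span width 12.
base, extra = payload_elements(num_tokens=512, num_words=384, max_width=12)
print(base, extra, extra / base)
```

Under these assumptions the span-related fields are an order of magnitude larger than the token IDs and attention mask combined, which is the kind of overhead that would need to be measured on real data before choosing a stage boundary.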