Privacy scrubbing for GUI automation data - PII/PHI detection and redaction.
pip install openadapt-privacy
For Presidio-based scrubbing (recommended):
pip install openadapt-privacy[presidio]
python -m spacy download en_core_web_trf

from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
text = "Contact John Smith at [email protected] or 555-123-4567"
scrubbed = scrubber.scrub_text(text)
Input:
Contact John Smith at [email protected] or 555-123-4567
Output:
Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>
| Input | Output |
|---|---|
| My email is [email protected] | My email is <EMAIL_ADDRESS> |
| SSN: 923-45-6789 | SSN: <US_SSN> |
| Card: 4532-1234-5678-9012 | Card: <CREDIT_CARD> |
| Call me at 555-123-4567 | Call me at <PHONE_NUMBER> |
| DOB: 01/15/1985 | DOB: <DATE_TIME> |
| Contact John Smith | Contact <PERSON> |
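To make the placeholder substitution in the table above concrete, here is a minimal sketch that uses plain regular expressions. It is a toy stand-in, not the library's implementation: the Presidio provider relies on NER models and its own recognizers, and the names `PATTERNS` and `toy_scrub_text` are hypothetical.

```python
import re

# Hypothetical patterns for illustration only. They cover far fewer cases
# than a real recognizer and cannot detect names such as "John Smith".
PATTERNS = [
    ("EMAIL_ADDRESS", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("US_SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE_NUMBER", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def toy_scrub_text(text: str) -> str:
    """Replace every pattern match with an <ENTITY_TYPE> placeholder."""
    for entity, pattern in PATTERNS:
        text = pattern.sub(f"<{entity}>", text)
    return text


sample = "Mail jane@example.com, call 555-123-4567, SSN 923-45-6789"
print(toy_scrub_text(sample))
```

The output keeps the surrounding text intact and swaps only the matched spans, which is the same shape of result as the Input/Output pairs listed above.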
Scrub PII from nested dictionaries (e.g., GUI element trees):
from openadapt_privacy import scrub_dict
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
action = {
    "text": "Email: [email protected]",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact [email protected]",
    },
    "coordinates": {"x": 100, "y": 200},
}
scrubbed = scrub_dict(action, scrubber)
Input:
{
    "text": "Email: [email protected]",
    "metadata": {
        "title": "User Profile - John Smith",
        "tooltip": "Click to contact [email protected]"
    },
    "coordinates": {"x": 100, "y": 200}
}
Output:
{
    "text": "Email: <EMAIL_ADDRESS>",
    "metadata": {
        "title": "User Profile - <PERSON>",
        "tooltip": "Click to contact <EMAIL_ADDRESS>"
    },
    "coordinates": {"x": 100, "y": 200}
}
Process complete GUI automation recordings:
from openadapt_privacy import DictRecordingLoader
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
loader = DictRecordingLoader()
recording = loader.load_from_dict({
    "task_description": "Send email to John Smith at [email protected]",
    "actions": [
        {"id": 1, "action_type": "click", "text": "Compose", "timestamp": 1000},
        {"id": 2, "action_type": "type", "text": "[email protected]", "timestamp": 2000},
        {"id": 3, "action_type": "click", "text": "Send", "window_title": "Email to [email protected]", "timestamp": 3000},
    ],
})
scrubbed = recording.scrub(scrubber)
Input Recording:
task_description: "Send email to John Smith at [email protected]"
actions:
[1] click: "Compose"
[2] type: "[email protected]"
[3] click: "Send" (window: "Email to [email protected]")
Output Recording:
task_description: "Send email to <PERSON> at <EMAIL_ADDRESS>"
actions:
[1] click: "Compose"
[2] type: "<EMAIL_ADDRESS>"
[3] click: "Send" (window: "Email to <EMAIL_ADDRESS>")
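The nested-dictionary and recording examples above both come down to walking a structure and scrubbing only selected string fields. The sketch below shows one way such a traversal could look. It is not the library's `scrub_dict`: the key set `TEXT_KEYS`, the function `walk`, and the hard-coded `fake_scrub` are assumptions made for illustration.

```python
from typing import Any, Callable, Optional

# Assumed set of keys whose string values get scrubbed (illustrative only).
TEXT_KEYS = {"text", "title", "tooltip", "window_title", "task_description"}


def walk(node: Any, scrub: Callable[[str], str], key: Optional[str] = None) -> Any:
    """Return a scrubbed copy of a nested dict/list structure."""
    if isinstance(node, dict):
        return {k: walk(v, scrub, k) for k, v in node.items()}
    if isinstance(node, list):
        # List items inherit the key of the list that contains them.
        return [walk(item, scrub, key) for item in node]
    if isinstance(node, str) and key in TEXT_KEYS:
        return scrub(node)
    return node  # numbers, coordinates, and other keys pass through


def fake_scrub(text: str) -> str:
    """Stand-in scrubber that only knows one hard-coded name."""
    return text.replace("John Smith", "<PERSON>")


recording = {
    "task_description": "Send email to John Smith",
    "actions": [
        {"id": 1, "action_type": "click", "text": "Compose", "timestamp": 1000},
        {"id": 2, "action_type": "click", "text": "Send",
         "window_title": "Email to John Smith", "timestamp": 2000},
    ],
}
scrubbed = walk(recording, fake_scrub)
print(scrubbed["task_description"])
print(scrubbed["actions"][1]["window_title"])
```

Building a new structure instead of editing in place leaves the original recording untouched, which is convenient when the raw and scrubbed versions are stored separately.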
Redact PII from screenshots using OCR + NER:
from PIL import Image
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider
scrubber = PresidioScrubbingProvider()
image = Image.open("screenshot.png")
scrubbed_image = scrubber.scrub_image(image)
scrubbed_image.save("screenshot_scrubbed.png")
The image redactor:
- Runs OCR to detect text regions
- Analyzes text for PII entities (email, phone, SSN, etc.)
- Fills detected PII regions with solid color (configurable, default: red)
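The last step in the list above, filling detected regions with a solid color, can be sketched without any imaging library. In this toy version the image is a nested list of RGB tuples and the bounding boxes are supplied by hand. In the real pipeline they would come from OCR plus entity analysis, and the `fill_boxes` helper and its `(left, top, width, height)` box layout are assumptions, not the library's API.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B)
Box = Tuple[int, int, int, int]     # (left, top, width, height), assumed layout


def fill_boxes(
    pixels: List[List[Pixel]],
    boxes: List[Box],
    color: Pixel = (255, 0, 0),
) -> List[List[Pixel]]:
    """Return a copy of the pixel grid with each box filled with one color."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    out = [row[:] for row in pixels]  # copy rows so the input stays unchanged
    for left, top, box_w, box_h in boxes:
        # Clip each box to the image bounds before filling.
        for y in range(max(top, 0), min(top + box_h, height)):
            for x in range(max(left, 0), min(left + box_w, width)):
                out[y][x] = color
    return out


white = (255, 255, 255)
image = [[white] * 6 for _ in range(4)]   # 6x4 all-white "screenshot"
pii_boxes = [(1, 1, 3, 2)]                # hypothetical detected PII region
redacted = fill_boxes(image, pii_boxes)
for row in redacted:
    print("".join("#" if px != white else "." for px in row))
```

Overwriting the pixels, rather than blurring them, means the original text cannot be recovered from the redacted image.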
Implement your own loader for custom storage formats:
from openadapt_privacy import RecordingLoader, Recording
from openadapt_privacy.providers.presidio import PresidioScrubbingProvider

class SQLiteRecordingLoader(RecordingLoader):
    def __init__(self, db_path: str):
        self.db_path = db_path

    def load(self, recording_id: str) -> Recording:
        # Load from SQLite database
        ...

    def save(self, recording: Recording, recording_id: str) -> None:
        # Save to SQLite database
        ...

# Usage
loader = SQLiteRecordingLoader("recordings.db")
scrubber = PresidioScrubbingProvider()
# Load, scrub, and save
scrubbed = loader.load_and_scrub("recording_001", scrubber)
loader.save(scrubbed, "recording_001_scrubbed")

from openadapt_privacy.config import PrivacyConfig
custom_config = PrivacyConfig(
    SCRUB_CHAR="X", # Character for scrub_text_all
    SCRUB_FILL_COLOR=0xFF0000, # Red for image redaction (BGR)
    SCRUB_KEYS_HTML=[ # Keys to scrub in dicts
        "text", "value", "title", "tooltip", "custom_field"
    ],
    SCRUB_PRESIDIO_IGNORE_ENTITIES=[ # Entity types to skip
        "DATE_TIME",
    ],
)

| Entity | Example Input | Example Output |
|---|---|---|
| PERSON | John Smith | <PERSON> |
| EMAIL_ADDRESS | [email protected] | <EMAIL_ADDRESS> |
| PHONE_NUMBER | 555-123-4567 | <PHONE_NUMBER> |
| US_SSN | 923-45-6789 | <US_SSN> |
| CREDIT_CARD | 4532-1234-5678-9012 | <CREDIT_CARD> |
| US_BANK_NUMBER | 635526789012 | <US_BANK_NUMBER> |
| US_DRIVER_LICENSE | A123-456-789-012 | <US_DRIVER_LICENSE> |
| DATE_TIME | 01/15/1985 | <DATE_TIME> |
| LOCATION | Toronto, ON | <LOCATION> |
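The configuration example lists `DATE_TIME` under `SCRUB_PRESIDIO_IGNORE_ENTITIES`, so entities of that type are skipped. The sketch below illustrates that idea on pre-computed spans. `ToyConfig` and `apply_spans` are hypothetical names, not the library's `PrivacyConfig` or its internals, and the spans are written by hand instead of coming from an analyzer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end), assumed layout


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a config with an ignore list."""
    ignore_entities: List[str] = field(default_factory=lambda: ["DATE_TIME"])


def apply_spans(text: str, spans: List[Span], config: ToyConfig) -> str:
    """Replace detected spans with <ENTITY> tags, skipping ignored types."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        if entity in config.ignore_entities:
            continue
        text = text[:start] + f"<{entity}>" + text[end:]
    return text


text = "John Smith, DOB: 01/15/1985"
spans = [("PERSON", 0, 10), ("DATE_TIME", 17, 27)]  # hand-written detections

print(apply_spans(text, spans, ToyConfig()))                    # date kept
print(apply_spans(text, spans, ToyConfig(ignore_entities=[])))  # date scrubbed
```

Skipping an entity type is useful when a field is needed downstream, for example timestamps in a recording, at the cost of leaving that information in the scrubbed output.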
openadapt_privacy/
├── base.py # ScrubbingProvider, TextScrubbingMixin
├── config.py # PrivacyConfig dataclass
├── loaders.py # Recording, Action, Screenshot, RecordingLoader
├── providers/
│ ├── __init__.py # ScrubProvider registry
│ └── presidio.py # PresidioScrubbingProvider
└── pipelines/
    └── dicts.py # scrub_dict, scrub_list_dicts
License: MIT

