feat: duplicate message spam detection by rezhajulio · Pull Request #4 · rezhajulio/PythonID-bot

rezhajulio · 2026-02-24T11:38:27Z

Summary

Detects and restricts users who repeatedly paste the same message in a group chat (e.g., job spam posted 3 times in 2 minutes).

How it works

Tracks recent messages per (group_id, user_id) in an in-memory rolling deque
Normalizes text (lowercase, strip punctuation/emoji, collapse whitespace)
Uses difflib.SequenceMatcher similarity matching (configurable, default 0.95)
On >= 3 similar messages within 120s -> auto-delete + restrict user
Sends notification to warning topic in Indonesian

DM bypass protection

Restrictions from this handler (and existing inline keyboard spam / new user probation handlers) do NOT create a UserWarning record, so the DM unrestriction flow cannot bypass them.
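
To make the reasoning concrete, here is a toy model of why skipping the warning record blocks the DM route. It assumes the DM unrestriction flow only lifts restrictions that have a matching UserWarning record, which is what the paragraph above implies. The `ModerationState` class and its methods are hypothetical stand-ins, not the bot's actual models or handlers.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationState:
    """Hypothetical in-memory stand-in for the bot's moderation records."""

    warnings: set = field(default_factory=set)    # stand-in for UserWarning records
    restricted: set = field(default_factory=set)  # currently restricted (group, user) keys

    def restrict_with_warning(self, key) -> None:
        """A restriction that the user may later lift through the DM flow."""
        self.restricted.add(key)
        self.warnings.add(key)

    def restrict_without_warning(self, key) -> None:
        """The pattern described for duplicate spam: restrict, record no warning."""
        self.restricted.add(key)

    def dm_unrestrict(self, key) -> bool:
        """Assumed DM flow: only acts when a warning record exists for the key."""
        if key not in self.warnings:
            return False
        self.warnings.discard(key)
        self.restricted.discard(key)
        return True
```

With this model, a user restricted through `restrict_without_warning` stays restricted after calling `dm_unrestrict`, because there is no record for the DM flow to find.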

Configuration (per group)

duplicate_spam_enabled (default: true) - Enable/disable detection
duplicate_spam_window_seconds (default: 120) - Time window for tracking
duplicate_spam_threshold (default: 3) - Messages before restricting
duplicate_spam_min_length (default: 20) - Min text length
duplicate_spam_similarity (default: 0.95) - Similarity threshold (0.0-1.0)
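
As an illustration, the five fields could be modeled like this. The field names and defaults come from the list above; the `DuplicateSpamConfig` class, its `from_dict` helper, and the range checks are assumptions made for this sketch and are not the PR's GroupConfig or Settings classes. Note that a later commit message in this PR mentions changing the default threshold from 3 to 2, so the value below reflects the description only.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DuplicateSpamConfig:
    """Hypothetical container for the per-group duplicate spam settings."""

    duplicate_spam_enabled: bool = True
    duplicate_spam_window_seconds: int = 120
    duplicate_spam_threshold: int = 3
    duplicate_spam_min_length: int = 20
    duplicate_spam_similarity: float = 0.95

    def __post_init__(self) -> None:
        # Assumed sanity checks; the PR does not say how values are validated.
        if not 0.0 <= self.duplicate_spam_similarity <= 1.0:
            raise ValueError("duplicate_spam_similarity must be within 0.0-1.0")
        if self.duplicate_spam_threshold < 1:
            raise ValueError("duplicate_spam_threshold must be at least 1")

    @classmethod
    def from_dict(cls, raw: dict) -> "DuplicateSpamConfig":
        """Build a config from a per-group mapping, ignoring unrelated keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})
```

A group that wants stricter matching could then pass something like `{"duplicate_spam_similarity": 0.97}` and inherit the remaining defaults.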

Similarity threshold reference

0.97: Only near-exact copy-paste (very safe)
0.95 (default): Catches minor word edits (safe, catches evasion)
0.90: Catches messages with a few words changed (higher false positive risk)
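
One way to get a feel for these thresholds is to score a few variants of a message with difflib directly. The helpers `similarity` and `flagged_at` and the sample strings are made up for this sketch; only `SequenceMatcher.ratio()` is the standard-library call the PR says it relies on. Scores depend on message length, so the output is best treated as a rough guide rather than a guarantee.

```python
from difflib import SequenceMatcher

THRESHOLDS = (0.97, 0.95, 0.90)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0.0, 1.0] as computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def flagged_at(a: str, b: str) -> dict:
    """Map each reference threshold to whether the pair would count as similar."""
    score = similarity(a, b)
    return {threshold: score >= threshold for threshold in THRESHOLDS}


# Hypothetical, already-normalized sample messages.
base = "lowongan kerja remote gaji besar hubungi saya sekarang"
variants = {
    "exact copy": base,
    "one character appended": base + "!",
    "one word swapped": base.replace("besar", "tinggi"),
    "unrelated message": "ada yang tahu cara install python di windows",
}

for label, text in variants.items():
    print(f"{label}: {similarity(base, text):.3f} {flagged_at(base, text)}")
```

Running this against real spam samples from a group is probably the most reliable way to choose between 0.97, 0.95, and 0.90.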

Test coverage

505 tests passing, 99% overall coverage
duplicate_spam.py: 100% coverage (37 new tests)
Updated test_config.py and test_group_config.py for new fields

Detect users who repeatedly paste the same message within a configurable time window. On reaching the threshold (default: 3 similar messages in 120 seconds), the duplicate is deleted and the user is restricted.

Key design decisions:
- In-memory rolling deque per (group_id, user_id) for tracking
- Text normalization + difflib similarity matching (threshold 0.97)
- No UserWarning record created, so DM unrestriction flow cannot bypass (same pattern as inline keyboard spam and new user probation handlers)
- Configurable per group: enabled, window_seconds, threshold, min_length

Files changed:
- New: handlers/duplicate_spam.py - core detection and enforcement
- New: tests/test_duplicate_spam.py - 37 tests, 100% coverage
- Modified: constants.py - Indonesian notification templates
- Modified: group_config.py - per-group config fields + .env fallback
- Modified: config.py - Settings fields for .env support
- Modified: main.py - handler registration at group=0
- Modified: .env.example, groups.json.example - documentation
- Modified: test_config.py, test_group_config.py - coverage for new fields

Add duplicate_spam_similarity field (float, default 0.95) to GroupConfig, Settings, and example configs. 0.95 catches minor word edits while avoiding false positives on legitimately similar messages.

…uler tests
- Change default duplicate_spam_threshold from 3 to 2 (trigger on 2nd duplicate within window)
- Fix RuntimeWarning in test_scheduler: mock get_chat_member to return proper MagicMock user instead of AsyncMock with unawaited coroutines

rezhajulio added 3 commits February 24, 2026 18:38

feat: make similarity threshold configurable, default to 0.95

8e94605

rezhajulio merged commit 7b920c9 into main Feb 26, 2026
5 checks passed

rezhajulio merged 3 commits into main from feat/duplicate-spam-detection
