feat: duplicate message spam detection#4

Merged
rezhajulio merged 3 commits into main from feat/duplicate-spam-detection on Feb 26, 2026

rezhajulio (Owner) commented on Feb 24, 2026

Summary

Detects and restricts users who repeatedly paste the same message in a group chat (e.g., job spam posted 3 times in 2 minutes).

How it works

  • Tracks recent messages per (group_id, user_id) in an in-memory rolling deque
  • Normalizes text (lowercase, strip punctuation/emoji, collapse whitespace)
  • Uses difflib.SequenceMatcher similarity matching (configurable, default 0.95)
  • On reaching the threshold (default: 3 similar messages within 120s), deletes the duplicate and restricts the user
  • Sends a notification, written in Indonesian, to the warning topic
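
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in handlers/duplicate_spam.py: the names normalize, DuplicateTracker, and is_spam are hypothetical, the regular expression is one possible way to strip punctuation and emoji, and applying the minimum length to the normalized text is an assumption. Deleting the message and restricting the user are left out.

```python
import re
import time
from collections import deque
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, turn punctuation/emoji into spaces, collapse whitespace."""
    text = text.lower()
    # Keep only word characters (letters, digits, underscore) and whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


class DuplicateTracker:
    """Hypothetical in-memory tracker keyed by (group_id, user_id)."""

    def __init__(self, window_seconds=120, threshold=3,
                 min_length=20, similarity=0.95):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_length = min_length
        self.similarity = similarity
        # (group_id, user_id) -> deque of (timestamp, normalized_text)
        self._recent = {}

    def is_spam(self, group_id, user_id, text, now=None) -> bool:
        normalized = normalize(text)
        # Assumption: the length check applies to the normalized text.
        if len(normalized) < self.min_length:
            return False
        now = time.monotonic() if now is None else now
        history = self._recent.setdefault((group_id, user_id), deque())
        # Drop entries that have fallen out of the rolling window.
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()
        # Count earlier messages that are similar enough to this one.
        similar = sum(
            1 for _, earlier in history
            if SequenceMatcher(None, earlier, normalized).ratio()
            >= self.similarity
        )
        history.append((now, normalized))
        # The current message counts toward the threshold as well.
        return similar + 1 >= self.threshold
```

A caller would invoke is_spam once per incoming text message and, on True, perform the delete, restrict, and notify steps. Passing now explicitly makes the window logic easy to exercise in tests without sleeping.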

DM bypass protection

Restrictions from this handler (and existing inline keyboard spam / new user probation handlers) do NOT create a UserWarning record, so the DM unrestriction flow cannot bypass them.

Configuration (per group)

  • duplicate_spam_enabled (default: true) - Enable/disable detection
  • duplicate_spam_window_seconds (default: 120) - Time window for tracking
  • duplicate_spam_threshold (default: 3; a later commit in this PR lowers the default to 2) - Messages before restricting
  • duplicate_spam_min_length (default: 20) - Min text length
  • duplicate_spam_similarity (default: 0.95) - Similarity threshold (0.0-1.0)
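
As a rough picture of how these fields and defaults fit together, here is a small sketch using a plain dataclass. The class DuplicateSpamSettings and the helper from_overrides are hypothetical; the PR itself adds the fields to GroupConfig and Settings, whose actual definitions and validation rules are not shown here. The range checks are assumptions based on the descriptions above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DuplicateSpamSettings:
    """Hypothetical container mirroring the five documented fields."""

    duplicate_spam_enabled: bool = True
    duplicate_spam_window_seconds: int = 120
    # 3 as listed in this description; a later commit lowers the default to 2.
    duplicate_spam_threshold: int = 3
    duplicate_spam_min_length: int = 20
    duplicate_spam_similarity: float = 0.95

    def __post_init__(self):
        # Assumed sanity checks; the real config may validate differently.
        if not 0.0 <= self.duplicate_spam_similarity <= 1.0:
            raise ValueError("duplicate_spam_similarity must be in [0.0, 1.0]")
        if self.duplicate_spam_window_seconds <= 0:
            raise ValueError("duplicate_spam_window_seconds must be positive")
        if self.duplicate_spam_threshold < 1:
            raise ValueError("duplicate_spam_threshold must be at least 1")


def from_overrides(overrides: dict) -> DuplicateSpamSettings:
    """Build settings from a per-group dict, ignoring unrelated keys."""
    known = {f.name for f in fields(DuplicateSpamSettings)}
    picked = {k: v for k, v in overrides.items() if k in known}
    return DuplicateSpamSettings(**picked)
```

For example, from_overrides({"duplicate_spam_similarity": 0.9}) would loosen matching for one group while leaving the other four fields at their defaults.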

Similarity threshold reference

  • 0.97: Only near-exact copy-paste (very safe)
  • 0.95 (default): Catches minor word edits (safe, catches evasion)
  • 0.90: Catches messages with a few words changed (higher false positive risk)
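
To get a feel for where a pair of messages lands relative to these reference values, it helps to recall that SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. The helper below is a hypothetical illustration (thresholds_met is not part of the PR) that reports which of the three reference thresholds a pair would satisfy.

```python
from difflib import SequenceMatcher

# The three reference values listed above, strictest first.
REFERENCE_THRESHOLDS = (0.97, 0.95, 0.90)


def similarity(a: str, b: str) -> float:
    """Return difflib's ratio for two already-normalized strings."""
    return SequenceMatcher(None, a, b).ratio()


def thresholds_met(a: str, b: str) -> list:
    """List the reference thresholds that this pair would reach."""
    score = similarity(a, b)
    return [t for t in REFERENCE_THRESHOLDS if score >= t]
```

Because the ratio is relative to message length, changing one word in a long message moves the score much less than changing one word in a short one, which is one reason a minimum length setting pairs naturally with a high similarity threshold.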

Test coverage

  • 505 tests passing, 99% overall coverage
  • duplicate_spam.py: 100% coverage (37 new tests)
  • Updated test_config.py and test_group_config.py for new fields

Detect users who repeatedly paste the same message within a configurable
time window. On reaching the threshold (default: 3 similar messages in
120 seconds), the duplicate is deleted and the user is restricted.

Key design decisions:
- In-memory rolling deque per (group_id, user_id) for tracking
- Text normalization + difflib similarity matching (threshold 0.97)
- No UserWarning record created, so DM unrestriction flow cannot bypass
  (same pattern as inline keyboard spam and new user probation handlers)
- Configurable per group: enabled, window_seconds, threshold, min_length

Files changed:
- New: handlers/duplicate_spam.py - core detection and enforcement
- New: tests/test_duplicate_spam.py - 37 tests, 100% coverage
- Modified: constants.py - Indonesian notification templates
- Modified: group_config.py - per-group config fields + .env fallback
- Modified: config.py - Settings fields for .env support
- Modified: main.py - handler registration at group=0
- Modified: .env.example, groups.json.example - documentation
- Modified: test_config.py, test_group_config.py - coverage for new fields

Add duplicate_spam_similarity field (float, default 0.95) to GroupConfig,
Settings, and example configs. 0.95 catches minor word edits while
avoiding false positives on legitimately similar messages.

…uler tests

- Change default duplicate_spam_threshold from 3 to 2 (trigger on 2nd
  duplicate within window)
- Fix RuntimeWarning in test_scheduler: mock get_chat_member to return
  proper MagicMock user instead of AsyncMock with unawaited coroutines
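
The mocking pattern described in the second item can be sketched as below. This is an assumption about the shape of the fix, not the actual test_scheduler code: make_bot_mock and describe_member are hypothetical names, and only get_chat_member comes from the commit message. The underlying issue is that a bare AsyncMock hands back child mocks that are themselves AsyncMock, so calling any method on the awaited result creates a coroutine that nothing awaits.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_bot_mock(user_id=42, full_name="Test User"):
    """Build a bot mock whose get_chat_member resolves to plain MagicMocks."""
    user = MagicMock()
    user.id = user_id
    user.full_name = full_name

    member = MagicMock()
    member.user = user
    member.status = "member"

    bot = MagicMock()
    # Only the call itself is async; the value it resolves to is synchronous,
    # so attribute access and method calls on it never create coroutines.
    bot.get_chat_member = AsyncMock(return_value=member)
    return bot


async def describe_member(bot, chat_id, user_id):
    """Hypothetical code under test that awaits get_chat_member once."""
    member = await bot.get_chat_member(chat_id, user_id)
    return f"{member.user.full_name} ({member.status})"


if __name__ == "__main__":
    print(asyncio.run(describe_member(make_bot_mock(), 1, 42)))
```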

rezhajulio merged commit 7b920c9 into main on Feb 26, 2026
5 checks passed