Skip to content

feat(gateway): implement hybrid LINE reply/push strategy#608

Merged
thepagent merged 15 commits intoopenabdev:mainfrom
iamninihuang:feat/line-hybrid-gateway
Apr 28, 2026
Merged

feat(gateway): implement hybrid LINE reply/push strategy#608
thepagent merged 15 commits intoopenabdev:mainfrom
iamninihuang:feat/line-hybrid-gateway

Conversation

@iamninihuang
Copy link
Copy Markdown
Contributor

@iamninihuang iamninihuang commented Apr 28, 2026

Problem

The current LINE gateway implementation relies solely on the Push Message API. This consumes the monthly free quota (typically 200 messages) very quickly, making it unsuitable for active bots.

Solution

Implemented a stateful hybrid messaging strategy in the gateway:

  • Reply Token Caching: Webhook events capture replyToken and store them in a thread-safe memory cache (TTL 50s, max 10k entries).
  • Hybrid Dispatch: Prioritizes the free Reply API; falls back to Push API if the token is missing or expired.
  • Duplicate Safety: Non-token-specific errors (5xx, network) do NOT fall back to Push to avoid duplicate delivery.
  • Cache Sweep: Background task cleans expired entries every 50s.

Design

LINE User sends message
        │
        ▼
┌─────────────────────────────────────────────────────┐
│  Gateway: POST /webhook/line                        │
│                                                     │
│  1. Verify HMAC signature                           │
│  2. Generate event_id = "evt_<uuid>"                │
│  3. Cache: event_id → (replyToken, timestamp)       │
│  4. Send GatewayEvent { event_id } to OAB via WS    │
└──────────────────────┬──────────────────────────────┘
                       │ WebSocket
                       ▼
┌─────────────────────────────────────────────────────┐
│  OAB Core                                           │
│                                                     │
│  1. Receive GatewayEvent                            │
│  2. Store event_id in ChannelRef.origin_event_id    │
│     (#619)                                          │
│  3. Process message → agent generates response      │
│  4. send_message() builds GatewayReply:             │
│     reply_to = origin_event_id = "evt_<uuid>"       │
└──────────────────────┬──────────────────────────────┘
                       │ WebSocket
                       ▼
┌─────────────────────────────────────────────────────┐
│  Gateway: Hybrid Dispatch                           │
│                                                     │
│  reply.reply_to = "evt_<uuid>"                      │
│       │                                             │
│       ▼                                             │
│  cache.remove("evt_<uuid>")                         │
│       │                                             │
│       ├── HIT + fresh (< 50s)                       │
│       │     └─→ LINE Reply API (FREE) ✅             │
│       │           │                                 │
│       │           ├── 200 OK → done                 │
│       │           ├── 400 token unusable → Push ↓   │
│       │           └── 5xx/network → do NOT Push ⚠️  │
│       │                                             │
│       ├── HIT + expired (> 50s)                     │
│       │     └─→ Push API (quota) ↓                  │
│       │                                             │
│       └── MISS (no token cached)                    │
│             └─→ Push API (quota) ↓                  │
│                                                     │
│  Push API: POST v2/bot/message/push                 │
│  (fallback, consumes monthly quota)                 │
└─────────────────────────────────────────────────────┘

Core dependency

Requires PR #619 (merged) which propagates event_id through ChannelRef.origin_event_id to GatewayReply.reply_to, enabling the gateway to correlate replies with cached reply tokens.

Review fixes applied

  • Fixed operator precedence bug in Reply API 400 fallback logic
  • Changed lock().unwrap() to lock().unwrap_or_else(|e| e.into_inner()) for mutex poison safety
  • Added cache size bound (10k max) to prevent unbounded memory growth
  • Removed docs/feature-requests/line-reply-api.md (feature is shipped, ADR covers it)

Closes #607

https://discord.com/channels/1491295327620169908/1491365150664560881/1498518933198737429

@iamninihuang iamninihuang requested a review from thepagent as a code owner April 28, 2026 02:47
@github-actions github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days pending-screening PR awaiting automated screening labels Apr 28, 2026
Copy link
Copy Markdown
Contributor

@shaun-agent shaun-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #608 Screening

Verdict: changes requested / do not merge yet.

PR: #608
Reference PR checked: #492


Findings

Blocker: replies can attach to the wrong LINE message under same-channel concurrency.

gateway/src/main.rs:452 auto-fills empty reply_to from the last event seen for that channel. But normal gateway sends still emit empty reply_to in src/gateway.rs:139, while inbound trigger state only preserves message_id, not event_id, at src/gateway.rs:382.

If two LINE messages arrive in the same group/user before the first agent reply returns, the first reply can consume the second message's reply token. That is a real correctness bug. The fix should carry explicit per-message reply context through the adapter, not infer it from "latest event".


Major: cache mutex is held across the LINE Reply API network call.

gateway/src/main.rs:581 locks reply_cache, removes the entry, then awaits .send() before the lock scope ends. That serializes all token cache access behind a network request, including webhook-side inserts. Remove the cache entry into a local variable, drop the lock, then call LINE.


Security/process footgun: run_gateway.sh introduces secrets.env, but it is not ignored.

run_gateway.sh:9 sources secrets.env; .gitignore does not ignore it. Either remove this local helper from the PR, move it under docs/examples, or add secrets.env to .gitignore.


Formatting fails on the changed gateway file.

rustfmt --edition 2021 --check gateway/src/main.rs reports diffs in the new code. Small, but should be fixed before merge.

chaodu-agent pushed a commit to iamninihuang/openab that referenced this pull request Apr 28, 2026
- Remove auto-fill reply_to mechanism that could attach replies to wrong
  LINE messages under concurrent traffic (blocker from shaun-agent,
  超渡法師, 普渡法師). If OAB sends empty reply_to, gateway now correctly
  falls back to Push API instead of guessing the latest event.
- Extract reply token from cache before making HTTP call so Mutex is not
  held across network I/O (blocker from shaun-agent, 超渡法師, 普渡法師).
- Unify sweep TTL with handler TTL (REPLY_TOKEN_TTL_SECS) to eliminate
  dead-weight cache entries in 50-60s range (普渡法師).
- Add secrets.env to .gitignore (shaun-agent, 超渡法師).
- Run rustfmt --edition 2021 (shaun-agent).
- Update ADR Consequences to reflect hybrid reply/push strategy (超渡法師).
chaodu-agent pushed a commit to iamninihuang/openab that referenced this pull request Apr 28, 2026
- Switch ReplyTokenCache from tokio::sync::Mutex to std::sync::Mutex;
  critical sections are short and never held across .await (超渡法師).
- Log LINE Reply API 400 response body for easier debugging (超渡法師).
- Guard secrets.env source in run_gateway.sh with existence check (超渡法師).
- Align sweep interval with REPLY_TOKEN_TTL_SECS (50s) (超渡法師).
@github-actions github-actions Bot removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label Apr 28, 2026
Copy link
Copy Markdown
Collaborator

@chaodu-agent chaodu-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Core Dependency: reply_to is always empty

We traced the full data flow and found that OAB core's send_message in src/gateway.rs currently hardcodes reply_to: String::new():

// src/gateway.rs — GatewayAdapter::send_message
let reply = GatewayReply {
    reply_to: String::new(),   // ← always empty
    ...
};

The event_id field on GatewayEvent is even marked #[allow(dead_code)] — it is never read or propagated.

This means the reply token cache in this PR will never hit. cache.remove(&reply.reply_to) is always cache.remove(""), which always misses, so every reply falls back to Push API. The hybrid feature is effectively a no-op today.

What core needs to do

  1. Remove #[allow(dead_code)] from GatewayEvent.event_id
  2. Propagate event_id through the message handling chain (e.g. add origin_event_id: Option<String> to ChannelRef or MessageRef)
  3. Fill reply_to with the originating event_id in send_message

Suggestion

Please update the PR description to note this dependency explicitly, and open a tracking issue for the core-side change. The gateway-side cache logic looks correct and will work once core carries reply_to through — but reviewers and users should know this feature is pending core support before it becomes functional.


Review by 超渡法師 🙏

thepagent pushed a commit that referenced this pull request Apr 28, 2026
* feat(core): propagate event_id through ChannelRef to GatewayReply.reply_to

Add origin_event_id field to ChannelRef so the gateway event_id
survives the message handling pipeline and appears in
GatewayReply.reply_to. This enables the gateway to correlate
replies with inbound events (e.g. LINE reply token cache in #608).

Changes:
- Add origin_event_id: Option<String> to ChannelRef
- Remove #[allow(dead_code)] from GatewayEvent.event_id
- Fill origin_event_id from event_id in gateway event handler
- Use origin_event_id in GatewayAdapter::send_message reply_to
- Propagate through create_thread
- Set origin_event_id: None for Discord, Slack, cron (non-gateway)

* fix: exclude origin_event_id from ChannelRef Hash/Eq

Manual impl of Hash, Eq, PartialEq so that origin_event_id does not
affect channel identity. Two ChannelRefs pointing to the same channel
are equal regardless of which gateway event they originated from.

* test: add origin_event_id Hash/Eq exclusion and propagation tests

- origin_event_id_excluded_from_eq: same channel, different event IDs are equal
- origin_event_id_excluded_from_hash: HashMap treats them as same key
- origin_event_id_survives_clone: simulates create_thread propagation

* docs: document reply_to field semantics in custom-gateway ADR

Explain that reply_to carries the inbound event_id for reply
correlation (e.g. LINE reply token lookup).

---------

Co-authored-by: 超渡法師 <chaodu@openab.dev>
iamninihuang and others added 10 commits April 28, 2026 19:02
This implementation adds a stateful token cache in the gateway to prioritize the free LINE Reply API. It also includes an auto-fill logic for the reply_to field to avoid modifying the OAB core.
- Remove auto-fill reply_to mechanism that could attach replies to wrong
  LINE messages under concurrent traffic (blocker from shaun-agent,
  超渡法師, 普渡法師). If OAB sends empty reply_to, gateway now correctly
  falls back to Push API instead of guessing the latest event.
- Extract reply token from cache before making HTTP call so Mutex is not
  held across network I/O (blocker from shaun-agent, 超渡法師, 普渡法師).
- Unify sweep TTL with handler TTL (REPLY_TOKEN_TTL_SECS) to eliminate
  dead-weight cache entries in 50-60s range (普渡法師).
- Add secrets.env to .gitignore (shaun-agent, 超渡法師).
- Run rustfmt --edition 2021 (shaun-agent).
- Update ADR Consequences to reflect hybrid reply/push strategy (超渡法師).
…antics

LINE reply tokens are single-use. Document that the first OAB client to
remove() a cached token gets the free Reply API call; others fall back
to Push API. This is correct behavior but was not documented, which
could confuse maintainers (普渡法師).
The Per-Client Last Event Tracker was removed during review because it
caused incorrect reply-token association in group chats. Update the
feature request doc to match the actual implementation: gateway does not
infer reply_to, it falls back to Push API when reply_to is empty or
not found in cache (普渡法師 doc nit).
The 4-step hybrid strategy description still referenced the removed
Per-Client Last Event Tracker auto-fill mechanism. Updated to accurately
describe the actual implementation: event_id-keyed cache lookup with
empty reply_to falling back to Push API.
- Switch ReplyTokenCache from tokio::sync::Mutex to std::sync::Mutex;
  critical sections are short and never held across .await (超渡法師).
- Log LINE Reply API 400 response body for easier debugging (超渡法師).
- Guard secrets.env source in run_gateway.sh with existence check (超渡法師).
- Align sweep interval with REPLY_TOKEN_TTL_SECS (50s) (超渡法師).
- Only fallback to Push API on 4xx (token invalid/expired). For 5xx and
  network errors, the message may have already been delivered — do NOT
  fallback to avoid duplicate delivery (擺渡法師 major finding).
- Update ADR message flow step 7 to reflect hybrid Reply/Push strategy
  instead of Push-only (擺渡法師 doc nit).
Only fallback to Push API when LINE's 400 response body explicitly
indicates the reply token is invalid or expired. All other failures
(5xx, other 4xx, network/timeout errors) are treated as ambiguous
delivery — logged as error and dropped to prevent duplicate delivery
(擺渡法師 refined major finding, 普渡法師 concurred).
- Fix operator precedence bug in Reply API 400 fallback: add explicit
  parentheses so (invalid AND reply token) OR expired is evaluated correctly
- Change lock().unwrap() to lock().unwrap_or_else(|e| e.into_inner())
  at all 3 cache lock sites to recover from mutex poison instead of panicking
- Add REPLY_TOKEN_CACHE_MAX (10k) bound: skip insert and log warning
  when cache is full, gracefully degrading to Push API
@chaodu-agent chaodu-agent force-pushed the feat/line-hybrid-gateway branch from 009c1bc to f7536ca Compare April 28, 2026 19:08
超渡法師 added 5 commits April 28, 2026 19:15
Feature is now implemented and documented in docs/adr/line-adapter.md.
Keeping a feature request doc for a shipped feature is misleading.
…elog

- Status: Proposed → Accepted
- Add sequence diagram showing hybrid Reply/Push dispatch flow
- Update changelog to v0.2 with openabdev#608 and openabdev#619 references
- Add @iamninihuang as co-author
- Remove run_gateway.sh (README already documents how to run; script
  used port 9090 conflicting with README default 8080)
- Add LINE_CHANNEL_SECRET and LINE_CHANNEL_ACCESS_TOKEN to env vars table
- Add POST /webhook/line to endpoints table
@thepagent thepagent merged commit d58493d into openabdev:main Apr 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pending-screening PR awaiting automated screening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(gateway): hybrid LINE Reply/Push API to save messaging quota

4 participants