feat(gateway): implement hybrid LINE reply/push strategy#608
feat(gateway): implement hybrid LINE reply/push strategy#608thepagent merged 15 commits intoopenabdev:mainfrom
Conversation
shaun-agent
left a comment
There was a problem hiding this comment.
PR #608 Screening
Verdict: changes requested / do not merge yet.
PR: #608
Reference PR checked: #492
Findings
Blocker: replies can attach to the wrong LINE message under same-channel concurrency.
gateway/src/main.rs:452 auto-fills empty reply_to from the last event seen for that channel. But normal gateway sends still emit empty reply_to in src/gateway.rs:139, while inbound trigger state only preserves message_id, not event_id, at src/gateway.rs:382.
If two LINE messages arrive in the same group/user before the first agent reply returns, the first reply can consume the second message's reply token. That is a real correctness bug. The fix should carry explicit per-message reply context through the adapter, not infer it from "latest event".
Major: cache mutex is held across the LINE Reply API network call.
gateway/src/main.rs:581 locks reply_cache, removes the entry, then awaits .send() before the lock scope ends. That serializes all token cache access behind a network request, including webhook-side inserts. Remove the cache entry into a local variable, drop the lock, then call LINE.
Security/process footgun: run_gateway.sh introduces secrets.env, but it is not ignored.
run_gateway.sh:9 sources secrets.env; .gitignore does not ignore it. Either remove this local helper from the PR, move it under docs/examples, or add secrets.env to .gitignore.
Formatting fails on the changed gateway file.
rustfmt --edition 2021 --check gateway/src/main.rs reports diffs in the new code. Small, but should be fixed before merge.
- Remove auto-fill reply_to mechanism that could attach replies to wrong LINE messages under concurrent traffic (blocker from shaun-agent, 超渡法師, 普渡法師). If OAB sends empty reply_to, gateway now correctly falls back to Push API instead of guessing the latest event. - Extract reply token from cache before making HTTP call so Mutex is not held across network I/O (blocker from shaun-agent, 超渡法師, 普渡法師). - Unify sweep TTL with handler TTL (REPLY_TOKEN_TTL_SECS) to eliminate dead-weight cache entries in 50-60s range (普渡法師). - Add secrets.env to .gitignore (shaun-agent, 超渡法師). - Run rustfmt --edition 2021 (shaun-agent). - Update ADR Consequences to reflect hybrid reply/push strategy (超渡法師).
- Switch ReplyTokenCache from tokio::sync::Mutex to std::sync::Mutex; critical sections are short and never held across .await (超渡法師). - Log LINE Reply API 400 response body for easier debugging (超渡法師). - Guard secrets.env source in run_gateway.sh with existence check (超渡法師). - Align sweep interval with REPLY_TOKEN_TTL_SECS (50s) (超渡法師).
chaodu-agent
left a comment
There was a problem hiding this comment.
⚠️ Core Dependency: reply_to is always empty
We traced the full data flow and found that OAB core's send_message in src/gateway.rs currently hardcodes reply_to: String::new():
// src/gateway.rs — GatewayAdapter::send_message
let reply = GatewayReply {
reply_to: String::new(), // ← always empty
...
};The event_id field on GatewayEvent is even marked #[allow(dead_code)] — it is never read or propagated.
This means the reply token cache in this PR will never hit. cache.remove(&reply.reply_to) is always cache.remove(""), which always misses, so every reply falls back to Push API. The hybrid feature is effectively a no-op today.
What core needs to do
- Remove
#[allow(dead_code)]fromGatewayEvent.event_id - Propagate
event_idthrough the message handling chain (e.g. addorigin_event_id: Option<String>toChannelReforMessageRef) - Fill
reply_towith the originatingevent_idinsend_message
Suggestion
Please update the PR description to note this dependency explicitly, and open a tracking issue for the core-side change. The gateway-side cache logic looks correct and will work once core carries reply_to through — but reviewers and users should know this feature is pending core support before it becomes functional.
Review by 超渡法師 🙏
* feat(core): propagate event_id through ChannelRef to GatewayReply.reply_to Add origin_event_id field to ChannelRef so the gateway event_id survives the message handling pipeline and appears in GatewayReply.reply_to. This enables the gateway to correlate replies with inbound events (e.g. LINE reply token cache in #608). Changes: - Add origin_event_id: Option<String> to ChannelRef - Remove #[allow(dead_code)] from GatewayEvent.event_id - Fill origin_event_id from event_id in gateway event handler - Use origin_event_id in GatewayAdapter::send_message reply_to - Propagate through create_thread - Set origin_event_id: None for Discord, Slack, cron (non-gateway) * fix: exclude origin_event_id from ChannelRef Hash/Eq Manual impl of Hash, Eq, PartialEq so that origin_event_id does not affect channel identity. Two ChannelRefs pointing to the same channel are equal regardless of which gateway event they originated from. * test: add origin_event_id Hash/Eq exclusion and propagation tests - origin_event_id_excluded_from_eq: same channel, different event IDs are equal - origin_event_id_excluded_from_hash: HashMap treats them as same key - origin_event_id_survives_clone: simulates create_thread propagation * docs: document reply_to field semantics in custom-gateway ADR Explain that reply_to carries the inbound event_id for reply correlation (e.g. LINE reply token lookup). --------- Co-authored-by: 超渡法師 <chaodu@openab.dev>
This implementation adds a stateful token cache in the gateway to prioritize the free LINE Reply API. It also includes an auto-fill logic for the reply_to field to avoid modifying the OAB core.
- Remove auto-fill reply_to mechanism that could attach replies to wrong LINE messages under concurrent traffic (blocker from shaun-agent, 超渡法師, 普渡法師). If OAB sends empty reply_to, gateway now correctly falls back to Push API instead of guessing the latest event. - Extract reply token from cache before making HTTP call so Mutex is not held across network I/O (blocker from shaun-agent, 超渡法師, 普渡法師). - Unify sweep TTL with handler TTL (REPLY_TOKEN_TTL_SECS) to eliminate dead-weight cache entries in 50-60s range (普渡法師). - Add secrets.env to .gitignore (shaun-agent, 超渡法師). - Run rustfmt --edition 2021 (shaun-agent). - Update ADR Consequences to reflect hybrid reply/push strategy (超渡法師).
…antics LINE reply tokens are single-use. Document that the first OAB client to remove() a cached token gets the free Reply API call; others fall back to Push API. This is correct behavior but was not documented, which could confuse maintainers (普渡法師).
The Per-Client Last Event Tracker was removed during review because it caused incorrect reply-token association in group chats. Update the feature request doc to match the actual implementation: gateway does not infer reply_to, it falls back to Push API when reply_to is empty or not found in cache (普渡法師 doc nit).
The 4-step hybrid strategy description still referenced the removed Per-Client Last Event Tracker auto-fill mechanism. Updated to accurately describe the actual implementation: event_id-keyed cache lookup with empty reply_to falling back to Push API.
- Switch ReplyTokenCache from tokio::sync::Mutex to std::sync::Mutex; critical sections are short and never held across .await (超渡法師). - Log LINE Reply API 400 response body for easier debugging (超渡法師). - Guard secrets.env source in run_gateway.sh with existence check (超渡法師). - Align sweep interval with REPLY_TOKEN_TTL_SECS (50s) (超渡法師).
- Only fallback to Push API on 4xx (token invalid/expired). For 5xx and network errors, the message may have already been delivered — do NOT fallback to avoid duplicate delivery (擺渡法師 major finding). - Update ADR message flow step 7 to reflect hybrid Reply/Push strategy instead of Push-only (擺渡法師 doc nit).
Only fallback to Push API when LINE's 400 response body explicitly indicates the reply token is invalid or expired. All other failures (5xx, other 4xx, network/timeout errors) are treated as ambiguous delivery — logged as error and dropped to prevent duplicate delivery (擺渡法師 refined major finding, 普渡法師 concurred).
- Fix operator precedence bug in Reply API 400 fallback: add explicit parentheses so (invalid AND reply token) OR expired is evaluated correctly - Change lock().unwrap() to lock().unwrap_or_else(|e| e.into_inner()) at all 3 cache lock sites to recover from mutex poison instead of panicking - Add REPLY_TOKEN_CACHE_MAX (10k) bound: skip insert and log warning when cache is full, gracefully degrading to Push API
009c1bc to
f7536ca
Compare
Feature is now implemented and documented in docs/adr/line-adapter.md. Keeping a feature request doc for a shipped feature is misleading.
…elog - Status: Proposed → Accepted - Add sequence diagram showing hybrid Reply/Push dispatch flow - Update changelog to v0.2 with openabdev#608 and openabdev#619 references - Add @iamninihuang as co-author
- Remove run_gateway.sh (README already documents how to run; script used port 9090 conflicting with README default 8080) - Add LINE_CHANNEL_SECRET and LINE_CHANNEL_ACCESS_TOKEN to env vars table - Add POST /webhook/line to endpoints table
Problem
The current LINE gateway implementation relies solely on the Push Message API. This consumes the monthly free quota (typically 200 messages) very quickly, making it unsuitable for active bots.
Solution
Implemented a stateful hybrid messaging strategy in the gateway:
replyTokenand store them in a thread-safe memory cache (TTL 50s, max 10k entries).Design
Core dependency
Requires PR #619 (merged) which propagates
event_idthroughChannelRef.origin_event_idtoGatewayReply.reply_to, enabling the gateway to correlate replies with cached reply tokens.Review fixes applied
lock().unwrap()tolock().unwrap_or_else(|e| e.into_inner())for mutex poison safetydocs/feature-requests/line-reply-api.md(feature is shipped, ADR covers it)Closes #607
https://discord.com/channels/1491295327620169908/1491365150664560881/1498518933198737429