feat(gateway): implement hybrid LINE reply/push strategy by iamninihuang · Pull Request #608 · openabdev/openab

iamninihuang · 2026-04-28T02:47:36Z

Problem

The current LINE gateway implementation relies solely on the Push Message API. This consumes the monthly free quota (typically 200 messages) very quickly, making it unsuitable for active bots.

Solution

Implemented a stateful hybrid messaging strategy in the gateway:

Reply Token Caching: Webhook events capture replyToken and store them in a thread-safe memory cache (TTL 50s, max 10k entries).
Hybrid Dispatch: Prioritizes the free Reply API; falls back to Push API if the token is missing or expired.
Duplicate Safety: Non-token-specific errors (5xx, network) do NOT fall back to Push to avoid duplicate delivery.
Cache Sweep: Background task cleans expired entries every 50s.

Design

LINE User sends message
        │
        ▼
┌─────────────────────────────────────────────────────┐
│  Gateway: POST /webhook/line                        │
│                                                     │
│  1. Verify HMAC signature                           │
│  2. Generate event_id = "evt_<uuid>"                │
│  3. Cache: event_id → (replyToken, timestamp)       │
│  4. Send GatewayEvent { event_id } to OAB via WS    │
└──────────────────────┬──────────────────────────────┘
                       │ WebSocket
                       ▼
┌─────────────────────────────────────────────────────┐
│  OAB Core                                           │
│                                                     │
│  1. Receive GatewayEvent                            │
│  2. Store event_id in ChannelRef.origin_event_id    │
│     (#619)                                          │
│  3. Process message → agent generates response      │
│  4. send_message() builds GatewayReply:             │
│     reply_to = origin_event_id = "evt_<uuid>"       │
└──────────────────────┬──────────────────────────────┘
                       │ WebSocket
                       ▼
┌─────────────────────────────────────────────────────┐
│  Gateway: Hybrid Dispatch                           │
│                                                     │
│  reply.reply_to = "evt_<uuid>"                      │
│       │                                             │
│       ▼                                             │
│  cache.remove("evt_<uuid>")                         │
│       │                                             │
│       ├── HIT + fresh (< 50s)                       │
│       │     └─→ LINE Reply API (FREE) ✅             │
│       │           │                                 │
│       │           ├── 200 OK → done                 │
│       │           ├── 400 token unusable → Push ↓   │
│       │           └── 5xx/network → do NOT Push ⚠️  │
│       │                                             │
│       ├── HIT + expired (> 50s)                     │
│       │     └─→ Push API (quota) ↓                  │
│       │                                             │
│       └── MISS (no token cached)                    │
│             └─→ Push API (quota) ↓                  │
│                                                     │
│  Push API: POST v2/bot/message/push                 │
│  (fallback, consumes monthly quota)                 │
└─────────────────────────────────────────────────────┘

Core dependency

Requires PR #619 (merged) which propagates event_id through ChannelRef.origin_event_id to GatewayReply.reply_to, enabling the gateway to correlate replies with cached reply tokens.

Review fixes applied

Fixed operator precedence bug in Reply API 400 fallback logic
Changed lock().unwrap() to lock().unwrap_or_else(|e| e.into_inner()) for mutex poison safety
Added cache size bound (10k max) to prevent unbounded memory growth
Removed docs/feature-requests/line-reply-api.md (feature is shipped, ADR covers it)

Closes #607

https://discord.com/channels/1491295327620169908/1491365150664560881/1498518933198737429

shaun-agent

PR #608 Screening

Verdict: changes requested / do not merge yet.

PR: #608
Reference PR checked: #492

Findings

Blocker: replies can attach to the wrong LINE message under same-channel concurrency.

gateway/src/main.rs:452 auto-fills empty reply_to from the last event seen for that channel. But normal gateway sends still emit empty reply_to in src/gateway.rs:139, while inbound trigger state only preserves message_id, not event_id, at src/gateway.rs:382.

If two LINE messages arrive in the same group/user before the first agent reply returns, the first reply can consume the second message's reply token. That is a real correctness bug. The fix should carry explicit per-message reply context through the adapter, not infer it from "latest event".

Major: cache mutex is held across the LINE Reply API network call.

gateway/src/main.rs:581 locks reply_cache, removes the entry, then awaits .send() before the lock scope ends. That serializes all token cache access behind a network request, including webhook-side inserts. Remove the cache entry into a local variable, drop the lock, then call LINE.

Security/process footgun: run_gateway.sh introduces secrets.env, but it is not ignored.

run_gateway.sh:9 sources secrets.env; .gitignore does not ignore it. Either remove this local helper from the PR, move it under docs/examples, or add secrets.env to .gitignore.

Formatting fails on the changed gateway file.

rustfmt --edition 2021 --check gateway/src/main.rs reports diffs in the new code. Small, but should be fixed before merge.

- Remove auto-fill reply_to mechanism that could attach replies to wrong LINE messages under concurrent traffic (blocker from shaun-agent, 超渡法師, 普渡法師). If OAB sends empty reply_to, gateway now correctly falls back to Push API instead of guessing the latest event. - Extract reply token from cache before making HTTP call so Mutex is not held across network I/O (blocker from shaun-agent, 超渡法師, 普渡法師). - Unify sweep TTL with handler TTL (REPLY_TOKEN_TTL_SECS) to eliminate dead-weight cache entries in 50-60s range (普渡法師). - Add secrets.env to .gitignore (shaun-agent, 超渡法師). - Run rustfmt --edition 2021 (shaun-agent). - Update ADR Consequences to reflect hybrid reply/push strategy (超渡法師).

- Switch ReplyTokenCache from tokio::sync::Mutex to std::sync::Mutex; critical sections are short and never held across .await (超渡法師). - Log LINE Reply API 400 response body for easier debugging (超渡法師). - Guard secrets.env source in run_gateway.sh with existence check (超渡法師). - Align sweep interval with REPLY_TOKEN_TTL_SECS (50s) (超渡法師).

chaodu-agent

⚠️ Core Dependency: `reply_to` is always empty

We traced the full data flow and found that OAB core's send_message in src/gateway.rs currently hardcodes reply_to: String::new():

// src/gateway.rs — GatewayAdapter::send_message
let reply = GatewayReply {
    reply_to: String::new(),   // ← always empty
    ...
};

The event_id field on GatewayEvent is even marked #[allow(dead_code)] — it is never read or propagated.

This means the reply token cache in this PR will never hit. cache.remove(&reply.reply_to) is always cache.remove(""), which always misses, so every reply falls back to Push API. The hybrid feature is effectively a no-op today.

What core needs to do

Remove #[allow(dead_code)] from GatewayEvent.event_id
Propagate event_id through the message handling chain (e.g. add origin_event_id: Option<String> to ChannelRef or MessageRef)
Fill reply_to with the originating event_id in send_message

Suggestion

Please update the PR description to note this dependency explicitly, and open a tracking issue for the core-side change. The gateway-side cache logic looks correct and will work once core carries reply_to through — but reviewers and users should know this feature is pending core support before it becomes functional.

Review by 超渡法師 🙏

* feat(core): propagate event_id through ChannelRef to GatewayReply.reply_to Add origin_event_id field to ChannelRef so the gateway event_id survives the message handling pipeline and appears in GatewayReply.reply_to. This enables the gateway to correlate replies with inbound events (e.g. LINE reply token cache in #608). Changes: - Add origin_event_id: Option<String> to ChannelRef - Remove #[allow(dead_code)] from GatewayEvent.event_id - Fill origin_event_id from event_id in gateway event handler - Use origin_event_id in GatewayAdapter::send_message reply_to - Propagate through create_thread - Set origin_event_id: None for Discord, Slack, cron (non-gateway) * fix: exclude origin_event_id from ChannelRef Hash/Eq Manual impl of Hash, Eq, PartialEq so that origin_event_id does not affect channel identity. Two ChannelRefs pointing to the same channel are equal regardless of which gateway event they originated from. * test: add origin_event_id Hash/Eq exclusion and propagation tests - origin_event_id_excluded_from_eq: same channel, different event IDs are equal - origin_event_id_excluded_from_hash: HashMap treats them as same key - origin_event_id_survives_clone: simulates create_thread propagation * docs: document reply_to field semantics in custom-gateway ADR Explain that reply_to carries the inbound event_id for reply correlation (e.g. LINE reply token lookup). --------- Co-authored-by: 超渡法師 <chaodu@openab.dev>

This implementation adds a stateful token cache in the gateway to prioritize the free LINE Reply API. It also includes an auto-fill logic for the reply_to field to avoid modifying the OAB core.

…race conditions

- Remove auto-fill reply_to mechanism that could attach replies to wrong LINE messages under concurrent traffic (blocker from shaun-agent, 超渡法師, 普渡法師). If OAB sends empty reply_to, gateway now correctly falls back to Push API instead of guessing the latest event. - Extract reply token from cache before making HTTP call so Mutex is not held across network I/O (blocker from shaun-agent, 超渡法師, 普渡法師). - Unify sweep TTL with handler TTL (REPLY_TOKEN_TTL_SECS) to eliminate dead-weight cache entries in 50-60s range (普渡法師). - Add secrets.env to .gitignore (shaun-agent, 超渡法師). - Run rustfmt --edition 2021 (shaun-agent). - Update ADR Consequences to reflect hybrid reply/push strategy (超渡法師).

…antics LINE reply tokens are single-use. Document that the first OAB client to remove() a cached token gets the free Reply API call; others fall back to Push API. This is correct behavior but was not documented, which could confuse maintainers (普渡法師).

The Per-Client Last Event Tracker was removed during review because it caused incorrect reply-token association in group chats. Update the feature request doc to match the actual implementation: gateway does not infer reply_to, it falls back to Push API when reply_to is empty or not found in cache (普渡法師 doc nit).

The 4-step hybrid strategy description still referenced the removed Per-Client Last Event Tracker auto-fill mechanism. Updated to accurately describe the actual implementation: event_id-keyed cache lookup with empty reply_to falling back to Push API.

- Switch ReplyTokenCache from tokio::sync::Mutex to std::sync::Mutex; critical sections are short and never held across .await (超渡法師). - Log LINE Reply API 400 response body for easier debugging (超渡法師). - Guard secrets.env source in run_gateway.sh with existence check (超渡法師). - Align sweep interval with REPLY_TOKEN_TTL_SECS (50s) (超渡法師).

- Only fallback to Push API on 4xx (token invalid/expired). For 5xx and network errors, the message may have already been delivered — do NOT fallback to avoid duplicate delivery (擺渡法師 major finding). - Update ADR message flow step 7 to reflect hybrid Reply/Push strategy instead of Push-only (擺渡法師 doc nit).

Only fallback to Push API when LINE's 400 response body explicitly indicates the reply token is invalid or expired. All other failures (5xx, other 4xx, network/timeout errors) are treated as ambiguous delivery — logged as error and dropped to prevent duplicate delivery (擺渡法師 refined major finding, 普渡法師 concurred).

- Fix operator precedence bug in Reply API 400 fallback: add explicit parentheses so (invalid AND reply token) OR expired is evaluated correctly - Change lock().unwrap() to lock().unwrap_or_else(|e| e.into_inner()) at all 3 cache lock sites to recover from mutex poison instead of panicking - Add REPLY_TOKEN_CACHE_MAX (10k) bound: skip insert and log warning when cache is full, gracefully degrading to Push API

Feature is now implemented and documented in docs/adr/line-adapter.md. Keeping a feature request doc for a shipped feature is misleading.

@iamninihuang

…elog - Status: Proposed → Accepted - Add sequence diagram showing hybrid Reply/Push dispatch flow - Update changelog to v0.2 with openabdev#608 and openabdev#619 references - Add @iamninihuang as co-author

- Remove run_gateway.sh (README already documents how to run; script used port 9090 conflicting with README default 8080) - Add LINE_CHANNEL_SECRET and LINE_CHANNEL_ACCESS_TOKEN to env vars table - Add POST /webhook/line to endpoints table

iamninihuang requested a review from thepagent as a code owner April 28, 2026 02:47

github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days pending-screening PR awaiting automated screening labels Apr 28, 2026

shaun-agent requested changes Apr 28, 2026

View reviewed changes

github-actions Bot removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label Apr 28, 2026

chaodu-agent reviewed Apr 28, 2026

View reviewed changes

chaodu-agent mentioned this pull request Apr 28, 2026

feat(core): propagate event_id to GatewayReply.reply_to #619

Merged

iamninihuang and others added 10 commits April 28, 2026 19:02

feat(gateway): implement hybrid LINE reply/push strategy

b6f3e47

This implementation adds a stateful token cache in the gateway to prioritize the free LINE Reply API. It also includes an auto-fill logic for the reply_to field to avoid modifying the OAB core.

fix(gateway): track last_event_id per channel to prevent cross-group …

50a3b94

…race conditions

chaodu-agent force-pushed the feat/line-hybrid-gateway branch from 009c1bc to f7536ca Compare April 28, 2026 19:08

超渡法師 added 5 commits April 28, 2026 19:15

docs: remove feature-requests/line-reply-api.md

3da68d7

Feature is now implemented and documented in docs/adr/line-adapter.md. Keeping a feature request doc for a shipped feature is misleading.

revert: remove .gitignore change (secrets.env not needed)

970018d

docs(adr): update line-adapter with hybrid dispatch diagram and chang…

df2693d

…elog - Status: Proposed → Accepted - Add sequence diagram showing hybrid Reply/Push dispatch flow - Update changelog to v0.2 with openabdev#608 and openabdev#619 references - Add @iamninihuang as co-author

docs(adr): restore original Date, add Last Updated field

22f8826

This was referenced Apr 28, 2026

test(gateway): add coverage for hybrid LINE reply/push dispatch #620

Closed

test(gateway): add unit tests for LINE hybrid reply/push logic #621

Open

thepagent approved these changes Apr 28, 2026

View reviewed changes

thepagent merged commit d58493d into openabdev:main Apr 28, 2026
2 checks passed

This was referenced Apr 28, 2026

test(gateway): add coverage for hybrid LINE reply/push dispatch #622

Closed

test(gateway): add coverage for hybrid LINE reply/push dispatch #623

Merged

test(gateway): add wiring integration test for LINE hybrid dispatch #625

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gateway): implement hybrid LINE reply/push strategy#608

feat(gateway): implement hybrid LINE reply/push strategy#608
thepagent merged 15 commits intoopenabdev:mainfrom
iamninihuang:feat/line-hybrid-gateway

iamninihuang commented Apr 28, 2026 •

edited by chaodu-agent

Loading

Uh oh!

shaun-agent left a comment

Uh oh!

chaodu-agent left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

iamninihuang commented Apr 28, 2026 • edited by chaodu-agent Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Design

Core dependency

Review fixes applied

Uh oh!

shaun-agent left a comment

Choose a reason for hiding this comment

Findings

Uh oh!

chaodu-agent left a comment

Choose a reason for hiding this comment

⚠️ Core Dependency: reply_to is always empty

What core needs to do

Suggestion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

iamninihuang commented Apr 28, 2026 •

edited by chaodu-agent

Loading

⚠️ Core Dependency: `reply_to` is always empty