fix: Security hardening, reliable daemon stop, and dashboard UX overhaul#18
Conversation
…ze icacls principal The lockfile carries the plaintext bearer token but was created with default permissions (world-readable on Unix). acquire() now opens with mode 0o600, replace() stages via safeAtomicWriteFileSync at 0o600, and both apply the same Windows ACL restriction daemon.token already used. The LSP port_lsp writer stages its temp file at 0o600 so its atomic rename does not undo the hardening. applyPrivatePermissions is exported from token.js and now sanitizes USERNAME/USERDOMAIN before building the icacls grant principal. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
AIRTABLE_EMAIL/PASSWORD/OTP_SECRET were placed in the login-runner child environment, visible via /proc/<pid>/environ on Linux and captured in core dumps. The runner now requests credentials over the fork() IPC channel (request-credentials -> credentials handshake); env remains a documented fallback for standalone use. The VS Code MCP stdio definition (registration.ts) still uses env because VS Code owns that spawn and env is the only available channel there. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vsce and ovsx run with VSCE_PAT/OVSX_PAT/NODE_AUTH_TOKEN in scope, so the release workflow must not auto-pull whatever latest is on npm: pin @vscode/vsce@3.9.2 + ovsx@1.0.1 and invoke via npx --no-install. Marketplace/npm version queries are external input interpolated into node -e version-bump scripts — accept strict semver only, fall back to 0.0.0 with a workflow warning otherwise. Quote OVSX_PAT. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ep copy cpSync with dereference:true follows every symlink in the copied packages and writes the target into the VSIX — a trojanized dependency shipping a symlink to ~/.ssh or CI credentials would have the target silently included in the published artifact. Walk each package tree before copying (cycle-safe, descending through directory-symlink targets) and fail the build if any symlink resolves outside the workspace node_modules tree, where all legitimate pnpm links live. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
An unrecognized activeProfile in tools-config.json (hand-edited or corrupted) silently fell back to the full profile, enabling all 66 tools including destructive ones. It now falls back to the read-only set and logs a warning to stderr. tools-config.json is also written with mode 0o600. Adds a regression test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…atus Bearer token comparison used ===, which short-circuits on the first differing byte and is timeable from the tunnel-exposed surface; use crypto.timingSafeEqual. 'daemon status' printed the full bearer token into shell history/scrollback — redact it; the token stays readable from ~/.airtable-user-mcp/daemon.token when genuinely needed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Symlink targets are compared post-realpathSync; an unresolved allowlist entry would reject legitimate pnpm links when the workspace itself sits behind a symlink (e.g. a git worktree). Found by adversarial diff review. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…stale locks, prevent auto-resurrect Root causes of 'stop button sometimes does not work': 1. DaemonManager.stopDaemon() was fire-and-forget: it ignored the HTTP response status (a 401 from a stale lockfile token looked like success), swallowed connection errors (dead daemon with a leftover lock), and returned before the daemon released its lockfile — so the immediate pushState() re-read the still-present lock and the UI kept showing 'running'. It now mirrors the CLI launcher semantics: verify the shutdown response, wait for lock release, escalate SIGTERM → SIGKILL when the daemon answers but will not exit, and reclaim stale lockfiles (without killing when the port is unreachable — the pid may have been recycled onto an innocent process). 2. With useDaemon + loginMode=auto, provideMcpServerDefinitions → getCredentials() → ensureDaemon() respawned the daemon right after a stop (VS Code re-queries definitions when the daemon connection drops). A user-stopped latch now blocks implicit respawns; explicit start/restart clears it. The implicit ensure is also best-effort so daemon startup failures no longer block login. 3. Stop failures are now surfaced in the UI and the stopDaemon command instead of silently reporting success. Lockfile port is also range-validated (audit finding). 6 new tests cover graceful stop, 401 escalation, stale-lock reclaim, PID-reuse safety, and the resurrect latch. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… a11y, content 26 confirmed findings from a multi-agent UI/UX audit of the dashboard: - store.ts: all async actions track pending state (refresh, selectCustomBrowser, setBrowserChoice, copyAirtablePat, openStoragePath were missing it); pending actions auto-expire (60s default, longer for login/browser downloads) so a lost action:result can no longer leave every button disabled forever. - DashboardProvider: pushState() serialized + coalesced (file-watch bursts could post a stale state:update last); re-sync on onDidChangeVisibility; daemon:start posts failure when the daemon manager is unavailable; toolProfile fallback includes all 13 categories (recordRead/recordWrite were missing). - Theming: undefined --accent-green replaced with --fg-ok; LSP badges and error/warn banners use theme tokens; new --border-error, --border-warn, --bg-lsp-* tokens. - A11y: global :focus-visible outlines for buttons/inputs/selects; --fg-muted contrast raised 2.86:1 -> ~4.8:1 (WCAG AA); login toggle gets role=switch + aria-label; daemon buttons get aria-busy; disabled buttons get not-allowed cursor. - Content: friendlyError.ts maps raw errno/HTTP errors to readable guidance (raw kept in tooltip); TOTP/bearer jargon explained; daemon tunnel URL row made responsive with Copy button; IDE detection shows animated skeleton; dropped webview messages are logged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…her retry; aria-busy alignment Adversarial review of the previous commits confirmed three issues: - The daemon Start/Restart/Stop buttons disabled on the GLOBAL pendingActions set, so an unrelated slow action (e.g. the custom-browser file dialog, open for up to 10 minutes) locked the user out of stopping the daemon — made worse by the new tracking of lightweight actions. The store now keeps a pendingDaemonActions set; daemon controls disable on that plus daemon.starting only, with stop/restart expiry sized to the new stopDaemon escalation windows (30s/45s). - The lockfile watcher silently failed when ~/.airtable-user-mcp did not exist at activation and was only re-attempted on explicit restart; pushState() now retries attaching it. - Stop button aria-busy now matches its disabled condition (was inconsistent with Start/Restart). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Selecting cf-named and pressing Enable always dead-ended with 'Named tunnel requires a hostname (domain)' because the Setup tab rendered no hostname input for that provider — only ngrok had fields — and handleEnableTunnel sent the ngrok domain value for every provider. - Render a Hostname input when cf-named is selected (helper text: Cloudflare-managed domain, one-time browser login, leave empty to reuse an existing tunnel config). - Send the per-provider domain value from handleEnableTunnel. - ngrok's Reserved Domain field was also unreachable once an authtoken was stored (it lived inside the !ngrokAuthtokenSet block) — it now always renders for ngrok. - Host fallback: _ensureCfNamedSetup prompts via showInputBox when the hostname is still missing (flow reachable from commands that never showed the field), and the post-setup enable retry sends the RESOLVED hostname instead of the original empty originalMsg.domain. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… feedback, form state, cosmetics Second-pass audit targeting the cf-named-hostname bug class (UI flows missing the affordances their handlers require) confirmed 12 findings: - ngrok authtoken was unreachable after first save: the input was gated on !ngrokAuthtokenSet and setNgrokAuthtoken existed in the store but was never wired to any UI. Stored state now shows an 'Auth token stored' chip with a Replace… flow (input + Save/Cancel) so the token can be rotated without disabling the tunnel. - PromptEditor Save/Reset/Delete gave no in-flight feedback and were clickable repeatedly; they now disable with aria-busy while any action is pending. - Copy Token now tracks pending state and disables; Rotate Token is daemon-scoped (beginDaemonAction) and disables with daemon controls; Copy PAT disables while busy. - Credentials form: email cleared on save alongside password/OTP, and Cancel resets all fields — no stale pre-filled email on reopen. - Cosmetics: input padding unified (6px 10px), helper text color standardized to --fg-subtle, daemon button gap 6->8, chip casing normalized (Ready/Missing/Via extension), uppercase label styles extracted to .uppercase-label, ngrok section spacing made state-independent. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…it for confirmation Closes the two scope cuts from the dead-end sweep: - TunnelState gains optional namedTunnelHostname. The extension reads it from cloudflared-named.yml (same fixed format the daemon's writeTunnelConfig emits, including quoted-value handling). The Setup tab's cf-named block now shows a 'Configured' chip with the hostname, relabels the input 'New Hostname (optional)', and the helper text says exactly which tunnel an empty field reuses. - PromptEditor no longer navigates back optimistically. savePrompt / deletePrompt / resetPrompt return their action id, markActionDone records a consumable result (bounded map), and the editor leaves only on confirmed success — failures keep the editor open with the user's input intact and show an inline alert. Save shows 'Saving…' while in flight. 3 new store tests cover the id/result round-trip. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40eca03b13
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const domain = | ||
| selectedProvider === 'cf-named' ? (namedHostnameInput.trim() || undefined) | ||
| : selectedProvider === 'ngrok' ? (ngrokDomainInput.trim() || undefined) |
There was a problem hiding this comment.
Route hostname changes through reconfiguration
When a Cloudflare named tunnel is already configured, entering a value in the new “New Hostname” field only passes domain to enable-tunnel; the setup flow that calls named-create is skipped because isSetupComplete() is already ready, and cf-named.start() rewrites the YAML with the existing hostname from disk rather than this domain. In that scenario the UI presents this as reconfiguration, but enabling the tunnel silently keeps serving the old hostname, so users cannot change a configured named tunnel from this flow.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Confirmed and fixed in 4bfb9d7. named-create now reconfigures in place when the requested hostname differs from the existing config — routes DNS to the existing tunnel uuid (new routeTunnelDns helper) and rewrites the managed YAML; same-hostname requests stay idempotent. The extension's tunnel:enable handler runs named-create as a pre-step whenever a cf-named hostname is supplied, and aborts on real failures (e.g. DNS route rejected) instead of starting the tunnel on the previous hostname; setup-incomplete errors still fall through to the full setup wizard.
|
|
||
| # Query VS Code Marketplace for the latest published version | ||
| PUBLISHED=$(npx vsce show Nskha.airtable-formula --json 2>/dev/null \ | ||
| PUBLISHED=$(npx --no-install vsce show Nskha.airtable-formula --json 2>/dev/null \ |
There was a problem hiding this comment.
Use an exec command that really disables installs
I checked the npm/npx CLI semantics (npx --help / npm exec --help) and the options list only --package, --call, and workspace flags; --no-install is not a supported install-prevention flag for this runner, and a quick npx --no-install <missing-bin> still attempted a registry fetch. Since this release job runs these npx --no-install commands with publishing tokens in scope, a missing local binary can still fall back to downloading a package instead of failing closed; use a local-only invocation such as pnpm exec vsce/pnpm exec ovsx or another verified offline mechanism.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Confirmed and fixed in 4bfb9d7. Reproduced locally on npm 11: npx --no-install <missing-bin> still hit the registry (the flag isn't an install-prevention option on npm 7+). Replaced all four call sites with pnpm exec vsce/pnpm exec ovsx — verified pnpm exec resolves the workspace-root pinned binaries from package subdirs (the cd packages/extension step included) and fails closed when the binary is absent.
… exec in release Addresses both Codex review findings on PR #18 (verified before fixing): 1. Entering a new hostname for an already-configured named tunnel was silently ignored: named-create short-circuited on the existing config and cf-named.start() reads the YAML from disk, so the old hostname kept serving. named-create now reconfigures in place when the requested hostname differs — routes DNS to the existing tunnel uuid (new exported routeTunnelDns helper) and rewrites the managed YAML; same-hostname requests stay idempotent. The extension's tunnel:enable handler runs named-create as a pre-step whenever a cf-named hostname is supplied, aborting on real failures (e.g. DNS route rejected) instead of starting the tunnel on the previous hostname; setup- incomplete errors still fall through to the full setup wizard. 2. npx --no-install is not an install-prevention flag on npm 7+ — verified locally on npm 11 that a missing binary still hits the registry, which defeats the supply-chain pin in steps that hold publish tokens. Replaced with pnpm exec (verified to resolve the workspace-root pinned binaries from package subdirs and to fail closed when absent). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4bfb9d747a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (outcome !== 'unreachable' && pidAlive && typeof pid === 'number') { | ||
| this._killPid(pid); |
There was a problem hiding this comment.
Treat rejected shutdowns as stale unless daemon identity is proven
When a stale daemon.lock points to a port that has since been reused by any HTTP service (or by a new daemon with a different token), the shutdown request gets a non-2xx response, outcome becomes rejected, and this branch sends SIGTERM/SIGKILL to the PID recorded in the stale lock. A rejected HTTP response only proves something answered on the port; it does not prove the recorded PID owns that port, so PID reuse can still terminate an unrelated process. Only escalate after an authenticated daemon response, or otherwise reclaim the lock without killing like the unreachable path does.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Confirmed and fixed in 9624aab. The 'port answered ⇒ pid is the daemon' inference was wrong. Kill escalation now requires proven identity: an accepted bearer-authenticated shutdown, or /daemon/health (with the lockfile's token) echoing the lockfile's uuid. Unproven rejections fall through to reclaim-without-kill, same as the unreachable path. Tests updated: a 401-shutdown server whose health endpoint echoes the lock uuid still escalates; an unrelated service answering 404 with a live recorded pid is never killed.
…entity before kill escalation Two review findings on PR #18, both verified real: 1. cloudflared-named-setup.js carried its own pre-existing applyPrivatePermissions copy that did NOT sanitize USERNAME/USERDOMAIN before building the icacls principal — the gap the PR's token.js fix was meant to close, and reachable via writeTunnelConfig on every named tunnel create/reconfigure. Deleted the local copy; the file now imports the shared sanitized helper from ../token.js. 2. stopDaemon's 'rejected' branch killed the lockfile pid on ANY non-2xx shutdown response. A non-2xx only proves something answered on the port — a stale lock whose port was reused by an unrelated HTTP service also rejects, while the recorded pid may belong to an innocent recycled process. Kill escalation now requires proven identity: an accepted (bearer-authenticated) shutdown, or /daemon/health echoing the lockfile's uuid. Unproven answers fall through to reclaim-without-kill like the unreachable path. Tests updated: proven-identity 401 still escalates; unproven rejection must not kill. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9624aabe5b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const provenOurDaemon = outcome === 'accepted' || ( | ||
| outcome === 'rejected' | ||
| && status.port != null && status.bearerToken != null | ||
| && await this._verifyDaemonIdentity(status.port, status.bearerToken, status.uuid) | ||
| ); |
There was a problem hiding this comment.
Verify daemon identity before accepted-response kills
When a stale daemon.lock points at a port that now returns any 2xx response for POST /daemon/shutdown, this treats outcome === 'accepted' as proof of daemon identity and proceeds to kill the PID recorded in the stale lock. That still has the PID-reuse hazard the surrounding code is trying to avoid: the 2xx response may come from a different local HTTP service (or anything that ignores the bearer header), while status.pid may now be an unrelated live process. Fresh evidence beyond the earlier rejected-response case is that the accepted branch bypasses the UUID /daemon/health check entirely; require the same UUID proof before escalating to process.kill.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Confirmed and fixed in a1a56cf. The accepted/rejected distinction no longer grants kill permission: EVERY escalation now requires /daemon/health (authenticated with the lockfile's bearer) to echo the lockfile's uuid. In the legitimate accepted path this costs nothing — a healthy daemon releases the lock during the wait, and a wedged one still answers health. Tests added: a catch-all impostor that 200s the shutdown POST is never killed; a wedged-but-uuid-proven daemon still escalates.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…n aborts on failed stop Round-3 review findings on PR #18, both verified real: 1. The accepted-shutdown branch treated any 2xx as identity proof — a catch-all local service on a stale lock's reused port can 200 a POST /daemon/shutdown while ignoring the bearer header, and after the lock-release wait expired the recorded (possibly recycled) pid was killed. The accepted/rejected distinction no longer matters for escalation: EVERY kill now requires /daemon/health (authenticated with the lockfile's bearer) to echo the lockfile's uuid. Costs nothing in the legit path: a healthy daemon dies during the wait, a wedged one still answers health. Graceful wait is test-tunable. 2. restartDaemon ignored stopDaemon's StopResult: when the old daemon survived SIGKILL (stopped:false leaves the lock in place), ensureDaemon would reconnect to that same wedged process and report a successful restart. It now throws with the stop reason; the restartDaemon command surfaces the error instead of claiming success. 3 new tests: impostor-2xx must not kill; wedged-but-proven daemon still escalates; restart aborts when stop fails. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…lls CI warn The dry run surfaced a CI WARN (bin link ENOENT) whose investigation uncovered that airtable-user-lsp has been runtime-dead under real Node: 1. dist imported the extensionless 'vscode-languageserver/node' subpath. That package ships no exports map, so Node's ESM resolver rejects it (ERR_MODULE_NOT_FOUND) — while tsup/vitest resolve it bundler-style, which is why build and tests never caught it. Every editor config (npx -y airtable-user-lsp --stdio) and the daemon's --tcp spawn hit this at startup. Fixed with explicit /node.js deep imports. 2. With imports fixed, the next masked failure appeared: 'Dynamic require of "util" is not supported' from CJS bundled into the ESM output. Fixed with a createRequire banner in tsup.config.ts — the same pattern scripts/bundle-mcp.mjs already uses. 3. The bin map pointed at dist/index.mjs, which does not exist at install time on fresh checkouts — pnpm logged 'WARN Failed to create bin … ENOENT' on every CI install. The bin now points at a committed shim (bin/airtable-user-lsp.mjs) that defers to the built entry and prints an actionable error if dist/ is missing. bin/ added to the npm files allowlist. Verified: --stdio and --tcp both launch and stay alive via the real bin; lsp-server 21 tests; full monorepo suite green; fresh pnpm 9 install produces zero failed-bin warnings. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Three workstreams from a full-codebase hardening effort, all adversarially verified by multi-agent audits before and after implementation.
🔒 Security — critical tier from the 2026-06-12 audit (71 raised → 49 confirmed)
daemon.lockcreated0600+ Windows ACL — it carries the plaintext bearer token and was world-readable; the LSPport_lspwriter stages at0600too so its atomic rename can't undo the hardening; icacls principal is sanitizedfork()IPC channel (request-credentials→credentialshandshake) instead ofAIRTABLE_EMAIL/AIRTABLE_PASSWORDenv vars visible in/proc/<pid>/environ; env remains a documented standalone fallbacktools-config.jsonused to silently enable all 66 tools including destructive ones@vscode/vsce@3.9.2/ovsx@1.0.1pins,npx --no-install, strict-semver validation of Marketplace/npm replies before interpolation into version-bump scriptsnode_modules(a trojanized dep could otherwise smuggle~/.sshetc. into the published artifact)crypto.timingSafeEqualbearer comparison;daemon statusredacts the token🛑 Daemon Stop reliability (root-caused "stop sometimes doesn't work")
stopDaemon()now mirrors the CLI launcher semantics: verifies the shutdown HTTP response (a 401 from a stale lock token no longer counts as success), waits for lockfile release, escalates SIGTERM → SIGKILL when the daemon answers but won't exit, reclaims stale locks — and never kills a PID whose port doesn't answer (PID-reuse safety)ensureDaemon()paths (MCP definition provider / credential handoff) from resurrecting a daemon the user just stopped🎨 Dashboard UI/UX (three audit rounds, 41 confirmed findings fixed)
TunnelState.namedTunnelHostnamepushStateserialized + coalesced; re-sync on sidebar re-open; lockfile watcher self-heals on cold start--fg-mutedraised to WCAG AA contrast,role="switch"/aria-busywhere missingTest plan
check:tool-sync✓login-runner.jsFull audit report (49 confirmed findings + 22 rejected false positives + deferred backlog) lives locally in
.planning/audits/2026-06-12-hardening-audit.md(gitignored).🤖 Generated with Claude Code