Ingest agent runtime sessions into WordPress (read-side of the skills pattern) #41

@chubes4

Context

wp-coding-agents already handles the write side of the runtime-agnostic adapter pattern: skills defined in WordPress are synced out to runtime-native locations (.opencode/skills/<slug>/SKILL.md, .claude/skills/<slug>/SKILL.md, etc.). Each runtime has an adapter that knows where its files belong and how they're formatted.

This issue proposes the read side of that same pattern: ingesting agent session transcripts from runtime-native storage into WordPress, so plugins can build on session data without each one reinventing runtime-specific parsers.

The gap

Every major agent runtime stores session transcripts locally in a runtime-specific format:

Runtime                  | Typical location                                 | Format
-------------------------|--------------------------------------------------|-----------------------------------
Claude Code              | ~/.config/claude/projects/<hash>/*.jsonl         | JSONL (one message per line)
OpenCode                 | ~/.local/share/opencode/project/<hash>/session/* | Session directory per conversation
Cursor                   | Local app storage (SQLite / JSON)                | Proprietary
Cline / Continue / Aider | Various                                          | Various

A WP-CLI command, a pipeline that processes session history, an analytics plugin, a memory system — none of these can consume session data today without re-implementing per-runtime file discovery and parsing. That's a job wp-coding-agents is uniquely positioned to own, because it already owns the inverse (runtime-adapter → disk) for skills and agent configs.

Storage model: index in WordPress, content on disk

This is not the skills model. Skills are small (~1-5 KB), low-volume (10-50 total), and benefit from being first-class editable WordPress content. Duplicating them to disk is cheap and the use case justifies it.

Sessions are the opposite:

Property                 | Skills                  | Sessions
-------------------------|-------------------------|-------------------------------------------
Size per unit            | 1-5 KB                  | 10 KB – 10+ MB
Volume                   | 10-50 total             | hundreds/month, grows forever
Churn                    | Low                     | High (new daily, existing append mid-run)
Write owner              | WordPress               | Runtime
Value of full copy in WP | High (editable content) | Low (mostly tool-call noise)

Copying raw sessions into wp_posts or a custom table would bloat the database by GBs within months, duplicate data the runtime already stores canonically, and store content that's mostly not directly useful (intermediate reasoning, tool call IO, file reads).

The correct pattern — borrowed from markdown-database-integration:

SQLite is the projection. Files are the source of truth. The database is an indexing engine that maps to content on disk.

For sessions:

  1. Runtime session files remain the source of truth. wp-coding-agents does not copy, move, or rewrite them.
  2. WordPress stores a lightweight index: one row per session with metadata only (session_id, runtime, project_path, started_at, ended_at, file_path, file_mtime, message_count, model, token_total, status). At roughly 200 bytes per row, a million sessions would come to about 200 MB before index overhead, which is still a small table.
  3. On-demand parse. When a consumer needs the actual messages, the adapter reads the file from disk and returns the normalized shape. No bulk content lives in the DB.
  4. Consumers persist what they derive, not everything they read. Summaries, extracted decisions, linked PRs, salient events → those go into WP (via posts, custom tables, wikis, whatever the consumer wants). Raw transcripts stay on disk.

This keeps the WordPress DB small, keeps the runtime's session files authoritative, and gives consumers the full messages when they need them via a single API.

Proposal

Add a runtime-session-ingest subsystem to wp-coding-agents that mirrors the existing skills-sync architecture, adapted for the index-vs-content distinction above.

1. Runtime adapters

Each supported runtime gets an adapter implementing:

  • Discovery — enumerate session files/directories on the host for a given project path
  • Parse — convert runtime-native format to a normalized message shape (on demand)
  • Metadata — extract session ID, start/end timestamps, project path, model, token counts, runtime-specific context (without reading full content)

Adapters live alongside the existing skills-write adapters so new runtimes add both read and write support in one place.
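
The three responsibilities above could be captured in a small contract. This is a minimal sketch, not the plugin's actual API: the interface name, method names, and registry class are all hypothetical and only illustrate how discovery, metadata, and parse might sit behind one type per runtime.

```php
<?php
/**
 * Hypothetical adapter contract. Names are illustrative only and are
 * not taken from wp-coding-agents.
 */
interface SessionAdapter {
    /** Runtime slug, e.g. 'claude-code'. */
    public function runtime(): string;

    /** Discovery: absolute paths of session files/directories for a project. */
    public function discover(string $projectPath): array;

    /** Metadata: index fields for one session, without message bodies. */
    public function metadata(string $filePath): array;

    /** Parse: the full normalized session, read from disk on demand. */
    public function parse(string $filePath): array;
}

/** Minimal registry keyed by runtime slug (also hypothetical). */
final class SessionAdapterRegistry {
    /** @var array<string, SessionAdapter> */
    private array $adapters = [];

    public function register(SessionAdapter $adapter): void {
        $this->adapters[$adapter->runtime()] = $adapter;
    }

    public function get(string $runtime): ?SessionAdapter {
        return $this->adapters[$runtime] ?? null;
    }

    /** @return string[] Registered runtime slugs, in registration order. */
    public function runtimes(): array {
        return array_keys($this->adapters);
    }
}
```

Keeping metadata() separate from parse() is what would let a reindex pass stay cheap while full parsing happens only when a consumer asks for it.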

2. Normalized session schema (on-demand, not stored in full)

[
    'session_id'   => string,  // runtime-native, stable across reads
    'runtime'      => string,  // 'claude-code' | 'opencode' | ...
    'project_path' => string,
    'started_at'   => int,
    'ended_at'     => int|null,
    'file_path'    => string,  // absolute path to the source file on disk
    'messages'     => [
        [
            'id'         => string,
            'role'       => 'user' | 'assistant' | 'tool',
            'content'    => string|array,
            'tool_calls' => array|null,
            'timestamp'  => int,
        ],
        // ...
    ],
    'meta' => [
        'model'   => string|null,
        'tokens'  => array|null,
        'cost'    => float|null,
        // runtime-specific extras preserved here
    ],
]
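
As a rough illustration of the parse step, the sketch below maps JSONL text onto the 'messages' list of the schema above. The input layout is an assumption: it supposes each line is a JSON object with `uuid`, `timestamp` (ISO 8601), and a nested `message` holding `role` and `content`. Real runtime formats may differ, and the function name is hypothetical.

```php
<?php
/**
 * Convert JSONL text into the normalized 'messages' list.
 *
 * ASSUMPTION: each line looks like
 *   {"uuid": "...", "timestamp": "<ISO 8601>", "message": {"role": "...", "content": ...}}
 * Field names would need adjusting per adapter.
 */
function normalize_jsonl_messages(string $jsonl): array {
    $messages = [];
    foreach (preg_split('/\R/', $jsonl) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $row = json_decode($line, true);
        // Skip malformed lines (e.g. a partially written last line)
        // and records that are not messages.
        if (!is_array($row) || !isset($row['message']['role'])) {
            continue;
        }
        $role = $row['message']['role'];
        if (!in_array($role, ['user', 'assistant', 'tool'], true)) {
            continue;
        }
        $ts = isset($row['timestamp']) ? strtotime((string) $row['timestamp']) : false;
        $messages[] = [
            'id'         => (string) ($row['uuid'] ?? ''),
            'role'       => $role,
            'content'    => $row['message']['content'] ?? '',
            'tool_calls' => $row['message']['tool_calls'] ?? null,
            // 0 marks a missing or unparseable timestamp in this sketch.
            'timestamp'  => $ts === false ? 0 : $ts,
        ];
    }
    return $messages;
}
```

Skipping undecodable lines instead of failing is a deliberate choice here, since a session file may be read while the runtime is still appending to it.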

3. Index schema (what does live in WP)

A single custom table, wp_coding_agents_session_index (or similar), holding only metadata — never message content:

session_id      VARCHAR  PRIMARY KEY
runtime         VARCHAR  INDEX
project_path    VARCHAR  INDEX
started_at      BIGINT   INDEX
ended_at        BIGINT
file_path       TEXT
file_mtime      BIGINT      -- for cheap change detection
message_count   INT
model           VARCHAR
token_total     INT
status          VARCHAR     -- active, completed, truncated, etc.

Change detection via file_mtime means re-indexing is cheap and idempotent — only sessions that actually changed get re-parsed.
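
A minimal sketch of that check, assuming the indexer already has a map of file_path to stored file_mtime and a list of paths returned by discovery. The function name and argument shapes are hypothetical.

```php
<?php
/**
 * Decide which discovered files need (re)indexing.
 *
 * @param array<string,int|string> $indexedMtimes file_path => file_mtime stored in the index
 * @param string[]                 $discovered    file paths found on disk by an adapter
 * @return string[] paths that are new or whose mtime differs from the index
 */
function sessions_needing_reindex(array $indexedMtimes, array $discovered): array {
    $stale = [];
    foreach ($discovered as $path) {
        clearstatcache(true, $path); // avoid PHP's cached stat results
        $mtime = @filemtime($path);
        if ($mtime === false) {
            continue; // file vanished between discovery and stat
        }
        // Cast because values read back from a database are often strings.
        if (!isset($indexedMtimes[$path]) || (int) $indexedMtimes[$path] !== $mtime) {
            $stale[] = $path;
        }
    }
    return $stale;
}
```

Running it twice with an up-to-date index returns an empty list the second time, which is the idempotence property described above. One caveat: filemtime() has one-second resolution, so two appends within the same second can look unchanged. Comparing file size as well would be one possible mitigation.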

4. Ingestion API

WP-CLI:

  • wp coding-agents sessions list [--runtime=<name>] [--project=<path>] [--since=<timestamp>]
  • wp coding-agents sessions read <session-id> — parses and returns full normalized session on demand
  • wp coding-agents sessions reindex [--runtime=<name>] — rescans filesystem, updates index only

Action hooks:

  • do_action( 'wp_coding_agents_session_indexed', $index_row ) — fires when a new session is indexed or an existing index row is updated. Lightweight — just metadata.
  • do_action( 'wp_coding_agents_session_parsed', $session ) — fires when a consumer explicitly requests a full parse. Carries the complete normalized session.

Consumers that only need to react to "a new session happened" subscribe to wp_coding_agents_session_indexed. Consumers that process content (summarizers, analyzers) subscribe to wp_coding_agents_session_parsed or trigger a parse themselves via the API.
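
A consumer plugin might hook in roughly as follows. This is a sketch under assumptions: it supposes both hooks pass an associative array shaped like the index row and the normalized session described above, and the `my_plugin_*` functions are hypothetical. add_action() is the standard WordPress registration function, so the registration is guarded to let the file load outside WordPress as well.

```php
<?php
/** Hypothetical consumer helper: describe one index row for logging. */
function my_plugin_describe_indexed_session(array $index_row): string {
    return sprintf(
        '[%s] session %s (%d messages)',
        $index_row['runtime'] ?? 'unknown',
        $index_row['session_id'] ?? '?',
        (int) ($index_row['message_count'] ?? 0)
    );
}

/** Hypothetical consumer helper: count assistant messages in a parsed session. */
function my_plugin_count_assistant_messages(array $session): int {
    $roles = array_column($session['messages'] ?? [], 'role');
    return count(array_keys($roles, 'assistant', true));
}

// Register only when WordPress is loaded; add_action() is WordPress core.
if (function_exists('add_action')) {
    // Lightweight path: metadata only.
    add_action('wp_coding_agents_session_indexed', function (array $index_row): void {
        error_log(my_plugin_describe_indexed_session($index_row));
    });
    // Heavy path: full normalized session, fired on explicit parse.
    add_action('wp_coding_agents_session_parsed', function (array $session): void {
        error_log('assistant messages: ' . my_plugin_count_assistant_messages($session));
    });
}
```

The split keeps cheap reactions (logging, counters) on the metadata hook, while anything that walks messages runs only when a parse was actually requested.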

Optional file-watch service (later): a long-running WP-CLI command or background worker that tails session files and fires session_indexed on changes.

5. What this does NOT do

  • Does not copy session content into WordPress.
  • Does not summarize, score, or process message content — that's application logic on top.
  • Does not write sessions back to runtimes — runtimes own their session storage.
  • Does not opine on what consumers persist. Consumers are free to store derived knowledge wherever they want; raw sessions just stay on disk where the runtime already put them.

Why wp-coding-agents

This project already:

  • Maintains the list of supported runtimes
  • Owns per-runtime path conventions (via the skills-write adapters)
  • Knows how to detect which runtimes are installed on a given host
  • Is the WordPress plugin that any site running a local agent is already likely to have installed

Pushing session ingestion into a separate project would duplicate the adapter registry. Putting it in a specific consumer (memory system, analytics plugin) would force every consumer to re-implement it.

Downstream consumers this unlocks

With the index + parse-on-demand API and the two action hooks as the seam, independent plugins can:

  • Build persistent memory systems that summarize actual agent activity
  • Power agent analytics (tokens, cost, tool usage trends)
  • Enable search across historical sessions
  • Feed session transcripts into knowledge bases or wikis
  • Generate daily/weekly digests of work done via AI agents
  • Detect patterns (frequently-failing tool calls, repeated questions, etc.)

None of these need to exist in this plugin. They just need the data to be accessible.

Suggested first pass

  1. Define the normalized session schema and the index table schema.
  2. Implement one adapter end-to-end (Claude Code is well-documented and JSONL is trivially parseable).
  3. Ship the index table, WP-CLI commands, and both action hooks.
  4. Add OpenCode adapter as the second proof of the abstraction.
  5. Document the adapter contract so community contributors can add Cursor / Cline / etc.

Out of scope for this issue

  • File-watch / real-time ingestion (follow-up)
  • Any opinion on what consumers do with ingested sessions
  • Any runtime-specific features beyond adapter read support

Related

  • #25 "Claude Code: dynamic agent sync via SessionStart hook" (closed) covered agent sync (the write side for Claude Code); this issue covers the session read side.
  • The existing skills-sync pipeline is the architectural template for the adapter layer.
  • The MDI projection pattern (SQLite indexes → markdown files on disk) is the template for the storage model.
