From 57119e5e331afa4f6dc8d56ae49b276f8570e82c Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Fri, 9 Jan 2026 18:54:53 -0600
Subject: [PATCH 01/42] Enhance README for Orchestrator module with detailed scope, definitions, interface and Input/Outputs.

---
 modules/orchestrator/README.md | 105 ++++++++++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 2 deletions(-)

diff --git a/modules/orchestrator/README.md b/modules/orchestrator/README.md
index 601f6de..1eb560b 100644
--- a/modules/orchestrator/README.md
+++ b/modules/orchestrator/README.md
@@ -1,4 +1,105 @@
 # Orchestrator
 
-Owner: Adrián Con
-Goal: /plan_and_execute + reproducible runs + traces/logging.
+- **Owner:** Adrián Con García
+- **Module Path:** `modules/orchestrator`
+- **Goal:** Provide a central orchestrator service that calls OpenPolicyStack microservices and produces reproducible “runs” with basic logs/traces.
+---
+
+## 1) Scope (What This Module Does)
+
+The **Orchestrator** coordinates end-to-end workflows across OpenPolicyStack modules.
+
+It is responsible for:
+- Receiving a **Scenario Request** (what to run + which modules + parameters).
+- Creating a **Reproducible Run Record** (e.g: Inputs, timestamps, versions, outputs).
+- Executing a workflow via a single entrypoint: **`/plan_and_execute`**.
+- Calling module microservices (HTTP) and collecting their outputs/artifacts.
+- Writing basic **Logs** and optional **Trace Data** for transparency/debugging.
+- Returning a structured response that can be plugged into a demo portal/workflow.
+
+**Non-Goals (MVP / Scope NOW)**
+- Not an API Gateway or Portal (handled by the **Integration UI / API Gateway** effort).
+- Not advanced **Monitoring & Telemetry** (phase 2).
+- Not **Privacy & PII Redaction** (phase 2 / optional later).
+- Not complex workflow engines (queues, distributed scheduling, etc.); keep sequential execution.
+- Not implementing the analytics logic of other modules.
+Out of scope for NOW; may be integrated later as separate services.
+
+---
+
+## 2) Definitions (plain language)
+- **Run:** One execution of the system for a given scenario + parameters.
+- **Run ID:** Unique identifier for a run (used to find outputs/logs later).
+- **Plan:** A list of steps the orchestrator will execute (e.g: call policy simulator → call strategy agent).
+- **Artifact:** A saved output file (plot image, brief markdown, JSON results, etc.).
+- **Trace/Logs:** A record of what happened during a run (steps, timing, success/failure).
+
+---
+
+## 3) Interface (What This Module Exposes)
+
+### MVP API (Proposed Interface)
+The orchestrator exposes a small HTTP API.
+
+#### `POST /plan_and_execute`
+Create a run, generate a simple plan (sequence of module calls), execute it, and persist inputs/outputs/logs under a `run_id`.
+
+**Request (example):** This request starts a new orchestrated run for a demo policy scenario. It specifies which modules to execute and provides the scenario inputs (e.g: country, time horizon, and policy parameters like a VAT rate change).
+```json
+{
+  "scenario_id": "demo-scenario-001",
+  "modules": ["policy-simulator", "strategy-agent"],
+  "inputs": {
+    "country": "DR",
+    "time_horizon_years": 5,
+    "policy_parameters": {
+      "vat_rate": 0.18
+    }
+  },
+  "run_options": {
+    "seed": 42,
+    "save_artifacts": true
+  }
+}
+```
+
+
+**Response (example):** The orchestrator returns a unique `run_id`, the execution plan it followed, and pointers to the saved outputs (KPIs/plots/brief) plus where logs and artifacts were stored for reproducibility.
+
+```json
+{
+  "run_id": "run_2026-01-09T12-34-56Z_ab12cd",
+  "status": "completed",
+  "plan": [
+    {"step": 1, "module": "policy-simulator", "action": "execute"},
+    {"step": 2, "module": "strategy-agent", "action": "execute"}
+  ],
+  "results": {
+    "policy-simulator": {"kpis": {"gdp_growth": 0.02}, "artifacts": ["kpis.json", "plot.png"]},
+    "strategy-agent": {"brief": "brief.md", "artifacts": ["brief.md"]}
+  },
+  "artifacts_path": "runs/run_2026-01-09T12-34-56Z_ab12cd/",
+  "logs_path": "runs/run_2026-01-09T12-34-56Z_ab12cd/logs.jsonl"
+}
+```
+`GET /runs/{run_id}`
+
+Returns the saved run metadata (inputs, plan, results pointers).
+
+`GET /health`
+
+Simple healthcheck endpoint.
+
+## 4) Inputs → Outputs (MVP)
+
+### Inputs
+- `scenario_id` (string)
+- `modules` (list of module names to call)
+- `inputs` (JSON payload passed to modules)
+- optional `run_options` (seed, toggles for saving artifacts, etc.)
+
+### Outputs
+- `run_id`
+- `plan` (what steps were executed)
+- per-module results (JSON + artifact references)
+- paths/pointers to stored logs and artifacts for reproducibility
\ No newline at end of file

From 265fe7a05abf9cad85f5a5dbe8b67c70e050f30f Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Sat, 10 Jan 2026 12:14:25 -0600
Subject: [PATCH 02/42] Added sections for run storage and assumptions to call modules to the README.

---
 modules/orchestrator/README.md | 44 +++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/modules/orchestrator/README.md b/modules/orchestrator/README.md
index 1eb560b..a85c977 100644
--- a/modules/orchestrator/README.md
+++ b/modules/orchestrator/README.md
@@ -102,4 +102,46 @@ Simple healthcheck endpoint.
 - `run_id`
 - `plan` (what steps were executed)
 - per-module results (JSON + artifact references)
-- paths/pointers to stored logs and artifacts for reproducibility
\ No newline at end of file
+- paths/pointers to stored logs and artifacts for reproducibility
+
+## 5) Run Storage (Reproducibility)
+
+Every call to `POST /plan_and_execute` creates a **run**.
+A run is a single execution of a scenario with specific modules + inputs.
+
+To make runs **reproducible and traceable**, the orchestrator saves:
+- The original request (inputs, chosen modules, options like seed),
+- What steps were executed (the “plan”),
+- Each module’s returned results,
+- Basic logs of what happened.
+
+This is stored under a unique `run_id` in a folder like:
+
+
+runs/<run_id>/
+  run.json        # input request + derived plan + timestamps
+  results.json    # merged results (pointers to artifacts)
+  logs.jsonl      # structured logs (one JSON per line)
+  artifacts/
+    <module-name>/
+      ...         # plots, JSONs, briefs, etc.
+
+The API response includes the `run_id` and (optionally) paths/pointers to the saved logs and artifacts.
+
+## 6) How modules are called (assumptions for integration)
+
+For MVP integration, the orchestrator assumes modules are reachable over **HTTP** (Docker-first) and expose a minimal interface so they can be called consistently.
+
+The proposed minimal module contract is documented in the repo Issue:
+**“MVP module interface contract (proposed)”** (see GitHub Issues).
+
+In short, the orchestrator expects:
+- `GET /health` for basic service readiness checks
+- One primary execution endpoint (preferred: `POST /execute`, or a module-specific endpoint such as `/score`, `/risk`, `/run_scenario`)
+- JSON-in / JSON-out
+- A consistent response “envelope” including:
+  - `status`, `module`, `outputs`
+  - Optional `artifacts` and `evidence` fields
+
+> Note: Endpoint naming and exact payload fields may be refined during integration month. The orchestrator will prioritize compatibility with the agreed contract in the Issue and adapt via lightweight adapters if needed.
+

From 2c2e00281d739cb2eb5cd9de2f051a939d3907b8 Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Wed, 28 Jan 2026 10:56:17 +0100
Subject: [PATCH 03/42] Initial commit

---
 modules/orchestrator/README.md | 43 +++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/modules/orchestrator/README.md b/modules/orchestrator/README.md
index a85c977..5469c19 100644
--- a/modules/orchestrator/README.md
+++ b/modules/orchestrator/README.md
@@ -27,7 +27,7 @@ Out of scope for NOW; may be integrated later as separate services.
 
 ---
 
-## 2) Definitions (plain language)
+## 2) Definitions (Plain Language)
 - **Run:** One execution of the system for a given scenario + parameters.
 - **Run ID:** Unique identifier for a run (used to find outputs/logs later).
 - **Plan:** A list of steps the orchestrator will execute (e.g: call policy simulator → call strategy agent).
@@ -128,7 +128,7 @@ This is stored under a unique `run_id` in a folder like:
 
 The API response includes the `run_id` and (optionally) paths/pointers to the saved logs and artifacts.
 
-## 6) How modules are called (assumptions for integration)
+## 6) How Modules Are Called (Assumptions for Integration)
 
 For MVP integration, the orchestrator assumes modules are reachable over **HTTP** (Docker-first) and expose a minimal interface so they can be called consistently.
 
@@ -145,3 +145,42 @@ In short, the orchestrator expects:
 
 > Note: Endpoint naming and exact payload fields may be refined during integration month. The orchestrator will prioritize compatibility with the agreed contract in the Issue and adapt via lightweight adapters if needed.
 
+## 7) How To Run (Local / Docker)
+
+> Status: This module is currently in setup phase. The commands below describe the intended MVP run method and will be made runnable as the skeleton is implemented.
+
+### Docker (recommended for reproducibility)
+Planned workflow:
+```bash
+# from modules/orchestrator
+docker build -t openpolicystack-orchestrator:dev .
+docker run --rm -p 8000:8000 \
+  -v $(pwd)/runs:/app/runs \
+  openpolicystack-orchestrator:dev
+```
+
+### Local (for development)
+Planned workflow:
+```bash
+# from modules/orchestrator
+pip install -r requirements.txt
+# example server command (framework to be confirmed)
+python -m src.main
+```
+## 8) Quickstart Demo (Planned)
+
+> Status: This is the intended MVP smoke test once the orchestrator skeleton is implemented.
+
+1) Start the orchestrator (see Section 7)
+2) Start at least one module service (or a stub) reachable by the orchestrator
+3) Trigger a run:
+
+```bash
+curl -X POST http://localhost:8000/plan_and_execute \
+  -H "Content-Type: application/json" \
+  -d @examples/plan_and_execute.request.json
+```
+Expected behavior (once implemented):
+- Returns a JSON response containing a `run_id`
+- Creates `runs//` with `run.json`, `results.json`, `logs.jsonl`, and `artifacts/`
+

From be8f2349fce252bf6cffd1b60dc57b064e18a1cf Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Wed, 28 Jan 2026 14:48:15 +0100
Subject: [PATCH 04/42] Nothing to commit.

---
 .github/pull_request_template.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 2bd66b3..7d8291d 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1,10 +1,7 @@
-## What does this PR change?
--
-
-## Which module?
--
+## Summary
+Explain what changed and why.
 
 ## Checklist
-- [ ] README updated (how to run + demo)
-- [ ] Inputs/outputs documented
-- [ ] No credentials / private data
+- [ ] No secrets/credentials committed
+- [ ] Docs updated (if needed)
+- [ ] Tests added/updated (if applicable)

From a7b7127943c50761b6714d333001159f24eeeafe Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Mon, 2 Feb 2026 20:06:13 +0100
Subject: [PATCH 05/42] Added a monitor module outputs section to. All generated outputs from the monitor module workflows will now be ignored by git.

---
 .gitignore | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/.gitignore b/.gitignore
index 2fb32df..b98b2e3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -236,5 +236,14 @@ __pycache__/
 .DS_Store
 Thumbs.db
 
+# Monitor module generated outputs
+modules/monitor/data/
+modules/monitor/deliverables/
+modules/monitor/embedding/
+modules/monitor/*.log
+modules/monitor/*.dat
+modules/monitor/**/*.db
+modules/monitor/**/*.db-journal
+
 OpenPolicyStack/
 

From d5aaf9a12adcd7ae0ae7ceceec0a3529ac672955 Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Wed, 4 Feb 2026 09:44:03 +0100
Subject: [PATCH 06/42] Created a new README.md in the modules/monitor folder with an initial Module Interface Brief.

---
 modules/monitor/README.md | 397 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 397 insertions(+)
 create mode 100644 modules/monitor/README.md

diff --git a/modules/monitor/README.md b/modules/monitor/README.md
new file mode 100644
index 0000000..672fa7e
--- /dev/null
+++ b/modules/monitor/README.md
@@ -0,0 +1,397 @@
+## Monitor Module (EFMO – European Funding Monitor)
+
+### 1. Purpose & Scope
+
+**Problem it solves**
+
+The monitor module (EFMO) automates monitoring of EU-funded projects for specific technology domains (quantum, HPC, AI, cybersecurity). It:
+
+- **Ingests** projects and organizations from the EU Funding & Tenders Portal
+- **Filters** them by topic using keyword-based scoring and (optionally) LLM-based categorization
+- **Aggregates & analyzes** funding and participation over time
+- **Publishes** the processed data as SQLite databases and figures for dashboards and reporting
+
+**Policy questions it helps answer (per topic)**
+
+Examples (for quantum, but analogous for HPC/AI/cybersecurity):
+
+- How much EU funding is going into *quantum computing* vs *quantum sensing* vs *quantum communication* over time?
+- How is the portfolio evolving (TRL distribution, platforms, applications)? +- What new relevant projects have appeared since the last run? + +**When the orchestrator should call it** + +- **Do call** when: + - You want to **refresh topic dashboards / analytics** (e.g. daily, weekly). + - Raw source data has been updated (sourcing workflow has run recently). + - You need an updated **topic-specific snapshot** for a briefing or policy question. + +- **Do *not* call** when: + - You need **ad‑hoc, per-request analytics** on a small set of projects (better to query the DB directly). + - You don’t have valid **EU portal API credentials** (for sourcing) or the network is constrained. + - You cannot or do not want to incur **LLM costs** and you strictly require LLM-derived categories (e.g. “quantum computing” vs “basic science”). + +At a high level: **orchestrator should treat this as a scheduled batch job** that periodically refreshes topic-specific funding intelligence, not as an interactive microservice. + +--- + +### 2. Invocation Contract + +**Current invocation options** + +1. **Python API (recommended for orchestrator)** + +For a given topic (e.g. quantum): + +```python +from modules.monitor.data_workflows import MonitorWorkflow, DataSourcingWorkflow +from modules.monitor.workflow_settings import ( + sourcing_settings, + quantum_settings, + hpc_settings, + ai_settings, + cybersecurity_settings, +) + +# One-time / periodic sourcing (raw data from EU API) +sourcing = DataSourcingWorkflow("sourcing", sourcing_settings) +sourcing.run() + +# Topic-specific monitor run, using sourced data +quantum = MonitorWorkflow("quantum", quantum_settings) +quantum.run() +``` + +This is the **most deterministic** way for the orchestrator to trigger runs: you explicitly choose which workflow to run and when. + +2. **Scheduler script** + +```bash +cd modules/monitor +pipenv run python scheduler.py +``` + +- Uses `ENV` env var (`dev` vs `prod`) to decide behavior. 
+- In `prod` it uses `schedule` to run multiple workflows at fixed UTC times. +- In `dev` it currently runs the quantum workflow once and then loops. + +For an orchestrator, this is more of a **standalone daemon**; less granular than calling workflows directly. + +**Recommended invocation contract for the orchestrator** + +Treat the monitor as a Python module with **two primary entrypoints**: + +- `run_sourcing()` – refresh raw data from EU portal. +- `run_monitor(topic, options)` – run one topic workflow with explicit options. + +Conceptually: + +```python +def run_sourcing(): + DataSourcingWorkflow("sourcing", sourcing_settings).run() + +def run_monitor(topic: str, *, use_llm: bool = True): + settings_map = { + "quantum": quantum_settings, + "hpc": hpc_settings, + "ai": ai_settings, + "cybersecurity": cybersecurity_settings, + } + settings = settings_map[topic] + # Orchestrator sets settings.suppress_llm_categorization as needed + MonitorWorkflow(topic, settings).run() +``` + +This keeps **triggering deterministic**: the orchestrator always specifies `{workflow, topic, use_llm}` explicitly. + +--- + +### 3. Input Schema + +The module does **not** currently consume a JSON payload; its “inputs” are: + +- **Configuration classes** in `workflow_settings.py` (per-topic settings) +- **Environment variables**: + - `SEDIA_API_KEY` (EU Funding & Tenders Portal API) – required for sourcing + - `lite_llm_url`, `lite_llm_model`, `lite_llm_api_key` – required if LLM categorization is enabled + - `hook_teams` – required if Teams newsletter is enabled +- **Existing files**: + - Raw data pickles (after sourcing) + - Optional manual CSV inputs + +For orchestration purposes, you can conceptualize a **run request** as a JSON config that drives how you call the Python API. 
+
+#### Proposed run request JSON (for orchestrator)
+
+```json
+{
+  "workflow": "monitor",
+  "topic": "quantum",
+  "mode": "full",
+  "use_llm": true,
+  "suppress_ft_crawl": false,
+  "import_manual_data": true,
+  "send_deliverable": false,
+  "send_newsletter": true
+}
+```
+
+#### JSON Schema (proposed)
+
+```json
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "MonitorRunRequest",
+  "type": "object",
+  "required": ["workflow", "topic", "mode"],
+  "properties": {
+    "workflow": {
+      "type": "string",
+      "enum": ["sourcing", "monitor"],
+      "description": "Which workflow to run."
+    },
+    "topic": {
+      "type": "string",
+      "enum": ["quantum", "hpc", "ai", "cybersecurity"],
+      "description": "Topic-specific monitor to run (ignored for pure sourcing)."
+    },
+    "mode": {
+      "type": "string",
+      "enum": ["full", "no_llm"],
+      "description": "Whether to include LLM categorization."
+    },
+    "use_llm": {
+      "type": "boolean",
+      "default": true,
+      "description": "Convenience flag; equivalent to mode != 'no_llm'."
+    },
+    "suppress_ft_crawl": {
+      "type": "boolean",
+      "default": false,
+      "description": "If true, sourcing reuses cached raw data instead of hitting the EU API."
+    },
+    "import_manual_data": {
+      "type": "boolean",
+      "default": false,
+      "description": "Whether to merge manual CSVs into the dataset."
+    },
+    "send_deliverable": {
+      "type": "boolean",
+      "default": false,
+      "description": "Whether to zip deliverables and send (currently email delivery is disabled)."
+    },
+    "send_newsletter": {
+      "type": "boolean",
+      "default": false,
+      "description": "Whether to send a Teams newsletter with new projects."
+    }
+  },
+  "additionalProperties": false
+}
+```
+
+**Mandatory inputs for a successful run**
+
+- For **sourcing**:
+  - `workflow = "sourcing"`
+  - Valid `SEDIA_API_KEY` in environment
+- For **monitor**:
+  - `workflow = "monitor"`
+  - `topic` in allowed set
+  - Raw data files present (sourcing has been run previously)
+  - If `mode = "full"` / `use_llm = true`: valid `lite_llm_*` env vars and connectivity
+
+The orchestrator can validate this JSON against the schema before calling the Python API.
+
+---
+
+### 4. Output Contract
+
+**Output types**
+
+1. **Intermediate data files (per topic)** – under `modules/monitor/data/{topic}/`
+   - `filtered_projects.csv`
+   - `filtered_organizations.csv`
+   - `filtered_projects_prev.csv`
+   - `processed_projects_diff.csv`
+   - `matchscore_histogram.png`
+   - Optional: `input_manual_projects.csv`, `input_manual_orgas.csv` (orchestrator / human-managed)
+
+   These are **intermediate analysis products** – suitable for exploration, debugging, or feeding other data pipelines.
+
+2. **SQLite database (per topic)** – under `modules/monitor/deliverables/{topic}/`
+   - `{topic}.db` with tables:
+     - `projects` – cleaned project-level data (including LLM-based dimensions if enabled)
+     - `organizations` – organization-level data
+     - `metadata` – run metadata (timestamps, keywords, thresholds, prompt)
+
+   This is the **primary machine-consumable output** for downstream dashboards or analytics.
+
+3. **Evaluation artifacts (per topic)** – under `modules/monitor/deliverables/{topic}/`
+   - PNG plots:
+     - `TotalFundingByFPOverTime.png`
+     - `TotalFundingByLLMCategoryOverTime.png` (requires LLM categories)
+     - `OrganizationsByCountryGroupOverTime.png`
+     - `OrganizationTypeByCountryGroupOverTime.png`
+     - `TotalFundingbyFP.png`
+   - JSON results:
+     - `TotalFundingByFPOverTime.json`
+     - `TotalFundingByLLMCategoryOverTime.json` (requires LLM)
+     - `OrganizationsByCountryGroupOverTime.json`
+     - `OrganizationsByCountryGroupOverTime_absolute.json`
+     - `OrganizationTypeByCountryGroupOverTime.json`
+     - `TotalFundingbyFP.json`
+
+   These are **intermediate / analytic outputs** – suitable for dashboards and human interpretation, but still “data-level”, not narrative policy briefs.
+
+4. **Notifications (optional)**
+
+- Teams messages (if `send_newsletter = true` and `hook_teams` configured) listing newly added projects in a given topic.
+
+**Final vs intermediate**
+
+- **Final, policy-ready data for other modules**:
+  - The **SQLite DBs** (and their JSON summaries) – these should be treated as the canonical, versioned outputs that downstream modules (or the orchestrator) consume.
+- **Intermediate**:
+  - CSVs in `data/{topic}/`
+  - PNG plots in `deliverables/{topic}/`
+  - Logs and temporary `.dat` files
+
+The orchestrator should primarily depend on **existence and freshness of `{topic}.db`** (plus metadata table) as the success signal and downstream interface.
+
+---
+
+### 5. Determinism & Reproducibility
+
+**Is it deterministic given the same inputs?**
+
+Not fully, because:
+
+- **External data source**:
+  - Sourcing hits the live EU Funding & Tenders Portal API. The dataset changes over time.
+- **Time-based filters**:
+  - New-project detection filters by `ecSignatureDate` within the last 4 weeks, using `datetime.now()`.
+- **LLM categorization**:
+  - Calls a remote LLM API; outputs can vary slightly run-to-run even for the same prompt.
+
+Given completely frozen inputs (frozen raw data files, fixed env vars, same code version) and **LLM disabled**, the keyword-based parts are deterministic.
+
+**Randomness**
+
+- No explicit RNG usage; non-determinism comes from:
+  - Network APIs (EU portal + LLM provider)
+  - Current wall-clock time
+  - Possible non-determinism in remote LLM
+
+**Logging & metadata**
+
+- Logging:
+  - `scheduler.py` configures logging to `scheduler.log` + stdout:
+    - Logs environment configuration (`ENV`, `lite_llm_*` prefixes)
+    - Logs workflow execution and keep-alive heartbeat.
+  - Other modules (`data_sourcing.py`, `data_processing.py`, `data_workflows.py`) use `logging` to record progress and errors.
+
+- Metadata in outputs:
+  - `metadata` table in each `{topic}.db` contains:
+    - `DataAnalysisStartDate`
+    - `DataAnalysisEndDate`
+    - `categorization_prompt`
+    - `keyword_list`
+    - `matchscore_threshold`
+  - This is sufficient to reconstruct *how* the run was configured.
+
+- Not currently logged:
+  - Git commit hash
+  - Exact version of dependencies or OS
+  - Exact LLM model version (beyond model name string)
+
+**For orchestrator-managed reproducible runs**
+
+To move towards reproducible “Runs”, the orchestrator should:
+
+- Capture and store:
+  - Code revision / commit hash for each run
+  - Full monitor run request (the JSON described above)
+  - Environment snapshot for key variables (e.g., `SEDIA_API_KEY` omitted, but flags & URLs kept)
+- Optionally, disable LLM (`mode = "no_llm"`) for runs where strict determinism is a requirement.
+
+---
+
+### 6. Runtime & Failure Modes
+
+**Common runtime dependencies**
+
+- Network connectivity to:
+  - EU Funding & Tenders Portal API (`SEDIA_API_KEY`).
+  - Remote LLM endpoint (`lite_llm_url`) if LLM is enabled.
+  - Microsoft Teams webhook (`hook_teams`) if newsletters are enabled.
+- Filesystem write access under `modules/monitor/data/` and `modules/monitor/deliverables/`.
+
+**Typical failure modes**
+
+1. **Missing or invalid credentials**
+   - `SEDIA_API_KEY` not set → sourcing fails with HTTP error or “unauthorized”.
+   - `lite_llm_*` not set but `suppress_llm_categorization = False` → LLM categorization step raises an exception from the OpenAI client.
+   - `hook_teams` not set but `send_newsletter = True` → Teams delivery fails; logged error.
+
+   *Signal to orchestrator*: Non-zero exit code from the Python process, error entries in logs, and **missing or stale `{topic}.db`**.
+
+2. **Network / API issues**
+   - EU API timeouts, connection errors, or malformed JSON:
+     - Handled via retry loops with sleep; if retries exhausted, particular chunks/codes are skipped and logged.
+   - LLM API rate limits or errors:
+     - Retried with exponential backoff (`make_chat_completion`), ultimately returning empty responses on hard failure.
+
+   *Consequence*: Partial data download or incomplete categorization; outputs may exist but be incomplete. Orchestrator should treat “missing DB” or DB without fresh metadata as a failure signal.
+
+3. **Missing intermediate files**
+   - If `suppress_llm_categorization = True` but no previously generated `filtered_projects.csv` / `filtered_organizations.csv` exist, the monitor workflow will fail on:
+     - `pd.read_csv(self.settings.filtered_projects_filename, ...)`
+
+   *Mitigation*: As orchestrator, enforce ordering:
+   - Always run sourcing + full monitor at least once to create baseline filtered files.
+   - Or modify the module (future enhancement) to support a “keyword-only, no-LLM” path that still writes required files.
+
+4. **Data shape / schema drift**
+   - The EU portal API might change field names or structure.
+   - This can cause:
+     - KeyErrors when building DataFrames
+     - Issues in evaluation logic that assumes certain columns.
+
+   *Signal*: Python exceptions in `data_sourcing.py` or `data_evaluation.py`, logged with stack traces.
+
+5. **Disk / SQLite issues**
+   - Insufficient disk space or permissions when writing:
+     - CSVs in `data/{topic}/`
+     - DBs in `deliverables/{topic}/`
+   - SQLite `to_sql` failures.
+
+   *Signal*: Exceptions during DB write; absence or partial creation of `{topic}.db`.
+
+6. **Long runtimes**
+   - Sourcing can take **hours** due to thousands of EU API requests.
+   - LLM categorization can also take hours because of rate limits and per-project calls.
+
+   *Operational implications*:
+   - Orchestrator should run these as **background jobs** with timeouts and monitoring.
+   - Avoid scheduling overlapping LLM-heavy workflows (as noted in `docs/scheduler.md`).
+
+**Failure signaling pattern the orchestrator can rely on**
+
+- **Hard failure (run considered failed)**:
+  - Python process exits with error.
+  - Missing or unchanged `{topic}.db` or `metadata` timestamps after the run.
+
+- **Soft / partial failure (run completes but data incomplete)**:
+  - `{topic}.db` exists but:
+    - `metadata` shows errors in logs (or missing expected fields).
+    - Evaluations relying on `LLMCategory` are missing when LLM was expected.
+
+For robust orchestration, you should:
+
+- Treat “fresh `{topic}.db` + updated `metadata` timestamp” as the **success condition**.
+- Inspect `scheduler.log` or per-run logs when that condition is not met.
+- Optionally, enforce timeouts and retry policies at the orchestrator level.
+
+

From 6e7412fc5e327fe3538d8e034f1f7a51ccc073d0 Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Wed, 4 Feb 2026 09:55:03 +0100
Subject: [PATCH 07/42] Added monitor_adapter.py under the monitor module. This is the Python Wrapper API.

---
 modules/monitor/monitor_adapter.py | 84 ++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 modules/monitor/monitor_adapter.py

diff --git a/modules/monitor/monitor_adapter.py b/modules/monitor/monitor_adapter.py
new file mode 100644
index 0000000..6f0e297
--- /dev/null
+++ b/modules/monitor/monitor_adapter.py
@@ -0,0 +1,84 @@
+from typing import Any, Dict
+
+from data_workflows import DataSourcingWorkflow, MonitorWorkflow
+from workflow_settings import (
+    sourcing_settings,
+    quantum_settings,
+    hpc_settings,
+    ai_settings,
+    cybersecurity_settings,
+)
+
+
+TOPIC_SETTINGS = {
+    "quantum": quantum_settings,
+    "hpc": hpc_settings,
+    "ai": ai_settings,
+    "cybersecurity": cybersecurity_settings,
+}
+
+
+def run_monitor_from_request(req: Dict[str, Any]) -> Dict[str, Any]:
+    """
+    Minimal adapter for the monitor module.
+
+    Expects a MonitorRunRequest-like dict:
+
+    {
+        "workflow": "sourcing" | "monitor",
+        "topic": "quantum" | "hpc" | "ai" | "cybersecurity",
+        "mode": "full" | "no_llm",
+        "suppress_ft_crawl": bool,
+        "import_manual_data": bool,
+        "send_deliverable": bool,
+        "send_newsletter": bool
+    }
+    """
+
+    workflow = req["workflow"]
+    topic = req.get("topic")
+    mode = req.get("mode", "full")
+
+    if workflow == "sourcing":
+        # Allow orchestrator to override suppress_ft_crawl if needed
+        sourcing_settings.suppress_ft_crawl = bool(req.get("suppress_ft_crawl", False))
+
+        wf = DataSourcingWorkflow("sourcing", sourcing_settings)
+        wf.run()
+
+        return {
+            "status": "completed",
+            "workflow": "sourcing",
+            "topic": None,
+            "artifacts": [],
+        }
+
+    if workflow == "monitor":
+        if topic not in TOPIC_SETTINGS:
+            raise ValueError(f"Unsupported topic: {topic}")
+
+        settings_cls = TOPIC_SETTINGS[topic]
+
+        # Mutate settings based on request (simple, explicit knobs)
+        settings_cls.suppress_llm_categorization = mode != "no_llm"
+        settings_cls.import_manual_data = bool(req.get("import_manual_data", False))
+        settings_cls.send_deliverable = bool(req.get("send_deliverable", False))
+        settings_cls.send_newsletter = bool(req.get("send_newsletter", False))
+
+        wf = MonitorWorkflow(topic, settings_cls)
+        wf.run()
+
+        return {
+            "status": "completed",
+            "workflow": "monitor",
+            "topic": topic,
+            "mode": mode,
+            "outputs": {
+                "db_path": f"modules/monitor/deliverables/{topic}/{topic}.db",
+                "data_dir": f"modules/monitor/data/{topic}/",
+            },
+        }
+
+    raise ValueError(f"Unsupported workflow: {workflow}")
+
+

From 81e21c006c997f53250eaa316857d8b1eb4b77db Mon Sep 17 00:00:00 2001
From: Adrian Con
Date: Mon, 16 Mar 2026 13:41:06 +0100
Subject: [PATCH 08/42] Deleted unnecessary files (I previously added) from the monitor module.

---
 modules/monitor/README.md          | 397 -----------------------------
 modules/monitor/monitor_adapter.py |  84 ------
 2 files changed, 481 deletions(-)
 delete mode 100644 modules/monitor/README.md
 delete mode 100644 modules/monitor/monitor_adapter.py

diff --git a/modules/monitor/README.md b/modules/monitor/README.md
deleted file mode 100644
index 672fa7e..0000000
--- a/modules/monitor/README.md
+++ /dev/null
@@ -1,397 +0,0 @@
-## Monitor Module (EFMO – European Funding Monitor)
-
-### 1. Purpose & Scope
-
-**Problem it solves**
-
-The monitor module (EFMO) automates monitoring of EU-funded projects for specific technology domains (quantum, HPC, AI, cybersecurity). It:
-
-- **Ingests** projects and organizations from the EU Funding & Tenders Portal
-- **Filters** them by topic using keyword-based scoring and (optionally) LLM-based categorization
-- **Aggregates & analyzes** funding and participation over time
-- **Publishes** the processed data as SQLite databases and figures for dashboards and reporting
-
-**Policy questions it helps answer (per topic)**
-
-Examples (for quantum, but analogous for HPC/AI/cybersecurity):
-
-- How much EU funding is going into *quantum computing* vs *quantum sensing* vs *quantum communication* over time?
-- Which countries and organization types are most active in this topic?
-- How is the portfolio evolving (TRL distribution, platforms, applications)?
-- What new relevant projects have appeared since the last run?
-
-**When the orchestrator should call it**
-
-- **Do call** when:
-  - You want to **refresh topic dashboards / analytics** (e.g. daily, weekly).
-  - Raw source data has been updated (sourcing workflow has run recently).
-  - You need an updated **topic-specific snapshot** for a briefing or policy question.
-
-- **Do *not* call** when:
-  - You need **ad‑hoc, per-request analytics** on a small set of projects (better to query the DB directly).
-  - You don’t have valid **EU portal API credentials** (for sourcing) or the network is constrained.
-  - You cannot or do not want to incur **LLM costs** and you strictly require LLM-derived categories (e.g. “quantum computing” vs “basic science”).
-
-At a high level: **orchestrator should treat this as a scheduled batch job** that periodically refreshes topic-specific funding intelligence, not as an interactive microservice.
-
----
-
-### 2. Invocation Contract
-
-**Current invocation options**
-
-1. **Python API (recommended for orchestrator)**
-
-For a given topic (e.g. quantum):
-
-```python
-from modules.monitor.data_workflows import MonitorWorkflow, DataSourcingWorkflow
-from modules.monitor.workflow_settings import (
-    sourcing_settings,
-    quantum_settings,
-    hpc_settings,
-    ai_settings,
-    cybersecurity_settings,
-)
-
-# One-time / periodic sourcing (raw data from EU API)
-sourcing = DataSourcingWorkflow("sourcing", sourcing_settings)
-sourcing.run()
-
-# Topic-specific monitor run, using sourced data
-quantum = MonitorWorkflow("quantum", quantum_settings)
-quantum.run()
-```
-
-This is the **most deterministic** way for the orchestrator to trigger runs: you explicitly choose which workflow to run and when.
-
-2. **Scheduler script**
-
-```bash
-cd modules/monitor
-pipenv run python scheduler.py
-```
-
-- Uses `ENV` env var (`dev` vs `prod`) to decide behavior.
-- In `prod` it uses `schedule` to run multiple workflows at fixed UTC times.
-- In `dev` it currently runs the quantum workflow once and then loops.
-
-For an orchestrator, this is more of a **standalone daemon**; less granular than calling workflows directly.
-
-**Recommended invocation contract for the orchestrator**
-
-Treat the monitor as a Python module with **two primary entrypoints**:
-
-- `run_sourcing()` – refresh raw data from EU portal.
-- `run_monitor(topic, options)` – run one topic workflow with explicit options.
-
-Conceptually:
-
-```python
-def run_sourcing():
-    DataSourcingWorkflow("sourcing", sourcing_settings).run()
-
-def run_monitor(topic: str, *, use_llm: bool = True):
-    settings_map = {
-        "quantum": quantum_settings,
-        "hpc": hpc_settings,
-        "ai": ai_settings,
-        "cybersecurity": cybersecurity_settings,
-    }
-    settings = settings_map[topic]
-    # Orchestrator sets settings.suppress_llm_categorization as needed
-    MonitorWorkflow(topic, settings).run()
-```
-
-This keeps **triggering deterministic**: the orchestrator always specifies `{workflow, topic, use_llm}` explicitly.
-
----
-
-### 3. Input Schema
-
-The module does **not** currently consume a JSON payload; its “inputs” are:
-
-- **Configuration classes** in `workflow_settings.py` (per-topic settings)
-- **Environment variables**:
-  - `SEDIA_API_KEY` (EU Funding & Tenders Portal API) – required for sourcing
-  - `lite_llm_url`, `lite_llm_model`, `lite_llm_api_key` – required if LLM categorization is enabled
-  - `hook_teams` – required if Teams newsletter is enabled
-- **Existing files**:
-  - Raw data pickles (after sourcing)
-  - Optional manual CSV inputs
-
-For orchestration purposes, you can conceptualize a **run request** as a JSON config that drives how you call the Python API.
- -#### Proposed run request JSON (for orchestrator) - -```json -{ - "workflow": "monitor", - "topic": "quantum", - "mode": "full", - "use_llm": true, - "suppress_ft_crawl": false, - "import_manual_data": true, - "send_deliverable": false, - "send_newsletter": true -} -``` - -#### JSON Schema (proposed) - -```json -{ - "$schema": "http://json-schema.org/draft-07/schema#", - "title": "MonitorRunRequest", - "type": "object", - "required": ["workflow", "topic", "mode"], - "properties": { - "workflow": { - "type": "string", - "enum": ["sourcing", "monitor"], - "description": "Which workflow to run." - }, - "topic": { - "type": "string", - "enum": ["quantum", "hpc", "ai", "cybersecurity"], - "description": "Topic-specific monitor to run (ignored for pure sourcing)." - }, - "mode": { - "type": "string", - "enum": ["full", "no_llm"], - "description": "Whether to include LLM categorization." - }, - "use_llm": { - "type": "boolean", - "default": true, - "description": "Convenience flag; equivalent to mode != 'no_llm'." - }, - "suppress_ft_crawl": { - "type": "boolean", - "default": false, - "description": "If true, sourcing reuses cached raw data instead of hitting the EU API." - }, - "import_manual_data": { - "type": "boolean", - "default": false, - "description": "Whether to merge manual CSVs into the dataset." - }, - "send_deliverable": { - "type": "boolean", - "default": false, - "description": "Whether to zip deliverables and send (currently email delivery is disabled)." - }, - "send_newsletter": { - "type": "boolean", - "default": false, - "description": "Whether to send a Teams newsletter with new projects." 
- } - }, - "additionalProperties": false -} -``` - -**Mandatory inputs for a successful run** - -- For **sourcing**: - - `workflow = "sourcing"` - - Valid `SEDIA_API_KEY` in environment -- For **monitor**: - - `workflow = "monitor"` - - `topic` in allowed set - - Raw data files present (sourcing has been run previously) - - If `mode = "full"` / `use_llm = true`: valid `lite_llm_*` env vars and connectivity - -The orchestrator can validate this JSON against the schema before calling the Python API. - ---- - -### 4. Output Contract - -**Output types** - -1. **Intermediate data files (per topic)** – under `modules/monitor/data/{topic}/` - - `filtered_projects.csv` - - `filtered_organizations.csv` - - `filtered_projects_prev.csv` - - `processed_projects_diff.csv` - - `matchscore_histogram.png` - - Optional: `input_manual_projects.csv`, `input_manual_orgas.csv` (orchestrator / human-managed) - - These are **intermediate analysis products** – suitable for exploration, debugging, or feeding other data pipelines. - -2. **SQLite database (per topic)** – under `modules/monitor/deliverables/{topic}/` - - `{topic}.db` with tables: - - `projects` – cleaned project-level data (including LLM-based dimensions if enabled) - - `organizations` – organization-level data - - `metadata` – run metadata (timestamps, keywords, thresholds, prompt) - - This is the **primary machine-consumable output** for downstream dashboards or analytics. - -3. 
**Evaluation artifacts (per topic)** – under `modules/monitor/deliverables/{topic}/` - - PNG plots: - - `TotalFundingByFPOverTime.png` - - `TotalFundingByLLMCategoryOverTime.png` (requires LLM categories) - - `OrganizationsByCountryGroupOverTime.png` - - `OrganizationTypeByCountryGroupOverTime.png` - - `TotalFundingbyFP.png` - - JSON results: - - `TotalFundingByFPOverTime.json` - - `TotalFundingByLLMCategoryOverTime.json` (requires LLM) - - `OrganizationsByCountryGroupOverTime.json` - - `OrganizationsByCountryGroupOverTime_absolute.json` - - `OrganizationTypeByCountryGroupOverTime.json` - - `TotalFundingbyFP.json` - - These are **intermediate / analytic outputs** – suitable for dashboards and human interpretation, but still “data-level”, not narrative policy briefs. - -4. **Notifications (optional)** - -- Teams messages (if `send_newsletter = true` and `hook_teams` configured) listing newly added projects in a given topic. - -**Final vs intermediate** - -- **Final, policy-ready data for other modules**: - - The **SQLite DBs** (and their JSON summaries) – these should be treated as the canonical, versioned outputs that downstream modules (or the orchestrator) consume. -- **Intermediate**: - - CSVs in `data/{topic}/` - - PNG plots in `deliverables/{topic}/` - - Logs and temporary `.dat` files - -The orchestrator should primarily depend on **existence and freshness of `{topic}.db`** (plus metadata table) as the success signal and downstream interface. - ---- - -### 5. Determinism & Reproducibility - -**Is it deterministic given the same inputs?** - -Not fully, because: - -- **External data source**: - - Sourcing hits the live EU Funding & Tenders Portal API. The dataset changes over time. -- **Time-based filters**: - - New-project detection filters by `ecSignatureDate` within the last 4 weeks, using `datetime.now()`. -- **LLM categorization**: - - Calls a remote LLM API; outputs can vary slightly run-to-run even for the same prompt. 
- -Given completely frozen inputs (frozen raw data files, fixed env vars, same code version) and **LLM disabled**, the keyword-based parts are deterministic. - -**Randomness** - -- No explicit RNG usage; non-determinism comes from: - - Network APIs (EU portal + LLM provider) - - Current wall-clock time - - Possible non-determinism in remote LLM - -**Logging & metadata** - -- Logging: - - `scheduler.py` configures logging to `scheduler.log` + stdout: - - Logs environment configuration (`ENV`, `lite_llm_*` prefixes) - - Logs workflow execution and keep-alive heartbeat. - - Other modules (`data_sourcing.py`, `data_processing.py`, `data_workflows.py`) use `logging` to record progress and errors. - -- Metadata in outputs: - - `metadata` table in each `{topic}.db` contains: - - `DataAnalysisStartDate` - - `DataAnalysisEndDate` - - `categorization_prompt` - - `keyword_list` - - `matchscore_threshold` - - This is sufficient to reconstruct *how* the run was configured. - -- Not currently logged: - - Git commit hash - - Exact version of dependencies or OS - - Exact LLM model version (beyond model name string) - -**For orchestrator-managed reproducible runs** - -To move towards reproducible “Runs”, the orchestrator should: - -- Capture and store: - - Code revision / commit hash for each run - - Full monitor run request (the JSON described above) - - Environment snapshot for key variables (e.g., `SEDIA_API_KEY` omitted, but flags & URLs kept) -- Optionally, disable LLM (`mode = "no_llm"`) for runs where strict determinism is a requirement. - ---- - -### 6. Runtime & Failure Modes - -**Common runtime dependencies** - -- Network connectivity to: - - EU Funding & Tenders Portal API (`SEDIA_API_KEY`). - - Remote LLM endpoint (`lite_llm_url`) if LLM is enabled. - - Microsoft Teams webhook (`hook_teams`) if newsletters are enabled. -- Filesystem write access under `modules/monitor/data/` and `modules/monitor/deliverables/`. - -**Typical failure modes** - -1. 
**Missing or invalid credentials** - - `SEDIA_API_KEY` not set → sourcing fails with HTTP error or “unauthorized”. - - `lite_llm_*` not set but `suppress_llm_categorization = False` → LLM categorization step raises an exception from the OpenAI client. - - `hook_teams` not set but `send_newsletter = True` → Teams delivery fails; logged error. - - *Signal to orchestrator*: Non-zero exit code from the Python process, error entries in logs, and **missing or stale `{topic}.db`**. - -2. **Network / API issues** - - EU API timeouts, connection errors, or malformed JSON: - - Handled via retry loops with sleep; if retries exhausted, particular chunks/codes are skipped and logged. - - LLM API rate limits or errors: - - Retried with exponential backoff (`make_chat_completion`), ultimately returning empty responses on hard failure. - - *Consequence*: Partial data download or incomplete categorization; outputs may exist but be incomplete. Orchestrator should treat “missing DB” or DB without fresh metadata as a failure signal. - -3. **Missing intermediate files** - - If `suppress_llm_categorization = True` but no previously generated `filtered_projects.csv` / `filtered_organizations.csv` exist, the monitor workflow will fail on: - - `pd.read_csv(self.settings.filtered_projects_filename, ...)` - - *Mitigation*: As orchestrator, enforce ordering: - - Always run sourcing + full monitor at least once to create baseline filtered files. - - Or modify the module (future enhancement) to support a “keyword-only, no-LLM” path that still writes required files. - -4. **Data shape / schema drift** - - The EU portal API might change field names or structure. - - This can cause: - - KeyErrors when building DataFrames - - Issues in evaluation logic that assumes certain columns. - - *Signal*: Python exceptions in `data_sourcing.py` or `data_evaluation.py`, logged with stack traces. - -5. 
**Disk / SQLite issues** - - Insufficient disk space or permissions when writing: - - CSVs in `data/{topic}/` - - DBs in `deliverables/{topic}/` - - SQLite `to_sql` failures. - - *Signal*: Exceptions during DB write; absence or partial creation of `{topic}.db`. - -6. **Long runtimes** - - Sourcing can take **hours** due to thousands of EU API requests. - - LLM categorization can also take hours because of rate limits and per-project calls. - - *Operational implications*: - - Orchestrator should run these as **background jobs** with timeouts and monitoring. - - Avoid scheduling overlapping LLM-heavy workflows (as noted in `docs/scheduler.md`). - -**Failure signaling pattern the orchestrator can rely on** - -- **Hard failure (run considered failed)**: - - Python process exits with error. - - Missing or unchanged `{topic}.db` or `metadata` timestamps after the run. - -- **Soft / partial failure (run completes but data incomplete)**: - - `{topic}.db` exists but: - - `metadata` shows errors in logs (or missing expected fields). - - Evaluations relying on `LLMCategory` are missing when LLM was expected. - -For robust orchestration, you should: - -- Treat “fresh `{topic}.db` + updated `metadata` timestamp” as the **success condition**. -- Inspect `scheduler.log` or per-run logs when that condition is not met. -- Optionally, enforce timeouts and retry policies at the orchestrator level. 
- - diff --git a/modules/monitor/monitor_adapter.py b/modules/monitor/monitor_adapter.py deleted file mode 100644 index 6f0e297..0000000 --- a/modules/monitor/monitor_adapter.py +++ /dev/null @@ -1,84 +0,0 @@ -from typing import Any, Dict - -from data_workflows import DataSourcingWorkflow, MonitorWorkflow -from workflow_settings import ( - sourcing_settings, - quantum_settings, - hpc_settings, - ai_settings, - cybersecurity_settings, -) - - -TOPIC_SETTINGS = { - "quantum": quantum_settings, - "hpc": hpc_settings, - "ai": ai_settings, - "cybersecurity": cybersecurity_settings, -} - - -def run_monitor_from_request(req: Dict[str, Any]) -> Dict[str, Any]: - """ - Minimal adapter for the monitor module. - - Expects a MonitorRunRequest-like dict: - - { - "workflow": "sourcing" | "monitor", - "topic": "quantum" | "hpc" | "ai" | "cybersecurity", - "mode": "full" | "no_llm", - "suppress_ft_crawl": bool, - "import_manual_data": bool, - "send_deliverable": bool, - "send_newsletter": bool - } - """ - - workflow = req["workflow"] - topic = req.get("topic") - mode = req.get("mode", "full") - - if workflow == "sourcing": - # Allow orchestrator to override suppress_ft_crawl if needed - sourcing_settings.suppress_ft_crawl = bool(req.get("suppress_ft_crawl", False)) - - wf = DataSourcingWorkflow("sourcing", sourcing_settings) - wf.run() - - return { - "status": "completed", - "workflow": "sourcing", - "topic": None, - "artifacts": [], - } - - if workflow == "monitor": - if topic not in TOPIC_SETTINGS: - raise ValueError(f"Unsupported topic: {topic}") - - settings_cls = TOPIC_SETTINGS[topic] - - # Mutate settings based on request (simple, explicit knobs) - settings_cls.suppress_llm_categorization = mode != "no_llm" - settings_cls.import_manual_data = bool(req.get("import_manual_data", False)) - settings_cls.send_deliverable = bool(req.get("send_deliverable", False)) - settings_cls.send_newsletter = bool(req.get("send_newsletter", False)) - - wf = MonitorWorkflow(topic, settings_cls) 
- wf.run() - - return { - "status": "completed", - "workflow": "monitor", - "topic": topic, - "mode": mode, - "outputs": { - "db_path": f"modules/monitor/deliverables/{topic}/{topic}.db", - "data_dir": f"modules/monitor/data/{topic}/", - }, - } - - raise ValueError(f"Unsupported workflow: {workflow}") - - From b27e0782a75f19aaa2a2b2354faf090343fded7d Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 13:44:13 +0100 Subject: [PATCH 09/42] Adding the INTEGRATION_SPEC.md file to the orchestrator branch. --- modules/orchestrator/INTEGRATION_SPEC.md | 590 +++++++++++++++++++++++ 1 file changed, 590 insertions(+) create mode 100644 modules/orchestrator/INTEGRATION_SPEC.md diff --git a/modules/orchestrator/INTEGRATION_SPEC.md b/modules/orchestrator/INTEGRATION_SPEC.md new file mode 100644 index 0000000..41ccf12 --- /dev/null +++ b/modules/orchestrator/INTEGRATION_SPEC.md @@ -0,0 +1,590 @@ +# OpenPolicyStack Integration Specification + +**File:** `INTEGRATION_SPEC.md` + +**Version:** 1.0 + +**Status:** Frozen Integration Contract + +--- + +# 1. Purpose + +This document defines the **integration contract governing all modules within the OpenPolicyStack environment**. + +The purpose of this specification is to ensure that independently developed analytical modules can be orchestrated in a **deterministic, reproducible, and traceable execution environment**. + +The integration layer standardizes conventions for: + +- service naming +- API interfaces +- port allocation +- environment configuration +- networking +- logging +- artifact management + +By enforcing these conventions, the OpenPolicyStack orchestrator can coordinate heterogeneous modules while preserving **run-level reproducibility, structured provenance, and auditability**. + +This document represents the **authoritative integration contract** against which all future integration decisions are evaluated. + +--- + +# 2. 
Architectural Context + +OpenPolicyStack implements a **centralized orchestration architecture** in which multiple analytical modules are coordinated by a single orchestrator service. + +Each module: + +- operates as an **independent microservice** +- exposes a **lightweight REST interface** +- runs inside a **Docker container** +- executes deterministically within a predefined workflow template. + +The orchestrator performs: + +- module sequencing +- metadata capture +- artifact tracking +- module version recording +- execution provenance reconstruction. + +The MVP system integrates six modules: + +- Data Layer +- DR Anticorruption +- Monitor +- Policy Simulator +- Strategy Agent +- Supply Chain Risk + +Modules are treated as **black-box services**, allowing their internal analytical logic to remain independent while the orchestration layer enforces system-level governance guarantees. + +--- + +# 3. Core Design Principles + +All integration decisions must preserve the following architectural principles. + +## Deterministic Execution + +Workflows follow predefined templates specifying module order and execution structure. + +## Reproducibility + +Identical inputs and module versions must produce structurally identical execution runs. + +## Traceability + +All module invocations, inputs, outputs, and artifacts must be recorded in structured metadata. + +## Modular Interoperability + +Modules interact exclusively through standardized HTTP interfaces. + +## Governance by Design + +Accountability and auditability are enforced at the orchestration layer rather than within module internals. + +--- + +# 4. 
Repository Structure + +All modules must follow the repository layout: + +``` +/opt/openpolicystack +│ +├── orchestrator +│ +├── modules +│ ├── data-layer +│ ├── dr-anticorruption +│ ├── monitor +│ ├── policy-simulator +│ ├── strategy-agent +│ └── supplychain-risk +│ +├── infrastructure +│ +├── artifacts +│ +└── compose.yaml +``` + +Each module must reside under: + +``` +modules/ +``` + +Module names must be unique within the system. + +--- + +# 5. Module Naming Convention + +All services use **lowercase kebab-case identifiers**. + +Examples: + +``` +data-layer +policy-simulator +strategy-agent +monitor +``` + +The module identifier must be used consistently across: + +- repository folder name +- Docker image name +- Docker Compose service name +- network hostname +- artifact directory +- log fields +- module metadata + +Example: + +``` +modules/policy-simulator +service: policy-simulator +image: openpolicystack/policy-simulator +hostname: policy-simulator +``` + +--- + +# 6. Containerization Requirements + +Every module must provide: + +``` +Dockerfile +.env.example +module.yaml +``` + +Modules must run as **standalone containers**. + +The container must start an HTTP service exposing the module API. + +--- + +# 7. Port Allocation Scheme + +### Internal container port + +All module APIs must listen on: + +``` +8080 +``` + +Optional auxiliary ports: + +``` +9090 metrics +8081 admin/debug +``` + +--- + +### Service communication + +Services communicate via Docker DNS using service names: + +``` +http://policy-simulator:8080 +http://data-layer:8080 +``` + +Modules must **not rely on static IP addresses**. + +--- + +### Host port exposure + +Host ports are used only for development or gateway access. 
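Together, the naming rule (Section 5) and the internal addressing rule (Section 7) mean the orchestrator never needs an IP address or a host port to reach a module. The sketch below shows what that could look like on the orchestrator side. The helper names are hypothetical, and the regular expression assumes kebab-case identifiers may contain digits, which the spec does not state.

```python
import re

# Assumption: kebab-case = lowercase letters/digits in segments joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Every module API listens on this internal container port (Section 7).
INTERNAL_PORT = 8080


def validate_module_name(name: str) -> str:
    """Return the name unchanged if it is lowercase kebab-case, otherwise raise."""
    if KEBAB_CASE.fullmatch(name) is None:
        raise ValueError(f"module name must be lowercase kebab-case: {name!r}")
    return name


def internal_url(module_name: str, path: str = "") -> str:
    """Build the in-network URL; the service name doubles as the Docker DNS hostname."""
    return f"http://{validate_module_name(module_name)}:{INTERNAL_PORT}{path}"


if __name__ == "__main__":
    print(internal_url("policy-simulator", "/health"))
    # http://policy-simulator:8080/health
```

The hostname comes from the module name, not from a fixed address. So the same helper keeps working wherever Compose schedules the container. Host ports stay out of inter-service calls altogether.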
+ +Reserved host port ranges: + +| Module | Port Range | +| --- | --- | +| orchestrator | 8100–8109 | +| data-layer | 8200–8209 | +| strategy-agent | 8300–8309 | +| dr-anticorruption | 8400–8409 | +| supplychain-risk | 8500–8509 | +| policy-simulator | 8600–8609 | +| monitor | 8700–8709 | + +--- + +# 8. API Integration Contract + +Modules must expose a **REST API interface**. + +### Required endpoints + +``` +GET /health +GET /metadata +POST /execute +``` + +These endpoints form the minimal integration surface. + +--- + +# 8.1 Health Endpoint + +``` +GET /health +``` + +Response: + +```json +{ + "status": "ok", + "module_name": "policy-simulator", + "version": "0.1.0" +} +``` + +Used for orchestrator readiness checks. + +--- + +# 8.2 Metadata Endpoint + +``` +GET /metadata +``` + +Example response: + +```json +{ + "module_name": "policy-simulator", + "version": "0.1.0", + "api_version": "1.0", + "owner": "module_author", + "supported_tasks": [ + "simulate_policy" + ] +} +``` + +The orchestrator records module version metadata for reproducibility tracking. + +--- + +# 8.3 Execution Endpoint + +``` +POST /execute +``` + +Request payload: + +```json +{ + "run_id": "uuid", + "parameters": {}, + "inputs": [], + "metadata": {} +} +``` + +Response payload must include: + +```json +{ + "module_name": "policy-simulator", + "version": "0.1.0", + "status": "success", + "output": {}, + "artifacts": [] +} +``` + +Required response fields: + +- module_name +- version +- status +- output +- artifacts + +This structure allows the orchestrator to capture module-level provenance. + +--- + +# 9. 
Environment Variable Conventions + +Global environment variables use the prefix: + +``` +OPS_ +``` + +Required base variables: + +``` +OPS_ENV +OPS_MODULE_NAME +OPS_PORT +OPS_LOG_LEVEL +OPS_ARTIFACT_ROOT +OPS_ORCHESTRATOR_URL +``` + +Example: + +``` +OPS_MODULE_NAME=policy-simulator +OPS_PORT=8080 +OPS_ARTIFACT_ROOT=/var/openpolicystack/artifacts +``` + +--- + +### Module-specific variables + +Module-specific variables must use a namespace: + +``` +MODULEPREFIX__ +``` + +Example: + +``` +POLICY_SIMULATOR__MODEL_PATH +DATA_LAYER__DB_PATH +``` + +--- + +# 10. Networking Model + +The deployment uses two Docker networks. + +### Internal network + +``` +ops-core +``` + +All modules and the orchestrator join this network. + +--- + +### Edge network + +``` +ops-edge +``` + +Used only for services exposed externally. + +Service discovery uses DNS names: + +``` +http://module-name:8080 +``` + +--- + +# 11. Logging Convention + +All services log to: + +``` +stdout +stderr +``` + +Logs must follow **structured JSON format**. + +Example: + +```json +{ + "timestamp": "2026-03-12T11:20:31Z", + "level": "INFO", + "service": "policy-simulator", + "run_id": "uuid", + "event": "simulation_started", + "message": "Policy simulation initiated" +} +``` + +Required fields: + +- timestamp +- level +- service +- run_id +- event +- message + +--- + +# 12. Artifact Management + +Artifacts are stored in a shared volume mounted at: + +``` +/var/openpolicystack/artifacts +``` + +Run directory structure: + +``` +artifacts/ +└── runs/ + └── / + └── / + ├── inputs/ + ├── outputs/ + └── meta/ +``` + +Example: + +``` +artifacts/runs/abc123/policy-simulator/outputs/report.json +``` + +Large outputs must be stored as artifacts with references returned to the orchestrator. + +The orchestrator records: + +- artifact path +- artifact hash +- producing module +- associated run + +This enables integrity verification and reproducibility validation. + +--- + +# 13. 
Module Manifest + +Each module must provide: + +``` +modules//module.yaml +``` + +Example: + +```yaml +module_name: policy-simulator +version: 0.1.0 + +interface: + type: http + port: 8080 + execute: /execute + health: /health +``` + +This allows automated module discovery and integration validation. + +--- + +# 14. Minimum Compliance Checklist + +A module is considered **integration-ready** only if it satisfies: + +- containerized via Docker +- resides under `modules/` +- exposes `/health` +- exposes `/metadata` +- exposes `/execute` +- accepts `run_id` +- logs structured JSON +- returns module version +- stores outputs as artifacts +- includes `module.yaml` +- includes `.env.example` + +--- + +# 15. Scope Boundaries + +The integration specification intentionally excludes: + +- distributed orchestration platforms +- ontology harmonization across modules +- semantic schema alignment +- adaptive runtime planning +- parallel execution scheduling + +These elements are outside the MVP scope and may be explored in future research iterations. + +--- + +# 16. Governance Role of the Orchestrator + +Modules remain responsible for **analytical correctness**. + +The orchestrator is responsible for: + +- workflow coordination +- version capture +- metadata persistence +- artifact lineage tracking +- reproducibility guarantees +- execution trace reconstruction. + +This separation ensures governance guarantees are **architectural system properties rather than implementation details of individual modules**. + +--- + +# 17. Change Management + +This specification represents the **baseline integration contract**. + +Changes must follow: + +``` +proposal → review → version update +``` + +Breaking interface changes require: + +``` +spec version increment +``` + +--- + +# 18. Relationship to Deployment Architecture + +This specification is the **foundation for the deployment architecture**. 
+ +The Docker Compose infrastructure will implement the conventions defined here, including: + +- service names +- network topology +- environment variable schema +- artifact mounts +- inter-service communication + +--- + +# 19. Next Integration Step + +After freezing this specification, the integration process proceeds with: + +1. Generate the **Compose deployment skeleton** +2. Integrate the **first pilot module** +3. Validate compliance +4. Refine integration procedures +5. Expand to the full module ecosystem \ No newline at end of file From 41111f7e0aba02b7ee64af3d6d3432a3f4acc29b Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 13:46:41 +0100 Subject: [PATCH 10/42] Added MODULE_INTEGRATION_GUIDE.md to provide comprehensive guidelines for integrating modules into the OpenPolicyStack orchestration system, including structure, API requirements, and testing procedures. --- .../orchestrator/MODULE_INTEGRATION_GUIDE.md | 495 ++++++++++++++++++ 1 file changed, 495 insertions(+) create mode 100644 modules/orchestrator/MODULE_INTEGRATION_GUIDE.md diff --git a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md new file mode 100644 index 0000000..daff052 --- /dev/null +++ b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md @@ -0,0 +1,495 @@ +# OpenPolicyStack Module Integration Guide + +**File:** `MODULE_INTEGRATION_GUIDE.md` + +**Version:** 1.0 + +**Status:** Frozen Module Integration Guide + +--- + +# 1. Overview + +This guide explains how to prepare your module so that it can be integrated into the **OpenPolicyStack orchestration system**. + +All modules in OpenPolicyStack operate as **independent containerized microservices** coordinated by a central orchestrator. 
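The orchestrator's side of this coordination can be sketched with the Python standard library. It assigns the `run_id`, POSTs the request envelope from `INTEGRATION_SPEC.md` (Section 8.3) to a module's `/execute`, and rejects any reply that lacks a required field. This sketch is for orientation only and is not the orchestrator's actual code. The helper names, the 60-second timeout, and the absence of retries are all assumptions.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Fields every /execute response must carry (INTEGRATION_SPEC.md, Section 8.3).
REQUIRED_RESPONSE_FIELDS = ("module_name", "version", "status", "output", "artifacts")


def build_execute_request(parameters: dict, run_id: Optional[str] = None) -> dict:
    """Assemble the /execute envelope; a fresh UUID is used when no run_id is given."""
    return {
        "run_id": run_id or str(uuid.uuid4()),
        "parameters": parameters,
        "inputs": [],
        "metadata": {},
    }


def check_execute_response(body: dict) -> dict:
    """Raise ValueError if a module reply omits any contract field."""
    missing = [field for field in REQUIRED_RESPONSE_FIELDS if field not in body]
    if missing:
        raise ValueError(f"response missing required fields: {missing}")
    return body


def call_execute(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload as JSON to <base_url>/execute and validate the decoded reply."""
    request = urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return check_execute_response(json.load(response))
```

Inside a running stack, `call_execute("http://policy-simulator:8080", build_execute_request({"country": "DO"}))` would return the module's validated reply. The orchestrator could then record that reply as part of the run's metadata.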
+ +Your module will: + +- run inside a Docker container +- expose a lightweight REST API +- receive execution requests from the orchestrator +- return structured outputs and artifact references + +You **do not need to implement orchestration logic**. + +Your module simply exposes a consistent interface and performs its analysis. + +--- + +# 2. High-Level Integration Flow + +When OpenPolicyStack runs a workflow, the following occurs: + +1. The orchestrator generates a **run_id**. +2. The orchestrator calls your module via HTTP. +3. Your module performs its computation. +4. Your module returns structured JSON results. +5. Any large outputs are saved as artifacts. +6. The orchestrator records execution metadata. + +Your module remains **analytically independent** but participates in the **shared execution environment**. + +--- + +# 3. Module Repository Structure + +Your module must follow this directory structure: + +``` +modules// +│ +├── app/ +│ └── main.py +│ +├── Dockerfile +├── requirements.txt +├── module.yaml +├── .env.example +└── README.md +``` + +Example: + +``` +modules/policy-simulator/ +``` + +--- + +# 4. Naming Rules + +Module names must follow **lowercase kebab-case**. + +Correct: + +``` +policy-simulator +strategy-agent +supplychain-risk +data-layer +``` + +Incorrect: + +``` +PolicySimulator +policy_simulator +policySimulator +``` + +Your module name must match: + +- folder name +- Docker service name +- container hostname +- artifact directory name + +--- + +# 5. Docker Container Requirement + +Your module **must run inside a Docker container**. + +Minimal example Dockerfile: + +```dockerfile +FROM python:3.11-slim + +WORKDIR /app + +COPY requirements.txt . +RUN pip install -r requirements.txt + +COPY app/ app/ + +CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"] +``` + +The container must start a web server exposing the module API. This `CMD` assumes a FastAPI app defined in `app/main.py` (as in Section 12), so `requirements.txt` must list `fastapi` and `uvicorn`. + +--- + +# 6. 
API Interface Requirements + +Your module must expose three HTTP endpoints.
+ +### Required endpoints + +``` +GET /health +GET /metadata +POST /execute +``` + +All endpoints must return JSON. + +--- + +# 6.1 Health Endpoint + +Used by the orchestrator to verify that your service is running. + +``` +GET /health +``` + +Example response: + +```json +{ +  "status": "ok", +  "module_name": "policy-simulator", +  "version": "0.1.0" +} +``` + +--- + +# 6.2 Metadata Endpoint + +Provides module information used for orchestration and debugging. + +``` +GET /metadata +``` + +Example: + +```json +{ +  "module_name": "policy-simulator", +  "version": "0.1.0", +  "supported_tasks": [ +    "simulate_policy" +  ] +} +``` + +--- + +# 6.3 Execute Endpoint + +This is the **main entry point** used by the orchestrator. + +``` +POST /execute +``` + +Example request: + +```json +{ +  "run_id": "123e4567", +  "parameters": { +    "country": "DO" +  }, +  "inputs": [], +  "metadata": {} +} +``` + +Example response: + +```json +{ +  "module_name": "policy-simulator", +  "version": "0.1.0", +  "status": "success", +  "output": { +    "risk_score": 0.41 +  }, +  "artifacts": [] +} +``` + +Required response fields: + +| Field | Description | +| --- | --- | +| module_name | name of the module | +| version | module version | +| status | success or failure | +| output | structured JSON result | +| artifacts | list of artifact references | + +--- + +# 7. Service Port + +Your API must listen on: + +``` +8080 +``` + +Example FastAPI server, started from the module root so that `app/main.py` is importable as `app.main`: + +``` +uvicorn app.main:app --host 0.0.0.0 --port 8080 +``` + +--- + +# 8. Environment Variables + +All modules use environment variables prefixed with: + +``` +OPS_ +``` + +Example variables: + +``` +OPS_MODULE_NAME=policy-simulator +OPS_PORT=8080 +OPS_ARTIFACT_ROOT=/var/openpolicystack/artifacts +OPS_LOG_LEVEL=INFO +``` + +You must include a `.env.example` file documenting required variables. + +--- + +# 9. Artifact Storage + +Large outputs should be stored as **artifacts**.
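A module might persist a large output, and build the reference it returns, along the lines of the sketch below. It follows the run directory layout in `INTEGRATION_SPEC.md` (Section 12), which produces paths like `artifacts/runs/abc123/policy-simulator/outputs/report.json`. The helper name is hypothetical. The sketch writes only JSON into the `outputs/` folder and reads the volume root from `OPS_ARTIFACT_ROOT`, falling back to the default mount path.

```python
import json
import os
from pathlib import Path

# Shared artifact volume; OPS_ARTIFACT_ROOT overrides the default mount path.
ARTIFACT_ROOT = Path(os.environ.get("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts"))


def save_json_artifact(run_id: str, module_name: str, name: str, filename: str,
                       data: dict, root: Path = ARTIFACT_ROOT) -> dict:
    """Hypothetical helper: write data under runs/<run>/<module>/outputs/, return a reference."""
    out_dir = root / "runs" / run_id / module_name / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    # Report the path relative to the volume's parent, e.g. "artifacts/runs/...".
    return {"name": name, "path": path.relative_to(root.parent).as_posix()}
```

With the default root, `save_json_artifact("abc123", "policy-simulator", "simulation_report", "report.json", {"risk_score": 0.41})` returns `{"name": "simulation_report", "path": "artifacts/runs/abc123/policy-simulator/outputs/report.json"}`. That is the same shape as the response example in this section.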
+ +Artifacts are saved in the shared directory: + +``` +/var/openpolicystack/artifacts +``` + +Directory structure: + +``` +artifacts/ +└── runs/ + └── / + └── / + ├── inputs/ + ├── outputs/ + └── meta/ +``` + +Example artifact: + +``` +artifacts/runs/abc123/policy-simulator/outputs/report.json +``` + +When returning artifacts in your response: + +```json +{ +  "artifacts": [ +    { +      "name": "simulation_report", +      "path": "artifacts/runs/abc123/policy-simulator/outputs/report.json" +    } +  ] +} +``` + +--- + +# 10. Logging + +Modules must log to: + +``` +stdout +stderr +``` + +Use structured JSON logs. + +Example: + +```json +{ +  "timestamp": "2026-03-12T11:20:31Z", +  "level": "INFO", +  "service": "policy-simulator", +  "run_id": "abc123", +  "event": "simulation_started", +  "message": "Policy simulation initiated" +} +``` + +--- + +# 11. Module Manifest + +Each module must include a `module.yaml`. + +Example: + +```yaml +module_name: policy-simulator +version: 0.1.0 + +interface: +  type: http +  port: 8080 +  health: /health +  metadata: /metadata +  execute: /execute +``` + +This allows automated module discovery. + +--- + +# 12. Example Minimal FastAPI Module + +Example `main.py`: + +```python +from fastapi import FastAPI +from pydantic import BaseModel + +app = FastAPI() + +class ExecuteRequest(BaseModel): +    run_id: str +    parameters: dict = {} +    inputs: list = [] +    metadata: dict = {} + +@app.get("/health") +def health(): +    return { +        "status": "ok", +        "module_name": "example-module", +        "version": "0.1.0" +    } + +@app.get("/metadata") +def metadata(): +    return { +        "module_name": "example-module", +        "version": "0.1.0" +    } + +@app.post("/execute") +def execute(req: ExecuteRequest): +    return { +        "module_name": "example-module", +        "version": "0.1.0", +        "status": "success", +        "output": {"example": True}, +        "artifacts": [] +    } +``` + +--- + +# 13. Local Testing + +You can test your module locally before integration. 
+ +Start your module: + +``` +docker build -t openpolicystack/example-module .
+docker run -p 8080:8080 openpolicystack/example-module +``` + +Test endpoint: + +``` +curl http://localhost:8080/health +``` + +--- + +# 14. Integration Checklist + +Before submitting your module for integration, verify: + +✔ Module resides in `modules/` + +✔ Dockerfile builds successfully + +✔ API listens on port `8080` + +✔ `/health` endpoint works + +✔ `/metadata` endpoint works + +✔ `/execute` endpoint works + +✔ module returns JSON response + +✔ module returns version field + +✔ `.env.example` included + +✔ `module.yaml` included + +✔ artifacts stored in correct directory + +--- + +# 15. Common Mistakes + +Avoid these common integration issues: + +**Incorrect port** + +``` +5000 +3000 +``` + +Correct: + +``` +8080 +``` + +--- + +**Using localhost for service calls** + +Wrong: + +``` +http://localhost:8080 +``` + +Correct: + +``` +http://data-layer:8080 +``` + +--- + +**Returning non-JSON responses** + +All API responses must be JSON. + +--- + +# 16. Need Help? + +If your module fails integration: + +1. Check `docker logs` +2. Verify `/health` endpoint +3. Confirm port `8080` +4. Validate JSON responses \ No newline at end of file From a4cd73c2a74e747551348afc768eb15f8e5846d5 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 13:49:26 +0100 Subject: [PATCH 11/42] Add COMPOSE_DERIVATION_MATRIX.md to outline the service composition and integration details for the OpenPolicyStack orchestration system, including service names, build contexts, image names, ports, networks, volumes, environment sources, health checks, and roles. 
--- .../orchestrator/COMPOSE_DERIVATION_MATRIX.md | 21 +++++++++++++++++++ 1 file changed, 21 insertions(+) create mode 100644 modules/orchestrator/COMPOSE_DERIVATION_MATRIX.md diff --git a/modules/orchestrator/COMPOSE_DERIVATION_MATRIX.md b/modules/orchestrator/COMPOSE_DERIVATION_MATRIX.md new file mode 100644 index 0000000..67f6270 --- /dev/null +++ b/modules/orchestrator/COMPOSE_DERIVATION_MATRIX.md @@ -0,0 +1,21 @@ +# OpenPolicyStack Compose Derivation Matrix + +**File:** `COMPOSE_DERIVATION_MATRIX.md` + +**Version:** 1.0 + +**Status:** Compose Derivation Matrix - MVP Draft + +--- + +| Service Name | Build Context | Image Name | Internal Port | Host Port | Networks | Volumes | Env Source | Healthcheck | Role / Notes | +| ------------------- | ----------------------------- | --------------------------------------- | ------------- | --------- | ---------- | -------------------------------------------------------------------------------------------- | ---------------------------------- | ------------- | ----------------------------------------------------------------------------------------- | +| `orchestrator` | `./orchestrator` | `openpolicystack/orchestrator:dev` | `8080` | `8100` | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts`, `ops-metadata:/var/openpolicystack/metadata` | `./orchestrator/.env` | `GET /health` | Central coordination service; invokes modules, records run metadata, persists SQLite data | +| `data-layer` | `./modules/data-layer` | `openpolicystack/data-layer:dev` | `8080` | — | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts` | `./modules/data-layer/.env` | `GET /health` | Provides data access / preprocessing service to workflows | +| `dr-anticorruption` | `./modules/dr-anticorruption` | `openpolicystack/dr-anticorruption:dev` | `8080` | — | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts` | `./modules/dr-anticorruption/.env` | `GET /health` | Domain analysis module integrated as black-box HTTP service | +| 
`monitor` | `./modules/monitor` | `openpolicystack/monitor:dev` | `8080` | — | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts` | `./modules/monitor/.env` | `GET /health` | Monitoring / tracking module integrated through standard module contract | +| `policy-simulator` | `./modules/policy-simulator` | `openpolicystack/policy-simulator:dev` | `8080` | — | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts` | `./modules/policy-simulator/.env` | `GET /health` | Simulation module invoked by deterministic workflow templates | +| `strategy-agent` | `./modules/strategy-agent` | `openpolicystack/strategy-agent:dev` | `8080` | — | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts` | `./modules/strategy-agent/.env` | `GET /health` | Strategy support module exposed as standardized REST service | +| `supplychain-risk` | `./modules/supplychain-risk` | `openpolicystack/supplychain-risk:dev` | `8080` | — | `ops-core` | `ops-artifacts:/var/openpolicystack/artifacts` | `./modules/supplychain-risk/.env` | `GET /health` | Risk analysis module integrated as independent microservice | + + From f7869daf7c6a1c84665c0a993dca27460d19e5e0 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 13:50:38 +0100 Subject: [PATCH 12/42] Add compose.yaml file to define the orchestration services and their configurations for the OpenPolicyStack. 
--- modules/orchestrator/compose.yaml | 194 ++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 modules/orchestrator/compose.yaml diff --git a/modules/orchestrator/compose.yaml b/modules/orchestrator/compose.yaml new file mode 100644 index 0000000..fbfb95d --- /dev/null +++ b/modules/orchestrator/compose.yaml @@ -0,0 +1,194 @@ +name: openpolicystack-mvp + +services: + orchestrator: + build: + context: ./orchestrator + dockerfile: Dockerfile + image: openpolicystack/orchestrator:dev + env_file: + - ./orchestrator/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: orchestrator + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + ORCHESTRATOR__SQLITE_PATH: /var/openpolicystack/metadata/orchestrator.db + ORCHESTRATOR__DATA_LAYER_URL: http://data-layer:8080 + ORCHESTRATOR__DR_ANTICORRUPTION_URL: http://dr-anticorruption:8080 + ORCHESTRATOR__MONITOR_URL: http://monitor:8080 + ORCHESTRATOR__POLICY_SIMULATOR_URL: http://policy-simulator:8080 + ORCHESTRATOR__STRATEGY_AGENT_URL: http://strategy-agent:8080 + ORCHESTRATOR__SUPPLYCHAIN_RISK_URL: http://supplychain-risk:8080 + ports: + - "8100:8080" + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + - ops-metadata:/var/openpolicystack/metadata + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + data-layer: + build: + context: ./modules/data-layer + dockerfile: Dockerfile + image: openpolicystack/data-layer:dev + env_file: + - ./modules/data-layer/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: data-layer + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] 
+ interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + dr-anticorruption: + build: + context: ./modules/dr-anticorruption + dockerfile: Dockerfile + image: openpolicystack/dr-anticorruption:dev + env_file: + - ./modules/dr-anticorruption/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: dr-anticorruption + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + monitor: + build: + context: ./modules/monitor + dockerfile: Dockerfile + image: openpolicystack/monitor:dev + env_file: + - ./modules/monitor/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: monitor + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + policy-simulator: + build: + context: ./modules/policy-simulator + dockerfile: Dockerfile + image: openpolicystack/policy-simulator:dev + env_file: + - ./modules/policy-simulator/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: policy-simulator + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + strategy-agent: + build: + context: ./modules/strategy-agent + dockerfile: 
Dockerfile + image: openpolicystack/strategy-agent:dev + env_file: + - ./modules/strategy-agent/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: strategy-agent + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + supplychain-risk: + build: + context: ./modules/supplychain-risk + dockerfile: Dockerfile + image: openpolicystack/supplychain-risk:dev + env_file: + - ./modules/supplychain-risk/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: supplychain-risk + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + +networks: + ops-core: + driver: bridge + +volumes: + ops-artifacts: + ops-metadata: \ No newline at end of file From f468e850e5532deaeb265da4db60c4de45a90c6a Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 13:54:04 +0100 Subject: [PATCH 13/42] Created a new folder at project root to have a clear deployment path, where I added the compose.yaml file. 
--- deploy/compose.yaml | 194 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 deploy/compose.yaml diff --git a/deploy/compose.yaml b/deploy/compose.yaml new file mode 100644 index 0000000..fbfb95d --- /dev/null +++ b/deploy/compose.yaml @@ -0,0 +1,194 @@ +name: openpolicystack-mvp + +services: + orchestrator: + build: + context: ./orchestrator + dockerfile: Dockerfile + image: openpolicystack/orchestrator:dev + env_file: + - ./orchestrator/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: orchestrator + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + ORCHESTRATOR__SQLITE_PATH: /var/openpolicystack/metadata/orchestrator.db + ORCHESTRATOR__DATA_LAYER_URL: http://data-layer:8080 + ORCHESTRATOR__DR_ANTICORRUPTION_URL: http://dr-anticorruption:8080 + ORCHESTRATOR__MONITOR_URL: http://monitor:8080 + ORCHESTRATOR__POLICY_SIMULATOR_URL: http://policy-simulator:8080 + ORCHESTRATOR__STRATEGY_AGENT_URL: http://strategy-agent:8080 + ORCHESTRATOR__SUPPLYCHAIN_RISK_URL: http://supplychain-risk:8080 + ports: + - "8100:8080" + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + - ops-metadata:/var/openpolicystack/metadata + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + data-layer: + build: + context: ./modules/data-layer + dockerfile: Dockerfile + image: openpolicystack/data-layer:dev + env_file: + - ./modules/data-layer/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: data-layer + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + 
start_period: 20s + + dr-anticorruption: + build: + context: ./modules/dr-anticorruption + dockerfile: Dockerfile + image: openpolicystack/dr-anticorruption:dev + env_file: + - ./modules/dr-anticorruption/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: dr-anticorruption + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + monitor: + build: + context: ./modules/monitor + dockerfile: Dockerfile + image: openpolicystack/monitor:dev + env_file: + - ./modules/monitor/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: monitor + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + policy-simulator: + build: + context: ./modules/policy-simulator + dockerfile: Dockerfile + image: openpolicystack/policy-simulator:dev + env_file: + - ./modules/policy-simulator/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: policy-simulator + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + strategy-agent: + build: + context: ./modules/strategy-agent + dockerfile: Dockerfile + image: 
openpolicystack/strategy-agent:dev + env_file: + - ./modules/strategy-agent/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: strategy-agent + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + supplychain-risk: + build: + context: ./modules/supplychain-risk + dockerfile: Dockerfile + image: openpolicystack/supplychain-risk:dev + env_file: + - ./modules/supplychain-risk/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: supplychain-risk + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + +networks: + ops-core: + driver: bridge + +volumes: + ops-artifacts: + ops-metadata: \ No newline at end of file From 5513e6da768cf3360c5b57565716454871e68633 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 13:54:20 +0100 Subject: [PATCH 14/42] Moved file to the folder Deploy at root. 
--- modules/orchestrator/compose.yaml | 194 ------------------------------ 1 file changed, 194 deletions(-) delete mode 100644 modules/orchestrator/compose.yaml diff --git a/modules/orchestrator/compose.yaml b/modules/orchestrator/compose.yaml deleted file mode 100644 index fbfb95d..0000000 --- a/modules/orchestrator/compose.yaml +++ /dev/null @@ -1,194 +0,0 @@ -name: openpolicystack-mvp - -services: - orchestrator: - build: - context: ./orchestrator - dockerfile: Dockerfile - image: openpolicystack/orchestrator:dev - env_file: - - ./orchestrator/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: orchestrator - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - ORCHESTRATOR__SQLITE_PATH: /var/openpolicystack/metadata/orchestrator.db - ORCHESTRATOR__DATA_LAYER_URL: http://data-layer:8080 - ORCHESTRATOR__DR_ANTICORRUPTION_URL: http://dr-anticorruption:8080 - ORCHESTRATOR__MONITOR_URL: http://monitor:8080 - ORCHESTRATOR__POLICY_SIMULATOR_URL: http://policy-simulator:8080 - ORCHESTRATOR__STRATEGY_AGENT_URL: http://strategy-agent:8080 - ORCHESTRATOR__SUPPLYCHAIN_RISK_URL: http://supplychain-risk:8080 - ports: - - "8100:8080" - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - - ops-metadata:/var/openpolicystack/metadata - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - - data-layer: - build: - context: ./modules/data-layer - dockerfile: Dockerfile - image: openpolicystack/data-layer:dev - env_file: - - ./modules/data-layer/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: data-layer - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 
1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - - dr-anticorruption: - build: - context: ./modules/dr-anticorruption - dockerfile: Dockerfile - image: openpolicystack/dr-anticorruption:dev - env_file: - - ./modules/dr-anticorruption/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: dr-anticorruption - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - - monitor: - build: - context: ./modules/monitor - dockerfile: Dockerfile - image: openpolicystack/monitor:dev - env_file: - - ./modules/monitor/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: monitor - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - - policy-simulator: - build: - context: ./modules/policy-simulator - dockerfile: Dockerfile - image: openpolicystack/policy-simulator:dev - env_file: - - ./modules/policy-simulator/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: policy-simulator - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - - strategy-agent: - build: - context: ./modules/strategy-agent - dockerfile: 
Dockerfile - image: openpolicystack/strategy-agent:dev - env_file: - - ./modules/strategy-agent/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: strategy-agent - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - - supplychain-risk: - build: - context: ./modules/supplychain-risk - dockerfile: Dockerfile - image: openpolicystack/supplychain-risk:dev - env_file: - - ./modules/supplychain-risk/.env - environment: - OPS_ENV: dev - OPS_MODULE_NAME: supplychain-risk - OPS_PORT: 8080 - OPS_LOG_LEVEL: INFO - OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts - OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - volumes: - - ops-artifacts:/var/openpolicystack/artifacts - networks: - - ops-core - healthcheck: - test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"] - interval: 15s - timeout: 5s - retries: 5 - start_period: 20s - -networks: - ops-core: - driver: bridge - -volumes: - ops-artifacts: - ops-metadata: \ No newline at end of file From aac612f658a58d35877c876183e0286df74424f6 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 16:31:06 +0100 Subject: [PATCH 15/42] Moved the compose.yaml file to the root of the repo. 
--- deploy/compose.yaml => compose.yaml | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename deploy/compose.yaml => compose.yaml (100%) diff --git a/deploy/compose.yaml b/compose.yaml similarity index 100% rename from deploy/compose.yaml rename to compose.yaml From 3da64b266e61a8b61302b8056d6ee6f3a38bdb04 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 16:48:36 +0100 Subject: [PATCH 16/42] Renamed the file to have a full intended architecture --- compose.yaml => deploy/compose.target-skeleton.yaml | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename compose.yaml => deploy/compose.target-skeleton.yaml (100%) diff --git a/compose.yaml b/deploy/compose.target-skeleton.yaml similarity index 100% rename from compose.yaml rename to deploy/compose.target-skeleton.yaml From 3bd3d1a359a14f6f82e91c5447c2ac6a4d32beb5 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 16:57:12 +0100 Subject: [PATCH 17/42] Added the yaml file for the active minimal runnable stack (for the integration-pilot module). --- compose.yaml | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 compose.yaml diff --git a/compose.yaml b/compose.yaml new file mode 100644 index 0000000..e69de29 From 6070fc28076dd594a0677c236fc3450af123143d Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 17:00:08 +0100 Subject: [PATCH 18/42] Added the actual minimal root compose.yaml for the integration-pilot module.
--- compose.yaml | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/compose.yaml b/compose.yaml index e69de29..546185e 100644 --- a/compose.yaml +++ b/compose.yaml @@ -0,0 +1,81 @@ +name: openpolicystack-pilot + +services: + orchestrator: + build: + context: ./modules/orchestrator + dockerfile: Dockerfile + image: openpolicystack/orchestrator:dev + env_file: + - ./modules/orchestrator/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: orchestrator + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + ORCHESTRATOR__SQLITE_PATH: /var/openpolicystack/metadata/orchestrator.db + ORCHESTRATOR__INTEGRATION_PILOT_URL: http://integration-pilot:8080 + ports: + - "8100:8080" + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + - ops-metadata:/var/openpolicystack/metadata + networks: + - ops-core + depends_on: + - integration-pilot + healthcheck: + test: + [ + "CMD", + "python", + "-c", + "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8080/health')" + ] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + + integration-pilot: + build: + context: ./modules/integration-pilot + dockerfile: Dockerfile + image: openpolicystack/integration-pilot:dev + env_file: + - ./modules/integration-pilot/.env + environment: + OPS_ENV: dev + OPS_MODULE_NAME: integration-pilot + OPS_PORT: 8080 + OPS_LOG_LEVEL: INFO + OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts + OPS_ORCHESTRATOR_URL: http://orchestrator:8080 + PILOT_MODULE_VERSION: 0.1.0 + ports: + - "8101:8080" + volumes: + - ops-artifacts:/var/openpolicystack/artifacts + networks: + - ops-core + healthcheck: + test: + [ + "CMD", + "python", + "-c", + "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8080/health')" + ] + interval: 15s + timeout: 5s + retries: 5 + start_period: 20s + +networks: + ops-core: + driver: bridge + +volumes: + ops-artifacts: + ops-metadata: \ No newline at end of file 
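After `docker compose up --build`, a small standard-library smoke test can check the pilot stack defined by this root `compose.yaml`. It polls `/health` on the published host ports (8100 for the orchestrator, 8101 for integration-pilot), then sends one request to the orchestrator's `/execute`. This is a hedged sketch, not part of the patch series. The helper names (`wait_for_health`, `post_execute`, `smoke_test`) and the sample payload are hypothetical. It assumes both containers are reachable from the host on `localhost`.

```python
import json
import time
import urllib.request
from typing import Any, Dict, Optional

# Host ports published by the root compose.yaml (8100 -> orchestrator, 8101 -> pilot).
ORCHESTRATOR_URL = "http://localhost:8100"
PILOT_URL = "http://localhost:8101"


def wait_for_health(base_url: str, attempts: int = 10, delay: float = 2.0) -> Dict[str, Any]:
    """Poll GET /health until it reports status "ok" or the attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if body.get("status") == "ok":
                return body
        except (OSError, ValueError) as exc:  # connection errors or invalid JSON
            last_error = exc
        time.sleep(delay)
    raise RuntimeError(f"{base_url} not healthy after {attempts} attempts: {last_error}")


def post_execute(base_url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload to /execute and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test() -> None:
    """Check both services, then run one orchestrated request end to end."""
    for url in (ORCHESTRATOR_URL, PILOT_URL):
        print(wait_for_health(url))
    result = post_execute(ORCHESTRATOR_URL, {"scenario_id": "demo-scenario-001"})
    print(result["status"], result["run_id"])
```

Call `smoke_test()` once the containers report healthy. Only the standard library is used, so the script runs on the host without installing extra packages.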
From c65860e6455b4a92bcfefe04ff4228767c4aa8af Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 17:14:59 +0100 Subject: [PATCH 19/42] Added files for requirements.txt and Dockerfile for the orchestrator module. --- modules/orchestrator/Dockerfile | 0 modules/orchestrator/requirements.txt | 0 2 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 modules/orchestrator/Dockerfile create mode 100644 modules/orchestrator/requirements.txt diff --git a/modules/orchestrator/Dockerfile b/modules/orchestrator/Dockerfile new file mode 100644 index 0000000..e69de29 diff --git a/modules/orchestrator/requirements.txt b/modules/orchestrator/requirements.txt new file mode 100644 index 0000000..e69de29 From 7a6237000c25be9e64cfbbc207f18746b16cf6cd Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 17:22:46 +0100 Subject: [PATCH 20/42] Added files requirements.txt and Dockerfile to the integration-pilot module. --- modules/integration-pilot/Dockerfile | 0 modules/integration-pilot/requirements.txt | 0 2 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 modules/integration-pilot/Dockerfile create mode 100644 modules/integration-pilot/requirements.txt diff --git a/modules/integration-pilot/Dockerfile b/modules/integration-pilot/Dockerfile new file mode 100644 index 0000000..e69de29 diff --git a/modules/integration-pilot/requirements.txt b/modules/integration-pilot/requirements.txt new file mode 100644 index 0000000..e69de29 From 363e739774d19fcbc49a7eb6e33ae3395c43bd5c Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 17:26:11 +0100 Subject: [PATCH 21/42] Add pilot runnable stack with orchestrator and integration-pilot --- modules/integration-pilot/Dockerfile | 15 ++ modules/integration-pilot/app/__init__.py | 0 modules/integration-pilot/app/main.py | 70 +++++++++ modules/integration-pilot/requirements.txt | 2 + modules/orchestrator/Dockerfile | 15 ++ modules/orchestrator/app/__init__.py | 0 
modules/orchestrator/main.py | 160 +++++++++++++++++++++ modules/orchestrator/requirements.txt | 3 + 8 files changed, 265 insertions(+) create mode 100644 modules/integration-pilot/app/__init__.py create mode 100644 modules/integration-pilot/app/main.py create mode 100644 modules/orchestrator/app/__init__.py create mode 100644 modules/orchestrator/main.py diff --git a/modules/integration-pilot/Dockerfile b/modules/integration-pilot/Dockerfile index e69de29..7b7d624 100644 --- a/modules/integration-pilot/Dockerfile +++ b/modules/integration-pilot/Dockerfile @@ -0,0 +1,15 @@ +FROM python:3.11-slim + +WORKDIR /app + +ENV PYTHONDONTWRITEBYTECODE=1 +ENV PYTHONUNBUFFERED=1 + +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +COPY app ./app + +EXPOSE 8080 + +CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"] \ No newline at end of file diff --git a/modules/integration-pilot/app/__init__.py b/modules/integration-pilot/app/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py new file mode 100644 index 0000000..c60eed5 --- /dev/null +++ b/modules/integration-pilot/app/main.py @@ -0,0 +1,70 @@ +import hashlib +import json +import os +import uuid +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict + +from fastapi import FastAPI + +app = FastAPI(title="OpenPolicyStack Integration Pilot", version="0.1.0") + +MODULE_NAME = os.getenv("OPS_MODULE_NAME", "integration-pilot") +MODULE_VERSION = os.getenv("PILOT_MODULE_VERSION", "0.1.0") +ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) + + +def now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +@app.get("/health") +def health() -> Dict[str, Any]: + return { + "status": "ok", + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + } + + +@app.post("/execute") +def execute(payload: 
Dict[str, Any]) -> Dict[str, Any]: + execution_id = str(uuid.uuid4()) + timestamp = now_iso() + + module_dir = ARTIFACT_ROOT / MODULE_NAME + module_dir.mkdir(parents=True, exist_ok=True) + + artifact_path = module_dir / f"{execution_id}.json" + + artifact_content = { + "execution_id": execution_id, + "timestamp": timestamp, + "received_payload": payload, + "message": "integration pilot executed successfully", + } + + serialized = json.dumps(artifact_content, indent=2) + artifact_path.write_text(serialized, encoding="utf-8") + + sha256_hash = hashlib.sha256(serialized.encode("utf-8")).hexdigest() + + return { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "success", + "output": { + "message": "pilot module executed successfully", + "execution_id": execution_id, + "received_keys": sorted(list(payload.keys())), + }, + "artifacts": [ + { + "module_name": MODULE_NAME, + "file_path": str(artifact_path), + "hash": sha256_hash, + "type": "pilot_output", + } + ], + } \ No newline at end of file diff --git a/modules/integration-pilot/requirements.txt b/modules/integration-pilot/requirements.txt index e69de29..926ab65 100644 --- a/modules/integration-pilot/requirements.txt +++ b/modules/integration-pilot/requirements.txt @@ -0,0 +1,2 @@ +fastapi==0.115.0 +uvicorn[standard]==0.30.6 \ No newline at end of file diff --git a/modules/orchestrator/Dockerfile b/modules/orchestrator/Dockerfile index e69de29..7b7d624 100644 --- a/modules/orchestrator/Dockerfile +++ b/modules/orchestrator/Dockerfile @@ -0,0 +1,15 @@ +FROM python:3.11-slim + +WORKDIR /app + +ENV PYTHONDONTWRITEBYTECODE=1 +ENV PYTHONUNBUFFERED=1 + +COPY requirements.txt . 
+RUN pip install --no-cache-dir -r requirements.txt + +COPY app ./app + +EXPOSE 8080 + +CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"] \ No newline at end of file diff --git a/modules/orchestrator/app/__init__.py b/modules/orchestrator/app/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/modules/orchestrator/main.py b/modules/orchestrator/main.py new file mode 100644 index 0000000..13495ee --- /dev/null +++ b/modules/orchestrator/main.py @@ -0,0 +1,160 @@ +import json +import os +import sqlite3 +import uuid +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict + +import httpx +from fastapi import FastAPI, HTTPException + +app = FastAPI(title="OpenPolicyStack Orchestrator", version="0.1.0") + +MODULE_NAME = os.getenv("OPS_MODULE_NAME", "orchestrator") +MODULE_VERSION = "0.1.0" +ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) +SQLITE_PATH = Path( + os.getenv( + "ORCHESTRATOR__SQLITE_PATH", + "/var/openpolicystack/metadata/orchestrator.db", + ) +) +INTEGRATION_PILOT_URL = os.getenv( + "ORCHESTRATOR__INTEGRATION_PILOT_URL", + "http://integration-pilot:8080", +) + + +def now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +def ensure_paths() -> None: + ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True) + SQLITE_PATH.parent.mkdir(parents=True, exist_ok=True) + + +def get_conn() -> sqlite3.Connection: + ensure_paths() + conn = sqlite3.connect(SQLITE_PATH) + conn.row_factory = sqlite3.Row + return conn + + +def init_db() -> None: + with get_conn() as conn: + conn.execute( + """ + CREATE TABLE IF NOT EXISTS runs ( + run_id TEXT PRIMARY KEY, + workflow_template TEXT, + status TEXT, + created_at TEXT, + request_payload TEXT, + response_payload TEXT + ) + """ + ) + conn.commit() + + +@app.on_event("startup") +def startup_event() -> None: + init_db() + + +@app.get("/health") +def health() -> Dict[str, Any]: + return { + "status": "ok", + 
"module_name": MODULE_NAME, + "version": MODULE_VERSION, + "pilot_url": INTEGRATION_PILOT_URL, + "sqlite_path": str(SQLITE_PATH), + } + + +@app.post("/execute") +async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: + run_id = str(uuid.uuid4()) + created_at = now_iso() + + with get_conn() as conn: + conn.execute( + """ + INSERT INTO runs (run_id, workflow_template, status, created_at, request_payload, response_payload) + VALUES (?, ?, ?, ?, ?, ?) + """, + ( + run_id, + "pilot-workflow", + "running", + created_at, + json.dumps(payload), + None, + ), + ) + conn.commit() + + upstream_payload = { + "run_id": run_id, + "input": payload, + "requested_by": MODULE_NAME, + } + + try: + async with httpx.AsyncClient(timeout=30.0) as client: + upstream_response = await client.post( + f"{INTEGRATION_PILOT_URL}/execute", + json=upstream_payload, + ) + upstream_response.raise_for_status() + pilot_result = upstream_response.json() + except Exception as exc: + with get_conn() as conn: + conn.execute( + "UPDATE runs SET status = ?, response_payload = ? 
WHERE run_id = ?", + ("failed", json.dumps({"error": str(exc)}), run_id), + ) + conn.commit() + raise HTTPException(status_code=502, detail=f"Pilot module call failed: {exc}") + + orchestrator_dir = ARTIFACT_ROOT / "orchestrator" + orchestrator_dir.mkdir(parents=True, exist_ok=True) + + summary_path = orchestrator_dir / f"{run_id}-summary.json" + summary = { + "run_id": run_id, + "timestamp": created_at, + "orchestrator": MODULE_NAME, + "pilot_response": pilot_result, + } + summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8") + + response = { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "success", + "run_id": run_id, + "output": { + "message": "orchestrator executed pilot workflow successfully", + "pilot_result": pilot_result, + }, + "artifacts": [ + { + "module_name": MODULE_NAME, + "file_path": str(summary_path), + "type": "run_summary", + } + ], + } + + with get_conn() as conn: + conn.execute( + "UPDATE runs SET status = ?, response_payload = ? WHERE run_id = ?", + ("success", json.dumps(response), run_id), + ) + conn.commit() + + return response \ No newline at end of file diff --git a/modules/orchestrator/requirements.txt b/modules/orchestrator/requirements.txt index e69de29..8a89da7 100644 --- a/modules/orchestrator/requirements.txt +++ b/modules/orchestrator/requirements.txt @@ -0,0 +1,3 @@ +fastapi==0.115.0 +uvicorn[standard]==0.30.6 +httpx==0.27.2 \ No newline at end of file From 33e390374f1d9b7b29e4fcb4f2d031d78a49f983 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Mon, 16 Mar 2026 17:48:39 +0100 Subject: [PATCH 22/42] Moved main.py to the correct location. 
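The Dockerfile starts the service with `uvicorn app.main:app` (after `COPY app ./app`), which is why `main.py` has to sit inside the `app/` package: the part before the colon is imported as a dotted module path, and the part after it is read as an attribute of that module. The sketch below is a simplified, hypothetical model of that `module:attribute` resolution using `importlib`. It is not uvicorn's own loader, and `load_import_string` is an illustrative name.

```python
import importlib
from typing import Any


def load_import_string(import_string: str) -> Any:
    """Resolve a "package.module:attribute" string (simplified model).

    Hypothetical helper mirroring the idea behind `uvicorn app.main:app`:
    import the module on the left, then fetch the attribute on the right.
    """
    module_path, sep, attr_path = import_string.partition(":")
    if not sep or not module_path or not attr_path:
        raise ValueError(f"expected '<module>:<attribute>', got {import_string!r}")

    # Raises ModuleNotFoundError if, e.g., main.py is not inside the app/ package.
    module = importlib.import_module(module_path)

    # Walk dotted attributes, e.g. "path.join" -> module.path.join.
    obj: Any = module
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


if __name__ == "__main__":
    import os.path

    # Standard-library stand-in for "app.main:app".
    assert load_import_string("os.path:join") is os.path.join
    try:
        load_import_string("app_missing.main:app")
    except ModuleNotFoundError as exc:
        print(f"import failed as expected: {exc}")
```

With `main.py` at the module root instead of under `app/`, the `app.main` import is the step that would fail.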
--- modules/orchestrator/{ => app}/main.py | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename modules/orchestrator/{ => app}/main.py (100%) diff --git a/modules/orchestrator/main.py b/modules/orchestrator/app/main.py similarity index 100% rename from modules/orchestrator/main.py rename to modules/orchestrator/app/main.py From 22285f2890f53b476b4b43811cc74b3aaaee214e Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:23:24 +0100 Subject: [PATCH 23/42] Updated the guide after successful end-to-end run of the integration-pilot module. --- .../orchestrator/MODULE_INTEGRATION_GUIDE.md | 382 ++++++++---------- 1 file changed, 175 insertions(+), 207 deletions(-) diff --git a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md index daff052..bb1aa7d 100644 --- a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md +++ b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md @@ -10,39 +10,60 @@ # 1. Overview -This guide explains how to prepare your module so that it can be integrated into the **OpenPolicyStack orchestration system**. +This guide explains how to prepare your module so that it can be integrated into the **OpenPolicyStack Orchestration System**. -All modules in OpenPolicyStack operate as **independent containerized microservices** coordinated by a central orchestrator. +All modules in OpenPolicyStack operate as **independent containerized** microservices coordinated by a **central** orchestrator. Your module will: -- run inside a Docker container -- expose a lightweight REST API -- receive execution requests from the orchestrator -- return structured outputs and artifact references +- Run inside a Docker container. +- Expose a lightweight REST API. +- Receive execution requests from the orchestrator. +- Return structured outputs and artifact references. -You **do not need to implement orchestration logic**. 
+You **do not** need to implement orchestration logic. -Your module simply exposes a consistent interface and performs its analysis. +Your module must simply **conform** to the Integration Contract defined in this guide. --- -# 2. High-Level Integration Flow +# 2. Validated Integration Baseline + +A working integration baseline has been implemented and validated. + +The module: +``` +modules/integration-pilot +``` +serves as the reference implementation for: + +- API contract +- Artifact handling +- Environment variables +- Docker container behavior +- Orchestrator interaction + +All modules must align with the integration-pilot pattern. +If in doubt, follow that implementation exactly. + +--- + +# 3. High-Level Integration Flow When OpenPolicyStack runs a workflow, the following occurs: 1. The orchestrator generates a **run_id**. 2. The orchestrator calls your module via HTTP. 3. Your module performs its computation. -4. Your module returns structured JSON results. -5. Any large outputs are saved as artifacts. +4. Your module returns structured JSON. +5. Artifacts are written to the shared volume. 6. The orchestrator records execution metadata. Your module remains **analytically independent** but participates in the **shared execution environment**. --- -# 3. Module Repository Structure +# 4. Module Repository Structure Your module must follow this directory structure: @@ -59,15 +80,9 @@ modules// └── README.md ``` -Example: - -``` -modules/policy-simulator/ -``` - --- -# 4. Naming Rules +# 5. Naming Rules Module names must follow **lowercase kebab-case**. @@ -90,39 +105,44 @@ policySimulator Your module name must match: -- folder name +- Folder name - Docker service name -- container hostname -- artifact directory name +- Container hostname +- Artifact directory name --- -# 5. Docker Container Requirement +# 6. Docker Container Requirement Your module **must run inside a Docker container**.
-Minimal example Dockerfile: +Validated minimal Dockerfile: ``` FROM python:3.11-slim WORKDIR /app +ENV PYTHONDONTWRITEBYTECODE=1 +ENV PYTHONUNBUFFERED=1 + COPY requirements.txt . -RUN pip install -r requirements.txt +RUN pip install --no-cache-dir -r requirements.txt + +COPY app ./app -COPY app/ app/ +EXPOSE 8080 -CMD ["python", "app/main.py"] +CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"] ``` The container must start a web server exposing the module API. --- -# 6. API Interface Requirements +# 7. API Interface Requirements -Your module must expose three HTTP endpoints. +Each module must expose three HTTP endpoints. ### Required endpoints @@ -136,13 +156,18 @@ All endpoints must return JSON. --- -# 6.1 Health Endpoint +# 7.1 Health Endpoint Used by the orchestrator to verify that your service is running. ``` GET /health ``` +Must: +- Return HTTP 200. +- Respond quickly (no heavy logic). +- Not depend on external services. + Example response: @@ -153,10 +178,13 @@ Example response: "version":"0.1.0" } ``` +This endpoint is used for: +- Container healthchecks +- Orchestration readiness --- -# 6.2 Metadata Endpoint +# 7.2 Metadata Endpoint Provides module information used for orchestration and debugging. @@ -164,140 +192,120 @@ Provides module information used for orchestration and debugging. GET /metadata ``` -Example: - -``` -{ - "module_name":"policy-simulator", - "version":"0.1.0", - "supported_tasks": [ -"simulate_policy" - ] -} -``` - --- -# 6.3 Execute Endpoint +# 7.3 Execute Endpoint This is the **main entry point** used by the orchestrator. ``` POST /execute ``` +Your module must: +- Accept JSON input. +- Return structured JSON output. +- Never return raw text or HTML. 
-Example request: +### Required Response Structure ``` { - "run_id":"123e4567", - "parameters": { - "country":"DO" - }, - "inputs": [], - "metadata": {} -} -``` - -Example response: - -``` -{ - "module_name":"policy-simulator", - "version":"0.1.0", - "status":"success", - "output": { - "risk_score":0.41 - }, + "module_name": "policy-simulator", + "version": "0.1.0", + "status": "success", + "output": {}, "artifacts": [] } ``` +### Required Response Fields -Required response fields: - -| Field | Description | -| --- | --- | -| module_name | name of the module | -| version | module version | -| status | success or failure | -| output | structured JSON result | -| artifacts | list of artifact references | +| Field | Description | +| ----------- | --------------------------- | +| module_name | name of the module | +| version | module version | +| status | success or failure | +| output | structured JSON result | +| artifacts | list of artifact references | --- -# 7. Service Port +# 8. Service Port -Your API must listen on: +All modules must listen on: ``` 8080 ``` -Example FastAPI server: +--- + +# 9. Environment Variables + +Each module must include: ``` -uvicorn main:app --host 0.0.0.0 --port 8080 +.env.example ``` +### Important Rule +- `.env.example` → committed to Git ---- - -# 8. Environment Variables +- `.env` → NOT committed (local runtime only) -All modules use environment variables prefixed with: +Each developer must create a local `.env` from the template: ``` -OPS_ +cp .env.example .env ``` -Example variables: +### Example `.env.example` ``` -OPS_MODULE_NAME=policy-simulator +OPS_ENV=dev +OPS_LOG_LEVEL=INFO OPS_PORT=8080 OPS_ARTIFACT_ROOT=/var/openpolicystack/artifacts -OPS_LOG_LEVEL=INFO +OPS_MODULE_NAME=policy-simulator ``` +### Why This Matters -You must include a `.env.example` file documenting required variables. +This prevents: +- Missing environment variables in deployment. +- Inconsistent runtime configuration. +- Integration failures on the VM. --- -# 10.
Artifact Storage (Validated Behavior) -Large outputs should be stored as **artifacts**. - -Artifacts are saved in the shared directory: +Artifacts must be written to: ``` /var/openpolicystack/artifacts ``` -Directory structure: +Each module must create its own subdirectory: ``` -artifacts/ -└── runs/ - └── / - └── / - ├── inputs/ - ├── outputs/ - └── meta/ +/var/openpolicystack/artifacts// ``` -Example artifact: +Example: ``` -artifacts/runs/abc123/policy-simulator/outputs/report.json +/var/openpolicystack/artifacts/integration-pilot/.json ``` -When returning artifacts in your response: +--- + +### Artifact Response Format ``` { "artifacts": [ { - "name":"simulation_report", - "path":"artifacts/runs/abc123/policy-simulator/outputs/report.json" + "module_name":"policy-simulator", + "file_path":"/var/openpolicystack/artifacts/policy-simulator/output.json", + "type":"output" } ] } @@ -305,38 +313,29 @@ When returning artifacts in your response: --- -# 10. Logging +### Important -Modules must log to: +- The orchestrator reads **references**, not raw files. +- All containers share the same mounted volume. +- Do not use local paths outside `/var/openpolicystack/artifacts`. -``` -stdout -stderr -``` +--- -Use structured JSON logs. +# 11. Logging -Example: +Modules must log to: ``` -{ - "timestamp":"2026-03-12T11:20:31Z", - "level":"INFO", - "service":"policy-simulator", - "run_id":"abc123", - "event":"simulation_started", - "message":"Policy simulation initiated" -} +stdout / stderr ``` +Use structured JSON logs. --- -# 11. Module Manifest +# 12. Module Manifest Each module must include a `module.yaml`. -Example: - ``` module_name: policy-simulator version: 0.1.0 @@ -349,147 +348,116 @@ interface: execute: /execute ``` -This allows automated module discovery. - --- -# 12. Example Minimal FastAPI Module - -Example `main.py`: +# 13. 
Local Testing +Before integration: ``` -fromfastapiimportFastAPI -frompydanticimportBaseModel - -app=FastAPI() - -classExecuteRequest(BaseModel): -run_id:str -parameters:dict= {} -inputs:list= [] -metadata:dict= {} - -@app.get("/health") -defhealth(): -return { -"status":"ok", -"module_name":"example-module", -"version":"0.1.0" - } +docker build -t openpolicystack/ . +docker run -p 8080:8080 openpolicystack/ +``` -@app.get("/metadata") -defmetadata(): -return { -"module_name":"example-module", -"version":"0.1.0" - } +Test endpoint: -@app.post("/execute") -defexecute(req:ExecuteRequest): -return { -"module_name":"example-module", -"version":"0.1.0", -"status":"success", -"output": {"example":True}, -"artifacts": [] - } +``` +curl http://localhost:8080/health ``` --- -# 13. Local Testing +# Integration Testing (Validated Workflow) +Once integrated with the orchestrator: -You can test your module locally before integration. +``` +docker compose build +docker compose up -d +docker compose ps +``` -Start your module: +Test orchestrator: ``` -docker build -t openpolicystack/example-module . -docker run -p 8080:8080 openpolicystack/example-module +curl http://localhost:8100/health ``` -Test endpoint: +Test execution: ``` -curl http://localhost:8080/health +curl -X POST http://localhost:8100/execute \ + -H "Content-Type: application/json" \ + -d '{"test":"hello"}' ``` --- -# 14. Integration Checklist - -Before submitting your module for integration, verify: - -✔ Module resides in `modules/` -✔ Dockerfile builds successfully +# 15. Integration Checklist -✔ API listens on port `8080` +Before submitting your module for integration, verify: -✔ `/health` endpoint works +✔ Docker builds successfully. -✔ `/metadata` endpoint works +✔ Service runs on port 8080. -✔ `/execute` endpoint works +✔ /health returns 200. -✔ module returns JSON response +✔ /execute returns valid JSON. -✔ module returns version field +✔ .env.example included.
-✔ `.env.example` included +✔ artifacts written to shared volume. -✔ `module.yaml` included +✔ response includes artifacts field. -✔ artifacts stored in correct directory +✔ module follows integration-pilot pattern. --- -# 15. Common Mistakes - -Avoid these common integration issues: +# 16. Common Mistakes +### Using localhost for inter-service calls -**Incorrect port** +Wrong: ``` -5000 -3000 +http://localhost:8080 ``` Correct: ``` -8080 +http://orchestrator:8080 ``` --- -**Using localhost for service calls** +### Missing `.env.example` -Wrong: +This causes deployment failures on the VM. -``` -http://localhost:8080 -``` +--- -Correct: +### Health endpoint too slow -``` -http://data-layer:8080 -``` +Healthchecks will fail → container marked unhealthy. --- -**Returning non-JSON responses** +### Writing artifacts outside shared volume -All API responses must be JSON. +Artifacts will not be visible to other services. ---- +# 17. Final Note + +The integration-pilot module is the single source of truth for: + +- Correct module behavior. + +- Correct response structure. + +- Correct artifact handling. -# 16. Need Help? +- Correct environment configuration. -If your module fails integration: +If your module behaves differently, it will fail integration. -1. Check `docker logs` -2. Verify `/health` endpoint -3. Confirm port `8080` -4. Validate JSON responses \ No newline at end of file From b0abafc684d19a95b4df2c1e0444266eae701831 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:24:06 +0100 Subject: [PATCH 24/42] Updated the guide after successful end-to-end run of the integration-pilot module. 
--- modules/orchestrator/MODULE_INTEGRATION_GUIDE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md index bb1aa7d..3c781e0 100644 --- a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md +++ b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md @@ -366,7 +366,7 @@ curl http://localhost:8080/health --- -# Integration Testing (Validated Workflow) +# 14. Integration Testing (Validated Workflow) Once integrated with the orchestrator: ``` From 9ebed430c27f9953bf01324b665391b095d9e509 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:31:28 +0100 Subject: [PATCH 25/42] Modified the .gitignore to support the .env.example file. --- .gitignore | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.gitignore b/.gitignore index b98b2e3..299c868 100644 --- a/.gitignore +++ b/.gitignore @@ -143,6 +143,11 @@ venv/ ENV/ env.bak/ venv.bak/ +# Make Sure .gitignore Supports the .env.example file. +.env +*.env +!.env.example +!*.env.example # Spyder project settings .spyderproject From fdb724cbd9b95490682c0d1a43b917b31aef12ee Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:35:35 +0100 Subject: [PATCH 26/42] Modified the .gitignore to support the .env.example file. --- .gitignore | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/.gitignore b/.gitignore index 299c868..8d36c72 100644 --- a/.gitignore +++ b/.gitignore @@ -143,11 +143,9 @@ venv/ ENV/ env.bak/ venv.bak/ -# Make Sure .gitignore Supports the .env.example file. -.env -*.env +# Allow example env templates (must be AFTER any .env / .env.* ignores) !.env.example -!*.env.example +!**/.env.example # Spyder project settings .spyderproject From 43b78811017b0ecb4ff726824d354c20d5207a1d Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:38:16 +0100 Subject: [PATCH 27/42] Modified the .gitignore to support the .env.example file. 
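Moving the negations to the end of `.gitignore` matters because patterns are evaluated top to bottom and the last matching pattern decides. A `!` negation therefore only re-includes `.env.example` if it comes after the patterns that would ignore it. The sketch below is a deliberately simplified model of that rule for bare file names, without directory rules, leading-`/` anchoring, or `**`. The pattern lists are illustrative and are not the repository's actual `.gitignore`.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_ignored(filename: str, patterns: Sequence[str]) -> bool:
    """Simplified model of .gitignore matching for a bare file name.

    Patterns are checked top to bottom and the LAST matching pattern wins.
    A leading "!" re-includes a file that an earlier pattern ignored.
    Directory rules, "/" anchoring and "**" are intentionally omitted.
    """
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        if fnmatchcase(filename, pattern):
            ignored = not negate
    return ignored


# Illustrative pattern lists (hypothetical, not the repository's .gitignore).
NEGATION_LAST = [".env", "*.env", ".env.*", "!.env.example"]
NEGATION_FIRST = ["!.env.example", ".env", "*.env", ".env.*"]

if __name__ == "__main__":
    for name in (".env", ".env.example"):
        print(name, is_ignored(name, NEGATION_LAST), is_ignored(name, NEGATION_FIRST))
```

In this model, `.env.example` is tracked only with `NEGATION_LAST`, while `.env` stays ignored in both orderings.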
--- .gitignore | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/.gitignore b/.gitignore index 8d36c72..a7498e5 100644 --- a/.gitignore +++ b/.gitignore @@ -143,9 +143,7 @@ venv/ ENV/ env.bak/ venv.bak/ -# Allow example env templates (must be AFTER any .env / .env.* ignores) -!.env.example -!**/.env.example + # Spyder project settings .spyderproject @@ -250,3 +248,6 @@ modules/monitor/**/*.db-journal OpenPolicyStack/ +# Allow example env templates (must be AFTER any .env / .env.* ignores) +!.env.example +!**/.env.example \ No newline at end of file From 5f72062fd80e121ef5da6c8ac6c1d8e5c5f49537 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:39:05 +0100 Subject: [PATCH 28/42] Created the env.example files for both modules orchestrator and integration-pilot. --- modules/integration-pilot/.env.example | 3 +++ modules/orchestrator/.env.example | 2 ++ 2 files changed, 5 insertions(+) create mode 100644 modules/integration-pilot/.env.example create mode 100644 modules/orchestrator/.env.example diff --git a/modules/integration-pilot/.env.example b/modules/integration-pilot/.env.example new file mode 100644 index 0000000..d5f5f8f --- /dev/null +++ b/modules/integration-pilot/.env.example @@ -0,0 +1,3 @@ +OPS_ENV=dev +OPS_LOG_LEVEL=INFO +PILOT_MODULE_VERSION=0.1.0 \ No newline at end of file diff --git a/modules/orchestrator/.env.example b/modules/orchestrator/.env.example new file mode 100644 index 0000000..bf21d5d --- /dev/null +++ b/modules/orchestrator/.env.example @@ -0,0 +1,2 @@ +OPS_ENV=dev +OPS_LOG_LEVEL=INFO \ No newline at end of file From a4ad5790358aeda1c13e5dfe0193ca85dae878c6 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Tue, 17 Mar 2026 11:41:37 +0100 Subject: [PATCH 29/42] Updated the MODULE_INTEGRATION_GUIDE to include instructions for tracking the .env.example file in .gitignore. 
--- modules/orchestrator/MODULE_INTEGRATION_GUIDE.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md index 3c781e0..55c2e4c 100644 --- a/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md +++ b/modules/orchestrator/MODULE_INTEGRATION_GUIDE.md @@ -257,6 +257,15 @@ Each developer must create: cp .env.example .env ``` +### Important +Add the following at the bottom of your .gitignore to allow Git to track your .env.example file. + +``` +# Allow example env templates (must be AFTER any .env / .env.* ignores) +!.env.example +!**/.env.example +``` + ### Example env.example ``` From 366d0c15f8a34294cc4320b8a9633241cf565cbc Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 15:03:34 +0100 Subject: [PATCH 30/42] Add deployment and validation guide for OpenPolicyStack --- deploy/README.md | 108 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 deploy/README.md diff --git a/deploy/README.md b/deploy/README.md new file mode 100644 index 0000000..0f272f9 --- /dev/null +++ b/deploy/README.md @@ -0,0 +1,108 @@ +# OpenPolicyStack – Deployment & Validation Guide + +## Purpose +This folder contains deployment-related artifacts for OpenPolicyStack, including the full architecture skeleton and basic validation instructions. 
+ +At this stage, the system is validated through a **minimal runnable pilot stack** composed of: +- `orchestrator` (system coordinator) +- `integration-pilot` (reference module) + +--- + +## Deployment Files + +- `../compose.yaml` + → Current **runnable pilot stack** (validated baseline) + +- `compose.target-skeleton.yaml` + → **Full intended architecture** (not yet fully runnable; used as integration target) + +--- + +## Pilot Validation Procedure + +Run all commands from the repository root: + +```bash +docker compose config +docker compose build +docker compose up -d +docker compose ps +``` +Test the orchestrator: + +``` +curl http://localhost:8100/health +``` + +Execute a sample workflow: + +``` +curl -X POST http://localhost:8100/execute \ + -H "Content-Type: application/json" \ + -d '{"test":"hello","source":"vm-check"}' +``` + +Inspect shared artifacts: + +``` +docker compose exec orchestrator ls -R /var/openpolicystack/artifacts +``` + +Inspect metadata: + +``` +docker compose exec orchestrator ls -R /var/openpolicystack/metadata +``` + +--- + +## Expected Outcome + +A successful validation should confirm: + +- Both containers build and run successfully +- Both services report **healthy** status +- Orchestrator responds on `http://localhost:8100` +- Orchestrator can resolve `integration-pilot` via Docker network +- Shared artifact volume is written and visible across containers +- Metadata database (`orchestrator.db`) is created + +--- + +## Current Scope + +The pilot validates: + +- Basic orchestration flow (`/execute`) +- Service-to-service communication +- Shared artifact storage +- Metadata persistence (SQLite) +- Contract-compliant module execution + +Not yet covered: + +- Multi-module integration +- Real module onboarding from teammates +- Full evaluation framework (E1–E5) +- Production deployment considerations + +--- + +## Common Issues + +- Missing `.env` files (must be created from `.env.example`) +- Incorrect file paths or build contexts +- Missing or
incorrect Dockerfile +- Wrong application entrypoint (`app.main`) +- Running commands outside repo root + +--- + +## Next Steps + +- Expand orchestrator metadata layer (`runs`, `module_calls`, `artifacts`) +- Strengthen determinism and reproducibility guarantees +- Introduce evaluation tests (E1–E5) +- Onboard first real module into the stack +- Progressively align with `compose.target-skeleton.yaml` \ No newline at end of file From 1d929d0a8e95cace4e54b7e387a9fae7a4fef22b Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 15:29:46 +0100 Subject: [PATCH 31/42] Expand orchestrator metadata schema to runs module_calls artifacts --- modules/orchestrator/app/main.py | 351 +++++++++++++++++++++++++++---- 1 file changed, 315 insertions(+), 36 deletions(-) diff --git a/modules/orchestrator/app/main.py b/modules/orchestrator/app/main.py index 13495ee..eb6239d 100644 --- a/modules/orchestrator/app/main.py +++ b/modules/orchestrator/app/main.py @@ -1,10 +1,11 @@ import json import os import sqlite3 +import time import uuid from datetime import datetime, timezone from pathlib import Path -from typing import Any, Dict +from typing import Any, Dict, Optional import httpx from fastapi import FastAPI, HTTPException @@ -39,6 +40,7 @@ def get_conn() -> sqlite3.Connection: ensure_paths() conn = sqlite3.connect(SQLITE_PATH) conn.row_factory = sqlite3.Row + conn.execute("PRAGMA foreign_keys = ON;") return conn @@ -48,14 +50,226 @@ def init_db() -> None: """ CREATE TABLE IF NOT EXISTS runs ( run_id TEXT PRIMARY KEY, - workflow_template TEXT, - status TEXT, - created_at TEXT, + workflow_template TEXT NOT NULL, + overall_status TEXT NOT NULL, + created_at TEXT NOT NULL, + completed_at TEXT, request_payload TEXT, response_payload TEXT ) """ ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS module_calls ( + call_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + module_name TEXT NOT NULL, + module_version TEXT, + call_sequence INTEGER NOT NULL, + status TEXT NOT NULL, + 
started_at TEXT NOT NULL, + completed_at TEXT, + execution_time_ms INTEGER, + request_payload TEXT, + response_payload TEXT, + error_message TEXT, + FOREIGN KEY (run_id) REFERENCES runs(run_id) + ) + """ + ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS artifacts ( + artifact_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + call_id TEXT, + module_name TEXT NOT NULL, + artifact_type TEXT, + file_path TEXT NOT NULL, + hash TEXT, + created_at TEXT NOT NULL, + FOREIGN KEY (run_id) REFERENCES runs(run_id), + FOREIGN KEY (call_id) REFERENCES module_calls(call_id) + ) + """ + ) + + conn.commit() + + +def insert_run( + run_id: str, + workflow_template: str, + overall_status: str, + created_at: str, + request_payload: Dict[str, Any], +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO runs ( + run_id, + workflow_template, + overall_status, + created_at, + request_payload, + response_payload + ) VALUES (?, ?, ?, ?, ?, ?) + """, + ( + run_id, + workflow_template, + overall_status, + created_at, + json.dumps(request_payload), + None, + ), + ) + conn.commit() + + +def update_run( + run_id: str, + overall_status: str, + response_payload: Dict[str, Any], + completed_at: Optional[str] = None, +) -> None: + with get_conn() as conn: + conn.execute( + """ + UPDATE runs + SET overall_status = ?, + completed_at = ?, + response_payload = ? + WHERE run_id = ? + """, + ( + overall_status, + completed_at, + json.dumps(response_payload), + run_id, + ), + ) + conn.commit() + + +def insert_module_call( + call_id: str, + run_id: str, + module_name: str, + call_sequence: int, + status: str, + started_at: str, + request_payload: Dict[str, Any], +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO module_calls ( + call_id, + run_id, + module_name, + module_version, + call_sequence, + status, + started_at, + completed_at, + execution_time_ms, + request_payload, + response_payload, + error_message + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
+ """, + ( + call_id, + run_id, + module_name, + None, + call_sequence, + status, + started_at, + None, + None, + json.dumps(request_payload), + None, + None, + ), + ) + conn.commit() + + +def update_module_call( + call_id: str, + module_version: Optional[str], + status: str, + completed_at: str, + execution_time_ms: int, + response_payload: Optional[Dict[str, Any]] = None, + error_message: Optional[str] = None, +) -> None: + with get_conn() as conn: + conn.execute( + """ + UPDATE module_calls + SET module_version = ?, + status = ?, + completed_at = ?, + execution_time_ms = ?, + response_payload = ?, + error_message = ? + WHERE call_id = ? + """, + ( + module_version, + status, + completed_at, + execution_time_ms, + json.dumps(response_payload) if response_payload is not None else None, + error_message, + call_id, + ), + ) + conn.commit() + + +def insert_artifact( + artifact_id: str, + run_id: str, + call_id: Optional[str], + module_name: str, + artifact_type: Optional[str], + file_path: str, + hash_value: Optional[str], + created_at: str, +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO artifacts ( + artifact_id, + run_id, + call_id, + module_name, + artifact_type, + file_path, + hash, + created_at + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) + """, + ( + artifact_id, + run_id, + call_id, + module_name, + artifact_type, + file_path, + hash_value, + created_at, + ), + ) conn.commit() @@ -78,24 +292,18 @@ def health() -> Dict[str, Any]: @app.post("/execute") async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: run_id = str(uuid.uuid4()) - created_at = now_iso() + run_created_at = now_iso() - with get_conn() as conn: - conn.execute( - """ - INSERT INTO runs (run_id, workflow_template, status, created_at, request_payload, response_payload) - VALUES (?, ?, ?, ?, ?, ?) 
- """, - ( - run_id, - "pilot-workflow", - "running", - created_at, - json.dumps(payload), - None, - ), - ) - conn.commit() + insert_run( + run_id=run_id, + workflow_template="pilot-workflow", + overall_status="running", + created_at=run_created_at, + request_payload=payload, + ) + + call_id = str(uuid.uuid4()) + call_started_at = now_iso() upstream_payload = { "run_id": run_id, @@ -103,7 +311,19 @@ async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: "requested_by": MODULE_NAME, } + insert_module_call( + call_id=call_id, + run_id=run_id, + module_name="integration-pilot", + call_sequence=1, + status="running", + started_at=call_started_at, + request_payload=upstream_payload, + ) + try: + start_perf = time.perf_counter() + async with httpx.AsyncClient(timeout=30.0) as client: upstream_response = await client.post( f"{INTEGRATION_PILOT_URL}/execute", @@ -111,28 +331,87 @@ async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: ) upstream_response.raise_for_status() pilot_result = upstream_response.json() + + execution_time_ms = int((time.perf_counter() - start_perf) * 1000) + call_completed_at = now_iso() + + update_module_call( + call_id=call_id, + module_version=pilot_result.get("version"), + status="success", + completed_at=call_completed_at, + execution_time_ms=execution_time_ms, + response_payload=pilot_result, + ) + except Exception as exc: - with get_conn() as conn: - conn.execute( - "UPDATE runs SET status = ?, response_payload = ? 
WHERE run_id = ?", - ("failed", json.dumps({"error": str(exc)}), run_id), - ) - conn.commit() + execution_time_ms = int((time.perf_counter() - start_perf) * 1000) if "start_perf" in locals() else 0 + call_completed_at = now_iso() + run_completed_at = now_iso() + + update_module_call( + call_id=call_id, + module_version=None, + status="failed", + completed_at=call_completed_at, + execution_time_ms=execution_time_ms, + response_payload=None, + error_message=str(exc), + ) + + error_response = { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "failed", + "run_id": run_id, + "error": f"Pilot module call failed: {exc}", + } + + update_run( + run_id=run_id, + overall_status="failed", + response_payload=error_response, + completed_at=run_completed_at, + ) + raise HTTPException(status_code=502, detail=f"Pilot module call failed: {exc}") + for artifact in pilot_result.get("artifacts", []): + insert_artifact( + artifact_id=str(uuid.uuid4()), + run_id=run_id, + call_id=call_id, + module_name=artifact.get("module_name", "integration-pilot"), + artifact_type=artifact.get("type"), + file_path=artifact.get("file_path", ""), + hash_value=artifact.get("hash"), + created_at=now_iso(), + ) + orchestrator_dir = ARTIFACT_ROOT / "orchestrator" orchestrator_dir.mkdir(parents=True, exist_ok=True) summary_path = orchestrator_dir / f"{run_id}-summary.json" summary = { "run_id": run_id, - "timestamp": created_at, + "timestamp": run_created_at, "orchestrator": MODULE_NAME, "pilot_response": pilot_result, } summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8") - response = { + insert_artifact( + artifact_id=str(uuid.uuid4()), + run_id=run_id, + call_id=None, + module_name=MODULE_NAME, + artifact_type="run_summary", + file_path=str(summary_path), + hash_value=None, + created_at=now_iso(), + ) + + final_response = { "module_name": MODULE_NAME, "version": MODULE_VERSION, "status": "success", @@ -150,11 +429,11 @@ async def execute(payload: Dict[str, Any]) 
-> Dict[str, Any]: ], } - with get_conn() as conn: - conn.execute( - "UPDATE runs SET status = ?, response_payload = ? WHERE run_id = ?", - ("success", json.dumps(response), run_id), - ) - conn.commit() + update_run( + run_id=run_id, + overall_status="success", + response_payload=final_response, + completed_at=now_iso(), + ) - return response \ No newline at end of file + return final_response \ No newline at end of file From 9a33f8629d1a72ffbcc9bc02b83e83237b6c9e56 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 15:43:09 +0100 Subject: [PATCH 32/42] Removed the temporary beta file of main for the orchestrator. --- modules/orchestrator/app/main_beta.py | 439 ++++++++++++++++++++++++++ 1 file changed, 439 insertions(+) create mode 100644 modules/orchestrator/app/main_beta.py diff --git a/modules/orchestrator/app/main_beta.py b/modules/orchestrator/app/main_beta.py new file mode 100644 index 0000000..eb6239d --- /dev/null +++ b/modules/orchestrator/app/main_beta.py @@ -0,0 +1,439 @@ +import json +import os +import sqlite3 +import time +import uuid +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict, Optional + +import httpx +from fastapi import FastAPI, HTTPException + +app = FastAPI(title="OpenPolicyStack Orchestrator", version="0.1.0") + +MODULE_NAME = os.getenv("OPS_MODULE_NAME", "orchestrator") +MODULE_VERSION = "0.1.0" +ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) +SQLITE_PATH = Path( + os.getenv( + "ORCHESTRATOR__SQLITE_PATH", + "/var/openpolicystack/metadata/orchestrator.db", + ) +) +INTEGRATION_PILOT_URL = os.getenv( + "ORCHESTRATOR__INTEGRATION_PILOT_URL", + "http://integration-pilot:8080", +) + + +def now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +def ensure_paths() -> None: + ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True) + SQLITE_PATH.parent.mkdir(parents=True, exist_ok=True) + + +def get_conn() -> sqlite3.Connection: + 
ensure_paths() + conn = sqlite3.connect(SQLITE_PATH) + conn.row_factory = sqlite3.Row + conn.execute("PRAGMA foreign_keys = ON;") + return conn + + +def init_db() -> None: + with get_conn() as conn: + conn.execute( + """ + CREATE TABLE IF NOT EXISTS runs ( + run_id TEXT PRIMARY KEY, + workflow_template TEXT NOT NULL, + overall_status TEXT NOT NULL, + created_at TEXT NOT NULL, + completed_at TEXT, + request_payload TEXT, + response_payload TEXT + ) + """ + ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS module_calls ( + call_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + module_name TEXT NOT NULL, + module_version TEXT, + call_sequence INTEGER NOT NULL, + status TEXT NOT NULL, + started_at TEXT NOT NULL, + completed_at TEXT, + execution_time_ms INTEGER, + request_payload TEXT, + response_payload TEXT, + error_message TEXT, + FOREIGN KEY (run_id) REFERENCES runs(run_id) + ) + """ + ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS artifacts ( + artifact_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + call_id TEXT, + module_name TEXT NOT NULL, + artifact_type TEXT, + file_path TEXT NOT NULL, + hash TEXT, + created_at TEXT NOT NULL, + FOREIGN KEY (run_id) REFERENCES runs(run_id), + FOREIGN KEY (call_id) REFERENCES module_calls(call_id) + ) + """ + ) + + conn.commit() + + +def insert_run( + run_id: str, + workflow_template: str, + overall_status: str, + created_at: str, + request_payload: Dict[str, Any], +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO runs ( + run_id, + workflow_template, + overall_status, + created_at, + request_payload, + response_payload + ) VALUES (?, ?, ?, ?, ?, ?) 
+ """, + ( + run_id, + workflow_template, + overall_status, + created_at, + json.dumps(request_payload), + None, + ), + ) + conn.commit() + + +def update_run( + run_id: str, + overall_status: str, + response_payload: Dict[str, Any], + completed_at: Optional[str] = None, +) -> None: + with get_conn() as conn: + conn.execute( + """ + UPDATE runs + SET overall_status = ?, + completed_at = ?, + response_payload = ? + WHERE run_id = ? + """, + ( + overall_status, + completed_at, + json.dumps(response_payload), + run_id, + ), + ) + conn.commit() + + +def insert_module_call( + call_id: str, + run_id: str, + module_name: str, + call_sequence: int, + status: str, + started_at: str, + request_payload: Dict[str, Any], +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO module_calls ( + call_id, + run_id, + module_name, + module_version, + call_sequence, + status, + started_at, + completed_at, + execution_time_ms, + request_payload, + response_payload, + error_message + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, + ( + call_id, + run_id, + module_name, + None, + call_sequence, + status, + started_at, + None, + None, + json.dumps(request_payload), + None, + None, + ), + ) + conn.commit() + + +def update_module_call( + call_id: str, + module_version: Optional[str], + status: str, + completed_at: str, + execution_time_ms: int, + response_payload: Optional[Dict[str, Any]] = None, + error_message: Optional[str] = None, +) -> None: + with get_conn() as conn: + conn.execute( + """ + UPDATE module_calls + SET module_version = ?, + status = ?, + completed_at = ?, + execution_time_ms = ?, + response_payload = ?, + error_message = ? + WHERE call_id = ? 
+ """, + ( + module_version, + status, + completed_at, + execution_time_ms, + json.dumps(response_payload) if response_payload is not None else None, + error_message, + call_id, + ), + ) + conn.commit() + + +def insert_artifact( + artifact_id: str, + run_id: str, + call_id: Optional[str], + module_name: str, + artifact_type: Optional[str], + file_path: str, + hash_value: Optional[str], + created_at: str, +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO artifacts ( + artifact_id, + run_id, + call_id, + module_name, + artifact_type, + file_path, + hash, + created_at + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) + """, + ( + artifact_id, + run_id, + call_id, + module_name, + artifact_type, + file_path, + hash_value, + created_at, + ), + ) + conn.commit() + + +@app.on_event("startup") +def startup_event() -> None: + init_db() + + +@app.get("/health") +def health() -> Dict[str, Any]: + return { + "status": "ok", + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "pilot_url": INTEGRATION_PILOT_URL, + "sqlite_path": str(SQLITE_PATH), + } + + +@app.post("/execute") +async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: + run_id = str(uuid.uuid4()) + run_created_at = now_iso() + + insert_run( + run_id=run_id, + workflow_template="pilot-workflow", + overall_status="running", + created_at=run_created_at, + request_payload=payload, + ) + + call_id = str(uuid.uuid4()) + call_started_at = now_iso() + + upstream_payload = { + "run_id": run_id, + "input": payload, + "requested_by": MODULE_NAME, + } + + insert_module_call( + call_id=call_id, + run_id=run_id, + module_name="integration-pilot", + call_sequence=1, + status="running", + started_at=call_started_at, + request_payload=upstream_payload, + ) + + try: + start_perf = time.perf_counter() + + async with httpx.AsyncClient(timeout=30.0) as client: + upstream_response = await client.post( + f"{INTEGRATION_PILOT_URL}/execute", + json=upstream_payload, + ) + upstream_response.raise_for_status() + 
pilot_result = upstream_response.json() + + execution_time_ms = int((time.perf_counter() - start_perf) * 1000) + call_completed_at = now_iso() + + update_module_call( + call_id=call_id, + module_version=pilot_result.get("version"), + status="success", + completed_at=call_completed_at, + execution_time_ms=execution_time_ms, + response_payload=pilot_result, + ) + + except Exception as exc: + execution_time_ms = int((time.perf_counter() - start_perf) * 1000) if "start_perf" in locals() else 0 + call_completed_at = now_iso() + run_completed_at = now_iso() + + update_module_call( + call_id=call_id, + module_version=None, + status="failed", + completed_at=call_completed_at, + execution_time_ms=execution_time_ms, + response_payload=None, + error_message=str(exc), + ) + + error_response = { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "failed", + "run_id": run_id, + "error": f"Pilot module call failed: {exc}", + } + + update_run( + run_id=run_id, + overall_status="failed", + response_payload=error_response, + completed_at=run_completed_at, + ) + + raise HTTPException(status_code=502, detail=f"Pilot module call failed: {exc}") + + for artifact in pilot_result.get("artifacts", []): + insert_artifact( + artifact_id=str(uuid.uuid4()), + run_id=run_id, + call_id=call_id, + module_name=artifact.get("module_name", "integration-pilot"), + artifact_type=artifact.get("type"), + file_path=artifact.get("file_path", ""), + hash_value=artifact.get("hash"), + created_at=now_iso(), + ) + + orchestrator_dir = ARTIFACT_ROOT / "orchestrator" + orchestrator_dir.mkdir(parents=True, exist_ok=True) + + summary_path = orchestrator_dir / f"{run_id}-summary.json" + summary = { + "run_id": run_id, + "timestamp": run_created_at, + "orchestrator": MODULE_NAME, + "pilot_response": pilot_result, + } + summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8") + + insert_artifact( + artifact_id=str(uuid.uuid4()), + run_id=run_id, + call_id=None, + 
module_name=MODULE_NAME, + artifact_type="run_summary", + file_path=str(summary_path), + hash_value=None, + created_at=now_iso(), + ) + + final_response = { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "success", + "run_id": run_id, + "output": { + "message": "orchestrator executed pilot workflow successfully", + "pilot_result": pilot_result, + }, + "artifacts": [ + { + "module_name": MODULE_NAME, + "file_path": str(summary_path), + "type": "run_summary", + } + ], + } + + update_run( + run_id=run_id, + overall_status="success", + response_payload=final_response, + completed_at=now_iso(), + ) + + return final_response \ No newline at end of file From e9c8f45fb7e76f167a66375f45e4a31299614b15 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 15:43:22 +0100 Subject: [PATCH 33/42] Removed the temporary beta file of main for the orchestrator. --- modules/orchestrator/app/main_beta.py | 439 -------------------------- 1 file changed, 439 deletions(-) delete mode 100644 modules/orchestrator/app/main_beta.py diff --git a/modules/orchestrator/app/main_beta.py b/modules/orchestrator/app/main_beta.py deleted file mode 100644 index eb6239d..0000000 --- a/modules/orchestrator/app/main_beta.py +++ /dev/null @@ -1,439 +0,0 @@ -import json -import os -import sqlite3 -import time -import uuid -from datetime import datetime, timezone -from pathlib import Path -from typing import Any, Dict, Optional - -import httpx -from fastapi import FastAPI, HTTPException - -app = FastAPI(title="OpenPolicyStack Orchestrator", version="0.1.0") - -MODULE_NAME = os.getenv("OPS_MODULE_NAME", "orchestrator") -MODULE_VERSION = "0.1.0" -ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) -SQLITE_PATH = Path( - os.getenv( - "ORCHESTRATOR__SQLITE_PATH", - "/var/openpolicystack/metadata/orchestrator.db", - ) -) -INTEGRATION_PILOT_URL = os.getenv( - "ORCHESTRATOR__INTEGRATION_PILOT_URL", - "http://integration-pilot:8080", -) - - -def 
now_iso() -> str: - return datetime.now(timezone.utc).isoformat() - - -def ensure_paths() -> None: - ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True) - SQLITE_PATH.parent.mkdir(parents=True, exist_ok=True) - - -def get_conn() -> sqlite3.Connection: - ensure_paths() - conn = sqlite3.connect(SQLITE_PATH) - conn.row_factory = sqlite3.Row - conn.execute("PRAGMA foreign_keys = ON;") - return conn - - -def init_db() -> None: - with get_conn() as conn: - conn.execute( - """ - CREATE TABLE IF NOT EXISTS runs ( - run_id TEXT PRIMARY KEY, - workflow_template TEXT NOT NULL, - overall_status TEXT NOT NULL, - created_at TEXT NOT NULL, - completed_at TEXT, - request_payload TEXT, - response_payload TEXT - ) - """ - ) - - conn.execute( - """ - CREATE TABLE IF NOT EXISTS module_calls ( - call_id TEXT PRIMARY KEY, - run_id TEXT NOT NULL, - module_name TEXT NOT NULL, - module_version TEXT, - call_sequence INTEGER NOT NULL, - status TEXT NOT NULL, - started_at TEXT NOT NULL, - completed_at TEXT, - execution_time_ms INTEGER, - request_payload TEXT, - response_payload TEXT, - error_message TEXT, - FOREIGN KEY (run_id) REFERENCES runs(run_id) - ) - """ - ) - - conn.execute( - """ - CREATE TABLE IF NOT EXISTS artifacts ( - artifact_id TEXT PRIMARY KEY, - run_id TEXT NOT NULL, - call_id TEXT, - module_name TEXT NOT NULL, - artifact_type TEXT, - file_path TEXT NOT NULL, - hash TEXT, - created_at TEXT NOT NULL, - FOREIGN KEY (run_id) REFERENCES runs(run_id), - FOREIGN KEY (call_id) REFERENCES module_calls(call_id) - ) - """ - ) - - conn.commit() - - -def insert_run( - run_id: str, - workflow_template: str, - overall_status: str, - created_at: str, - request_payload: Dict[str, Any], -) -> None: - with get_conn() as conn: - conn.execute( - """ - INSERT INTO runs ( - run_id, - workflow_template, - overall_status, - created_at, - request_payload, - response_payload - ) VALUES (?, ?, ?, ?, ?, ?) 
- """, - ( - run_id, - workflow_template, - overall_status, - created_at, - json.dumps(request_payload), - None, - ), - ) - conn.commit() - - -def update_run( - run_id: str, - overall_status: str, - response_payload: Dict[str, Any], - completed_at: Optional[str] = None, -) -> None: - with get_conn() as conn: - conn.execute( - """ - UPDATE runs - SET overall_status = ?, - completed_at = ?, - response_payload = ? - WHERE run_id = ? - """, - ( - overall_status, - completed_at, - json.dumps(response_payload), - run_id, - ), - ) - conn.commit() - - -def insert_module_call( - call_id: str, - run_id: str, - module_name: str, - call_sequence: int, - status: str, - started_at: str, - request_payload: Dict[str, Any], -) -> None: - with get_conn() as conn: - conn.execute( - """ - INSERT INTO module_calls ( - call_id, - run_id, - module_name, - module_version, - call_sequence, - status, - started_at, - completed_at, - execution_time_ms, - request_payload, - response_payload, - error_message - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) - """, - ( - call_id, - run_id, - module_name, - None, - call_sequence, - status, - started_at, - None, - None, - json.dumps(request_payload), - None, - None, - ), - ) - conn.commit() - - -def update_module_call( - call_id: str, - module_version: Optional[str], - status: str, - completed_at: str, - execution_time_ms: int, - response_payload: Optional[Dict[str, Any]] = None, - error_message: Optional[str] = None, -) -> None: - with get_conn() as conn: - conn.execute( - """ - UPDATE module_calls - SET module_version = ?, - status = ?, - completed_at = ?, - execution_time_ms = ?, - response_payload = ?, - error_message = ? - WHERE call_id = ? 
- """, - ( - module_version, - status, - completed_at, - execution_time_ms, - json.dumps(response_payload) if response_payload is not None else None, - error_message, - call_id, - ), - ) - conn.commit() - - -def insert_artifact( - artifact_id: str, - run_id: str, - call_id: Optional[str], - module_name: str, - artifact_type: Optional[str], - file_path: str, - hash_value: Optional[str], - created_at: str, -) -> None: - with get_conn() as conn: - conn.execute( - """ - INSERT INTO artifacts ( - artifact_id, - run_id, - call_id, - module_name, - artifact_type, - file_path, - hash, - created_at - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) - """, - ( - artifact_id, - run_id, - call_id, - module_name, - artifact_type, - file_path, - hash_value, - created_at, - ), - ) - conn.commit() - - -@app.on_event("startup") -def startup_event() -> None: - init_db() - - -@app.get("/health") -def health() -> Dict[str, Any]: - return { - "status": "ok", - "module_name": MODULE_NAME, - "version": MODULE_VERSION, - "pilot_url": INTEGRATION_PILOT_URL, - "sqlite_path": str(SQLITE_PATH), - } - - -@app.post("/execute") -async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: - run_id = str(uuid.uuid4()) - run_created_at = now_iso() - - insert_run( - run_id=run_id, - workflow_template="pilot-workflow", - overall_status="running", - created_at=run_created_at, - request_payload=payload, - ) - - call_id = str(uuid.uuid4()) - call_started_at = now_iso() - - upstream_payload = { - "run_id": run_id, - "input": payload, - "requested_by": MODULE_NAME, - } - - insert_module_call( - call_id=call_id, - run_id=run_id, - module_name="integration-pilot", - call_sequence=1, - status="running", - started_at=call_started_at, - request_payload=upstream_payload, - ) - - try: - start_perf = time.perf_counter() - - async with httpx.AsyncClient(timeout=30.0) as client: - upstream_response = await client.post( - f"{INTEGRATION_PILOT_URL}/execute", - json=upstream_payload, - ) - upstream_response.raise_for_status() - 
pilot_result = upstream_response.json() - - execution_time_ms = int((time.perf_counter() - start_perf) * 1000) - call_completed_at = now_iso() - - update_module_call( - call_id=call_id, - module_version=pilot_result.get("version"), - status="success", - completed_at=call_completed_at, - execution_time_ms=execution_time_ms, - response_payload=pilot_result, - ) - - except Exception as exc: - execution_time_ms = int((time.perf_counter() - start_perf) * 1000) if "start_perf" in locals() else 0 - call_completed_at = now_iso() - run_completed_at = now_iso() - - update_module_call( - call_id=call_id, - module_version=None, - status="failed", - completed_at=call_completed_at, - execution_time_ms=execution_time_ms, - response_payload=None, - error_message=str(exc), - ) - - error_response = { - "module_name": MODULE_NAME, - "version": MODULE_VERSION, - "status": "failed", - "run_id": run_id, - "error": f"Pilot module call failed: {exc}", - } - - update_run( - run_id=run_id, - overall_status="failed", - response_payload=error_response, - completed_at=run_completed_at, - ) - - raise HTTPException(status_code=502, detail=f"Pilot module call failed: {exc}") - - for artifact in pilot_result.get("artifacts", []): - insert_artifact( - artifact_id=str(uuid.uuid4()), - run_id=run_id, - call_id=call_id, - module_name=artifact.get("module_name", "integration-pilot"), - artifact_type=artifact.get("type"), - file_path=artifact.get("file_path", ""), - hash_value=artifact.get("hash"), - created_at=now_iso(), - ) - - orchestrator_dir = ARTIFACT_ROOT / "orchestrator" - orchestrator_dir.mkdir(parents=True, exist_ok=True) - - summary_path = orchestrator_dir / f"{run_id}-summary.json" - summary = { - "run_id": run_id, - "timestamp": run_created_at, - "orchestrator": MODULE_NAME, - "pilot_response": pilot_result, - } - summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8") - - insert_artifact( - artifact_id=str(uuid.uuid4()), - run_id=run_id, - call_id=None, - 
module_name=MODULE_NAME, - artifact_type="run_summary", - file_path=str(summary_path), - hash_value=None, - created_at=now_iso(), - ) - - final_response = { - "module_name": MODULE_NAME, - "version": MODULE_VERSION, - "status": "success", - "run_id": run_id, - "output": { - "message": "orchestrator executed pilot workflow successfully", - "pilot_result": pilot_result, - }, - "artifacts": [ - { - "module_name": MODULE_NAME, - "file_path": str(summary_path), - "type": "run_summary", - } - ], - } - - update_run( - run_id=run_id, - overall_status="success", - response_payload=final_response, - completed_at=now_iso(), - ) - - return final_response \ No newline at end of file From 3a69c77432027618efd0e9a1066cb538030e1e4a Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 17:36:20 +0100 Subject: [PATCH 34/42] Hashing enhancement applied. --- modules/integration-pilot/app/main.py | 502 ++++++++++++++++++++++++-- 1 file changed, 475 insertions(+), 27 deletions(-) diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py index c60eed5..90def85 100644 --- a/modules/integration-pilot/app/main.py +++ b/modules/integration-pilot/app/main.py @@ -1,70 +1,518 @@ import hashlib import json import os +import sqlite3 +import time import uuid from datetime import datetime, timezone from pathlib import Path -from typing import Any, Dict +from typing import Any, Dict, Optional -from fastapi import FastAPI +import httpx +from fastapi import FastAPI, HTTPException -app = FastAPI(title="OpenPolicyStack Integration Pilot", version="0.1.0") +app = FastAPI(title="OpenPolicyStack Orchestrator", version="0.1.0") -MODULE_NAME = os.getenv("OPS_MODULE_NAME", "integration-pilot") -MODULE_VERSION = os.getenv("PILOT_MODULE_VERSION", "0.1.0") +MODULE_NAME = os.getenv("OPS_MODULE_NAME", "orchestrator") +MODULE_VERSION = "0.1.0" ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) +SQLITE_PATH = Path( + os.getenv( + 
"ORCHESTRATOR__SQLITE_PATH", + "/var/openpolicystack/metadata/orchestrator.db", + ) +) +INTEGRATION_PILOT_URL = os.getenv( + "ORCHESTRATOR__INTEGRATION_PILOT_URL", + "http://integration-pilot:8080", +) def now_iso() -> str: return datetime.now(timezone.utc).isoformat() +def ensure_paths() -> None: + ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True) + SQLITE_PATH.parent.mkdir(parents=True, exist_ok=True) + + +def get_conn() -> sqlite3.Connection: + ensure_paths() + conn = sqlite3.connect(SQLITE_PATH) + conn.row_factory = sqlite3.Row + conn.execute("PRAGMA foreign_keys = ON;") + return conn + + +def canonical_json(value: Any) -> str: + """ + Deterministic JSON serialization for hashing and storage consistency. + """ + return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + + +def sha256_text(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() + + +def hash_payload(payload: Optional[Dict[str, Any]]) -> Optional[str]: + if payload is None: + return None + return sha256_text(canonical_json(payload)) + + +def hash_file(file_path: Path) -> str: + digest = hashlib.sha256() + with file_path.open("rb") as f: + for chunk in iter(lambda: f.read(8192), b""): + digest.update(chunk) + return digest.hexdigest() + + +def column_exists(conn: sqlite3.Connection, table_name: str, column_name: str) -> bool: + rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall() + return any(row[1] == column_name for row in rows) + + +def init_db() -> None: + with get_conn() as conn: + conn.execute( + """ + CREATE TABLE IF NOT EXISTS runs ( + run_id TEXT PRIMARY KEY, + workflow_template TEXT NOT NULL, + overall_status TEXT NOT NULL, + created_at TEXT NOT NULL, + completed_at TEXT, + request_payload TEXT, + response_payload TEXT + ) + """ + ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS module_calls ( + call_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + module_name TEXT NOT NULL, + module_version TEXT, + call_sequence INTEGER 
NOT NULL, + status TEXT NOT NULL, + started_at TEXT NOT NULL, + completed_at TEXT, + execution_time_ms INTEGER, + request_payload TEXT, + response_payload TEXT, + error_message TEXT, + FOREIGN KEY (run_id) REFERENCES runs(run_id) + ) + """ + ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS artifacts ( + artifact_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + call_id TEXT, + module_name TEXT NOT NULL, + artifact_type TEXT, + file_path TEXT NOT NULL, + hash TEXT, + created_at TEXT NOT NULL, + FOREIGN KEY (run_id) REFERENCES runs(run_id), + FOREIGN KEY (call_id) REFERENCES module_calls(call_id) + ) + """ + ) + + # Safe additive migration for hashing support + if not column_exists(conn, "runs", "request_payload_hash"): + conn.execute("ALTER TABLE runs ADD COLUMN request_payload_hash TEXT") + + if not column_exists(conn, "runs", "response_payload_hash"): + conn.execute("ALTER TABLE runs ADD COLUMN response_payload_hash TEXT") + + if not column_exists(conn, "module_calls", "request_payload_hash"): + conn.execute("ALTER TABLE module_calls ADD COLUMN request_payload_hash TEXT") + + if not column_exists(conn, "module_calls", "response_payload_hash"): + conn.execute("ALTER TABLE module_calls ADD COLUMN response_payload_hash TEXT") + + conn.commit() + + +def insert_run( + run_id: str, + workflow_template: str, + overall_status: str, + created_at: str, + request_payload: Dict[str, Any], +) -> None: + request_payload_json = canonical_json(request_payload) + request_payload_hash = sha256_text(request_payload_json) + + with get_conn() as conn: + conn.execute( + """ + INSERT INTO runs ( + run_id, + workflow_template, + overall_status, + created_at, + request_payload, + response_payload, + request_payload_hash, + response_payload_hash + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) 
+ """, + ( + run_id, + workflow_template, + overall_status, + created_at, + request_payload_json, + None, + request_payload_hash, + None, + ), + ) + conn.commit() + + +def update_run( + run_id: str, + overall_status: str, + response_payload: Dict[str, Any], + completed_at: Optional[str] = None, +) -> None: + response_payload_json = canonical_json(response_payload) + response_payload_hash = sha256_text(response_payload_json) + + with get_conn() as conn: + conn.execute( + """ + UPDATE runs + SET overall_status = ?, + completed_at = ?, + response_payload = ?, + response_payload_hash = ? + WHERE run_id = ? + """, + ( + overall_status, + completed_at, + response_payload_json, + response_payload_hash, + run_id, + ), + ) + conn.commit() + + +def insert_module_call( + call_id: str, + run_id: str, + module_name: str, + call_sequence: int, + status: str, + started_at: str, + request_payload: Dict[str, Any], +) -> None: + request_payload_json = canonical_json(request_payload) + request_payload_hash = sha256_text(request_payload_json) + + with get_conn() as conn: + conn.execute( + """ + INSERT INTO module_calls ( + call_id, + run_id, + module_name, + module_version, + call_sequence, + status, + started_at, + completed_at, + execution_time_ms, + request_payload, + response_payload, + error_message, + request_payload_hash, + response_payload_hash + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
+ """, + ( + call_id, + run_id, + module_name, + None, + call_sequence, + status, + started_at, + None, + None, + request_payload_json, + None, + None, + request_payload_hash, + None, + ), + ) + conn.commit() + + +def update_module_call( + call_id: str, + module_version: Optional[str], + status: str, + completed_at: str, + execution_time_ms: int, + response_payload: Optional[Dict[str, Any]] = None, + error_message: Optional[str] = None, +) -> None: + response_payload_json = ( + canonical_json(response_payload) if response_payload is not None else None + ) + response_payload_hash = ( + sha256_text(response_payload_json) if response_payload_json is not None else None + ) + + with get_conn() as conn: + conn.execute( + """ + UPDATE module_calls + SET module_version = ?, + status = ?, + completed_at = ?, + execution_time_ms = ?, + response_payload = ?, + error_message = ?, + response_payload_hash = ? + WHERE call_id = ? + """, + ( + module_version, + status, + completed_at, + execution_time_ms, + response_payload_json, + error_message, + response_payload_hash, + call_id, + ), + ) + conn.commit() + + +def insert_artifact( + artifact_id: str, + run_id: str, + call_id: Optional[str], + module_name: str, + artifact_type: Optional[str], + file_path: str, + hash_value: Optional[str], + created_at: str, +) -> None: + with get_conn() as conn: + conn.execute( + """ + INSERT INTO artifacts ( + artifact_id, + run_id, + call_id, + module_name, + artifact_type, + file_path, + hash, + created_at + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) 
+ """, + ( + artifact_id, + run_id, + call_id, + module_name, + artifact_type, + file_path, + hash_value, + created_at, + ), + ) + conn.commit() + + +@app.on_event("startup") +def startup_event() -> None: + init_db() + + @app.get("/health") def health() -> Dict[str, Any]: return { "status": "ok", "module_name": MODULE_NAME, "version": MODULE_VERSION, + "pilot_url": INTEGRATION_PILOT_URL, + "sqlite_path": str(SQLITE_PATH), } @app.post("/execute") -def execute(payload: Dict[str, Any]) -> Dict[str, Any]: - execution_id = str(uuid.uuid4()) - timestamp = now_iso() +async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: + run_id = str(uuid.uuid4()) + run_created_at = now_iso() - module_dir = ARTIFACT_ROOT / MODULE_NAME - module_dir.mkdir(parents=True, exist_ok=True) + insert_run( + run_id=run_id, + workflow_template="pilot-workflow", + overall_status="running", + created_at=run_created_at, + request_payload=payload, + ) - artifact_path = module_dir / f"{execution_id}.json" + call_id = str(uuid.uuid4()) + call_started_at = now_iso() - artifact_content = { - "execution_id": execution_id, - "timestamp": timestamp, - "received_payload": payload, - "message": "integration pilot executed successfully", + upstream_payload = { + "run_id": run_id, + "input": payload, + "requested_by": MODULE_NAME, } - serialized = json.dumps(artifact_content, indent=2) - artifact_path.write_text(serialized, encoding="utf-8") + insert_module_call( + call_id=call_id, + run_id=run_id, + module_name="integration-pilot", + call_sequence=1, + status="running", + started_at=call_started_at, + request_payload=upstream_payload, + ) - sha256_hash = hashlib.sha256(serialized.encode("utf-8")).hexdigest() + try: + start_perf = time.perf_counter() - return { + async with httpx.AsyncClient(timeout=30.0) as client: + upstream_response = await client.post( + f"{INTEGRATION_PILOT_URL}/execute", + json=upstream_payload, + ) + upstream_response.raise_for_status() + pilot_result = upstream_response.json() + + 
execution_time_ms = int((time.perf_counter() - start_perf) * 1000) + call_completed_at = now_iso() + + update_module_call( + call_id=call_id, + module_version=pilot_result.get("version"), + status="success", + completed_at=call_completed_at, + execution_time_ms=execution_time_ms, + response_payload=pilot_result, + ) + + except Exception as exc: + execution_time_ms = ( + int((time.perf_counter() - start_perf) * 1000) + if "start_perf" in locals() + else 0 + ) + call_completed_at = now_iso() + run_completed_at = now_iso() + + update_module_call( + call_id=call_id, + module_version=None, + status="failed", + completed_at=call_completed_at, + execution_time_ms=execution_time_ms, + response_payload=None, + error_message=str(exc), + ) + + error_response = { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "failed", + "run_id": run_id, + "error": f"Pilot module call failed: {exc}", + } + + update_run( + run_id=run_id, + overall_status="failed", + response_payload=error_response, + completed_at=run_completed_at, + ) + + raise HTTPException(status_code=502, detail=f"Pilot module call failed: {exc}") + + for artifact in pilot_result.get("artifacts", []): + insert_artifact( + artifact_id=str(uuid.uuid4()), + run_id=run_id, + call_id=call_id, + module_name=artifact.get("module_name", "integration-pilot"), + artifact_type=artifact.get("type"), + file_path=artifact.get("file_path", ""), + hash_value=artifact.get("hash"), + created_at=now_iso(), + ) + + orchestrator_dir = ARTIFACT_ROOT / "orchestrator" + orchestrator_dir.mkdir(parents=True, exist_ok=True) + + summary_path = orchestrator_dir / f"{run_id}-summary.json" + summary = { + "run_id": run_id, + "timestamp": run_created_at, + "orchestrator": MODULE_NAME, + "pilot_response": pilot_result, + } + summary_serialized = json.dumps(summary, indent=2, ensure_ascii=False) + summary_path.write_text(summary_serialized, encoding="utf-8") + summary_hash = hash_file(summary_path) + + insert_artifact( + 
artifact_id=str(uuid.uuid4()), + run_id=run_id, + call_id=None, + module_name=MODULE_NAME, + artifact_type="run_summary", + file_path=str(summary_path), + hash_value=summary_hash, + created_at=now_iso(), + ) + + final_response = { "module_name": MODULE_NAME, "version": MODULE_VERSION, "status": "success", + "run_id": run_id, "output": { - "message": "pilot module executed successfully", - "execution_id": execution_id, - "received_keys": sorted(list(payload.keys())), + "message": "orchestrator executed pilot workflow successfully", + "pilot_result": pilot_result, }, "artifacts": [ { "module_name": MODULE_NAME, - "file_path": str(artifact_path), - "hash": sha256_hash, - "type": "pilot_output", + "file_path": str(summary_path), + "hash": summary_hash, + "type": "run_summary", } ], - } \ No newline at end of file + } + + update_run( + run_id=run_id, + overall_status="success", + response_payload=final_response, + completed_at=now_iso(), + ) + + return final_response \ No newline at end of file From 838b755230cc507d9d57209439f9a333ecc0117a Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 17:44:04 +0100 Subject: [PATCH 35/42] Eliminated changes. 
--- modules/integration-pilot/app/main.py | 502 ++------------------------ 1 file changed, 27 insertions(+), 475 deletions(-) diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py index 90def85..c60eed5 100644 --- a/modules/integration-pilot/app/main.py +++ b/modules/integration-pilot/app/main.py @@ -1,518 +1,70 @@ import hashlib import json import os -import sqlite3 -import time import uuid from datetime import datetime, timezone from pathlib import Path -from typing import Any, Dict, Optional +from typing import Any, Dict -import httpx -from fastapi import FastAPI, HTTPException +from fastapi import FastAPI -app = FastAPI(title="OpenPolicyStack Orchestrator", version="0.1.0") +app = FastAPI(title="OpenPolicyStack Integration Pilot", version="0.1.0") -MODULE_NAME = os.getenv("OPS_MODULE_NAME", "orchestrator") -MODULE_VERSION = "0.1.0" +MODULE_NAME = os.getenv("OPS_MODULE_NAME", "integration-pilot") +MODULE_VERSION = os.getenv("PILOT_MODULE_VERSION", "0.1.0") ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) -SQLITE_PATH = Path( - os.getenv( - "ORCHESTRATOR__SQLITE_PATH", - "/var/openpolicystack/metadata/orchestrator.db", - ) -) -INTEGRATION_PILOT_URL = os.getenv( - "ORCHESTRATOR__INTEGRATION_PILOT_URL", - "http://integration-pilot:8080", -) def now_iso() -> str: return datetime.now(timezone.utc).isoformat() -def ensure_paths() -> None: - ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True) - SQLITE_PATH.parent.mkdir(parents=True, exist_ok=True) - - -def get_conn() -> sqlite3.Connection: - ensure_paths() - conn = sqlite3.connect(SQLITE_PATH) - conn.row_factory = sqlite3.Row - conn.execute("PRAGMA foreign_keys = ON;") - return conn - - -def canonical_json(value: Any) -> str: - """ - Deterministic JSON serialization for hashing and storage consistency. 
- """ - return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False) - - -def sha256_text(text: str) -> str: - return hashlib.sha256(text.encode("utf-8")).hexdigest() - - -def hash_payload(payload: Optional[Dict[str, Any]]) -> Optional[str]: - if payload is None: - return None - return sha256_text(canonical_json(payload)) - - -def hash_file(file_path: Path) -> str: - digest = hashlib.sha256() - with file_path.open("rb") as f: - for chunk in iter(lambda: f.read(8192), b""): - digest.update(chunk) - return digest.hexdigest() - - -def column_exists(conn: sqlite3.Connection, table_name: str, column_name: str) -> bool: - rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall() - return any(row[1] == column_name for row in rows) - - -def init_db() -> None: - with get_conn() as conn: - conn.execute( - """ - CREATE TABLE IF NOT EXISTS runs ( - run_id TEXT PRIMARY KEY, - workflow_template TEXT NOT NULL, - overall_status TEXT NOT NULL, - created_at TEXT NOT NULL, - completed_at TEXT, - request_payload TEXT, - response_payload TEXT - ) - """ - ) - - conn.execute( - """ - CREATE TABLE IF NOT EXISTS module_calls ( - call_id TEXT PRIMARY KEY, - run_id TEXT NOT NULL, - module_name TEXT NOT NULL, - module_version TEXT, - call_sequence INTEGER NOT NULL, - status TEXT NOT NULL, - started_at TEXT NOT NULL, - completed_at TEXT, - execution_time_ms INTEGER, - request_payload TEXT, - response_payload TEXT, - error_message TEXT, - FOREIGN KEY (run_id) REFERENCES runs(run_id) - ) - """ - ) - - conn.execute( - """ - CREATE TABLE IF NOT EXISTS artifacts ( - artifact_id TEXT PRIMARY KEY, - run_id TEXT NOT NULL, - call_id TEXT, - module_name TEXT NOT NULL, - artifact_type TEXT, - file_path TEXT NOT NULL, - hash TEXT, - created_at TEXT NOT NULL, - FOREIGN KEY (run_id) REFERENCES runs(run_id), - FOREIGN KEY (call_id) REFERENCES module_calls(call_id) - ) - """ - ) - - # Safe additive migration for hashing support - if not column_exists(conn, "runs", 
"request_payload_hash"): - conn.execute("ALTER TABLE runs ADD COLUMN request_payload_hash TEXT") - - if not column_exists(conn, "runs", "response_payload_hash"): - conn.execute("ALTER TABLE runs ADD COLUMN response_payload_hash TEXT") - - if not column_exists(conn, "module_calls", "request_payload_hash"): - conn.execute("ALTER TABLE module_calls ADD COLUMN request_payload_hash TEXT") - - if not column_exists(conn, "module_calls", "response_payload_hash"): - conn.execute("ALTER TABLE module_calls ADD COLUMN response_payload_hash TEXT") - - conn.commit() - - -def insert_run( - run_id: str, - workflow_template: str, - overall_status: str, - created_at: str, - request_payload: Dict[str, Any], -) -> None: - request_payload_json = canonical_json(request_payload) - request_payload_hash = sha256_text(request_payload_json) - - with get_conn() as conn: - conn.execute( - """ - INSERT INTO runs ( - run_id, - workflow_template, - overall_status, - created_at, - request_payload, - response_payload, - request_payload_hash, - response_payload_hash - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) - """, - ( - run_id, - workflow_template, - overall_status, - created_at, - request_payload_json, - None, - request_payload_hash, - None, - ), - ) - conn.commit() - - -def update_run( - run_id: str, - overall_status: str, - response_payload: Dict[str, Any], - completed_at: Optional[str] = None, -) -> None: - response_payload_json = canonical_json(response_payload) - response_payload_hash = sha256_text(response_payload_json) - - with get_conn() as conn: - conn.execute( - """ - UPDATE runs - SET overall_status = ?, - completed_at = ?, - response_payload = ?, - response_payload_hash = ? - WHERE run_id = ? 
- """, - ( - overall_status, - completed_at, - response_payload_json, - response_payload_hash, - run_id, - ), - ) - conn.commit() - - -def insert_module_call( - call_id: str, - run_id: str, - module_name: str, - call_sequence: int, - status: str, - started_at: str, - request_payload: Dict[str, Any], -) -> None: - request_payload_json = canonical_json(request_payload) - request_payload_hash = sha256_text(request_payload_json) - - with get_conn() as conn: - conn.execute( - """ - INSERT INTO module_calls ( - call_id, - run_id, - module_name, - module_version, - call_sequence, - status, - started_at, - completed_at, - execution_time_ms, - request_payload, - response_payload, - error_message, - request_payload_hash, - response_payload_hash - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) - """, - ( - call_id, - run_id, - module_name, - None, - call_sequence, - status, - started_at, - None, - None, - request_payload_json, - None, - None, - request_payload_hash, - None, - ), - ) - conn.commit() - - -def update_module_call( - call_id: str, - module_version: Optional[str], - status: str, - completed_at: str, - execution_time_ms: int, - response_payload: Optional[Dict[str, Any]] = None, - error_message: Optional[str] = None, -) -> None: - response_payload_json = ( - canonical_json(response_payload) if response_payload is not None else None - ) - response_payload_hash = ( - sha256_text(response_payload_json) if response_payload_json is not None else None - ) - - with get_conn() as conn: - conn.execute( - """ - UPDATE module_calls - SET module_version = ?, - status = ?, - completed_at = ?, - execution_time_ms = ?, - response_payload = ?, - error_message = ?, - response_payload_hash = ? - WHERE call_id = ? 
- """, - ( - module_version, - status, - completed_at, - execution_time_ms, - response_payload_json, - error_message, - response_payload_hash, - call_id, - ), - ) - conn.commit() - - -def insert_artifact( - artifact_id: str, - run_id: str, - call_id: Optional[str], - module_name: str, - artifact_type: Optional[str], - file_path: str, - hash_value: Optional[str], - created_at: str, -) -> None: - with get_conn() as conn: - conn.execute( - """ - INSERT INTO artifacts ( - artifact_id, - run_id, - call_id, - module_name, - artifact_type, - file_path, - hash, - created_at - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) - """, - ( - artifact_id, - run_id, - call_id, - module_name, - artifact_type, - file_path, - hash_value, - created_at, - ), - ) - conn.commit() - - -@app.on_event("startup") -def startup_event() -> None: - init_db() - - @app.get("/health") def health() -> Dict[str, Any]: return { "status": "ok", "module_name": MODULE_NAME, "version": MODULE_VERSION, - "pilot_url": INTEGRATION_PILOT_URL, - "sqlite_path": str(SQLITE_PATH), } @app.post("/execute") -async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: - run_id = str(uuid.uuid4()) - run_created_at = now_iso() +def execute(payload: Dict[str, Any]) -> Dict[str, Any]: + execution_id = str(uuid.uuid4()) + timestamp = now_iso() - insert_run( - run_id=run_id, - workflow_template="pilot-workflow", - overall_status="running", - created_at=run_created_at, - request_payload=payload, - ) + module_dir = ARTIFACT_ROOT / MODULE_NAME + module_dir.mkdir(parents=True, exist_ok=True) - call_id = str(uuid.uuid4()) - call_started_at = now_iso() + artifact_path = module_dir / f"{execution_id}.json" - upstream_payload = { - "run_id": run_id, - "input": payload, - "requested_by": MODULE_NAME, + artifact_content = { + "execution_id": execution_id, + "timestamp": timestamp, + "received_payload": payload, + "message": "integration pilot executed successfully", } - insert_module_call( - call_id=call_id, - run_id=run_id, - 
module_name="integration-pilot", - call_sequence=1, - status="running", - started_at=call_started_at, - request_payload=upstream_payload, - ) - - try: - start_perf = time.perf_counter() - - async with httpx.AsyncClient(timeout=30.0) as client: - upstream_response = await client.post( - f"{INTEGRATION_PILOT_URL}/execute", - json=upstream_payload, - ) - upstream_response.raise_for_status() - pilot_result = upstream_response.json() - - execution_time_ms = int((time.perf_counter() - start_perf) * 1000) - call_completed_at = now_iso() + serialized = json.dumps(artifact_content, indent=2) + artifact_path.write_text(serialized, encoding="utf-8") - update_module_call( - call_id=call_id, - module_version=pilot_result.get("version"), - status="success", - completed_at=call_completed_at, - execution_time_ms=execution_time_ms, - response_payload=pilot_result, - ) + sha256_hash = hashlib.sha256(serialized.encode("utf-8")).hexdigest() - except Exception as exc: - execution_time_ms = ( - int((time.perf_counter() - start_perf) * 1000) - if "start_perf" in locals() - else 0 - ) - call_completed_at = now_iso() - run_completed_at = now_iso() - - update_module_call( - call_id=call_id, - module_version=None, - status="failed", - completed_at=call_completed_at, - execution_time_ms=execution_time_ms, - response_payload=None, - error_message=str(exc), - ) - - error_response = { - "module_name": MODULE_NAME, - "version": MODULE_VERSION, - "status": "failed", - "run_id": run_id, - "error": f"Pilot module call failed: {exc}", - } - - update_run( - run_id=run_id, - overall_status="failed", - response_payload=error_response, - completed_at=run_completed_at, - ) - - raise HTTPException(status_code=502, detail=f"Pilot module call failed: {exc}") - - for artifact in pilot_result.get("artifacts", []): - insert_artifact( - artifact_id=str(uuid.uuid4()), - run_id=run_id, - call_id=call_id, - module_name=artifact.get("module_name", "integration-pilot"), - artifact_type=artifact.get("type"), - 
file_path=artifact.get("file_path", ""), - hash_value=artifact.get("hash"), - created_at=now_iso(), - ) - - orchestrator_dir = ARTIFACT_ROOT / "orchestrator" - orchestrator_dir.mkdir(parents=True, exist_ok=True) - - summary_path = orchestrator_dir / f"{run_id}-summary.json" - summary = { - "run_id": run_id, - "timestamp": run_created_at, - "orchestrator": MODULE_NAME, - "pilot_response": pilot_result, - } - summary_serialized = json.dumps(summary, indent=2, ensure_ascii=False) - summary_path.write_text(summary_serialized, encoding="utf-8") - summary_hash = hash_file(summary_path) - - insert_artifact( - artifact_id=str(uuid.uuid4()), - run_id=run_id, - call_id=None, - module_name=MODULE_NAME, - artifact_type="run_summary", - file_path=str(summary_path), - hash_value=summary_hash, - created_at=now_iso(), - ) - - final_response = { + return { "module_name": MODULE_NAME, "version": MODULE_VERSION, "status": "success", - "run_id": run_id, "output": { - "message": "orchestrator executed pilot workflow successfully", - "pilot_result": pilot_result, + "message": "pilot module executed successfully", + "execution_id": execution_id, + "received_keys": sorted(list(payload.keys())), }, "artifacts": [ { "module_name": MODULE_NAME, - "file_path": str(summary_path), - "hash": summary_hash, - "type": "run_summary", + "file_path": str(artifact_path), + "hash": sha256_hash, + "type": "pilot_output", } ], - } - - update_run( - run_id=run_id, - overall_status="success", - response_payload=final_response, - completed_at=now_iso(), - ) - - return final_response \ No newline at end of file + } \ No newline at end of file From dd8105d36d3ca5d535c78fb01fecf37a48f7baa4 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 17:44:55 +0100 Subject: [PATCH 36/42] Changed the main.py I was supposed to.
--- modules/orchestrator/app/main.py | 105 +++++++++++++++++++++++++++---- 1 file changed, 92 insertions(+), 13 deletions(-) diff --git a/modules/orchestrator/app/main.py b/modules/orchestrator/app/main.py index eb6239d..90def85 100644 --- a/modules/orchestrator/app/main.py +++ b/modules/orchestrator/app/main.py @@ -1,3 +1,4 @@ +import hashlib import json import os import sqlite3 @@ -44,6 +45,36 @@ def get_conn() -> sqlite3.Connection: return conn +def canonical_json(value: Any) -> str: + """ + Deterministic JSON serialization for hashing and storage consistency. + """ + return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + + +def sha256_text(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() + + +def hash_payload(payload: Optional[Dict[str, Any]]) -> Optional[str]: + if payload is None: + return None + return sha256_text(canonical_json(payload)) + + +def hash_file(file_path: Path) -> str: + digest = hashlib.sha256() + with file_path.open("rb") as f: + for chunk in iter(lambda: f.read(8192), b""): + digest.update(chunk) + return digest.hexdigest() + + +def column_exists(conn: sqlite3.Connection, table_name: str, column_name: str) -> bool: + rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall() + return any(row[1] == column_name for row in rows) + + def init_db() -> None: with get_conn() as conn: conn.execute( @@ -97,6 +128,19 @@ def init_db() -> None: """ ) + # Safe additive migration for hashing support + if not column_exists(conn, "runs", "request_payload_hash"): + conn.execute("ALTER TABLE runs ADD COLUMN request_payload_hash TEXT") + + if not column_exists(conn, "runs", "response_payload_hash"): + conn.execute("ALTER TABLE runs ADD COLUMN response_payload_hash TEXT") + + if not column_exists(conn, "module_calls", "request_payload_hash"): + conn.execute("ALTER TABLE module_calls ADD COLUMN request_payload_hash TEXT") + + if not column_exists(conn, "module_calls", "response_payload_hash"): + 
conn.execute("ALTER TABLE module_calls ADD COLUMN response_payload_hash TEXT") + conn.commit() @@ -107,6 +151,9 @@ def insert_run( created_at: str, request_payload: Dict[str, Any], ) -> None: + request_payload_json = canonical_json(request_payload) + request_payload_hash = sha256_text(request_payload_json) + with get_conn() as conn: conn.execute( """ @@ -116,15 +163,19 @@ def insert_run( overall_status, created_at, request_payload, - response_payload - ) VALUES (?, ?, ?, ?, ?, ?) + response_payload, + request_payload_hash, + response_payload_hash + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?) """, ( run_id, workflow_template, overall_status, created_at, - json.dumps(request_payload), + request_payload_json, + None, + request_payload_hash, None, ), ) @@ -137,19 +188,24 @@ def update_run( response_payload: Dict[str, Any], completed_at: Optional[str] = None, ) -> None: + response_payload_json = canonical_json(response_payload) + response_payload_hash = sha256_text(response_payload_json) + with get_conn() as conn: conn.execute( """ UPDATE runs SET overall_status = ?, completed_at = ?, - response_payload = ? + response_payload = ?, + response_payload_hash = ? WHERE run_id = ? """, ( overall_status, completed_at, - json.dumps(response_payload), + response_payload_json, + response_payload_hash, run_id, ), ) @@ -165,6 +221,9 @@ def insert_module_call( started_at: str, request_payload: Dict[str, Any], ) -> None: + request_payload_json = canonical_json(request_payload) + request_payload_hash = sha256_text(request_payload_json) + with get_conn() as conn: conn.execute( """ @@ -180,8 +239,10 @@ def insert_module_call( execution_time_ms, request_payload, response_payload, - error_message - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + error_message, + request_payload_hash, + response_payload_hash + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
""", ( call_id, @@ -193,8 +254,10 @@ def insert_module_call( started_at, None, None, - json.dumps(request_payload), + request_payload_json, + None, None, + request_payload_hash, None, ), ) @@ -210,6 +273,13 @@ def update_module_call( response_payload: Optional[Dict[str, Any]] = None, error_message: Optional[str] = None, ) -> None: + response_payload_json = ( + canonical_json(response_payload) if response_payload is not None else None + ) + response_payload_hash = ( + sha256_text(response_payload_json) if response_payload_json is not None else None + ) + with get_conn() as conn: conn.execute( """ @@ -219,7 +289,8 @@ def update_module_call( completed_at = ?, execution_time_ms = ?, response_payload = ?, - error_message = ? + error_message = ?, + response_payload_hash = ? WHERE call_id = ? """, ( @@ -227,8 +298,9 @@ def update_module_call( status, completed_at, execution_time_ms, - json.dumps(response_payload) if response_payload is not None else None, + response_payload_json, error_message, + response_payload_hash, call_id, ), ) @@ -345,7 +417,11 @@ async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: ) except Exception as exc: - execution_time_ms = int((time.perf_counter() - start_perf) * 1000) if "start_perf" in locals() else 0 + execution_time_ms = ( + int((time.perf_counter() - start_perf) * 1000) + if "start_perf" in locals() + else 0 + ) call_completed_at = now_iso() run_completed_at = now_iso() @@ -398,7 +474,9 @@ async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: "orchestrator": MODULE_NAME, "pilot_response": pilot_result, } - summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8") + summary_serialized = json.dumps(summary, indent=2, ensure_ascii=False) + summary_path.write_text(summary_serialized, encoding="utf-8") + summary_hash = hash_file(summary_path) insert_artifact( artifact_id=str(uuid.uuid4()), @@ -407,7 +485,7 @@ async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: module_name=MODULE_NAME, 
artifact_type="run_summary", file_path=str(summary_path), - hash_value=None, + hash_value=summary_hash, created_at=now_iso(), ) @@ -424,6 +502,7 @@ async def execute(payload: Dict[str, Any]) -> Dict[str, Any]: { "module_name": MODULE_NAME, "file_path": str(summary_path), + "hash": summary_hash, "type": "run_summary", } ], From 7e0493d02dca817e87764ad0eed7dc4a2b58b5db Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 18:12:21 +0100 Subject: [PATCH 37/42] Refactor execute function to use stable input hashing and improve JSON serialization. Removed UUID and timestamp, replacing them with a deterministic input hash for artifact filenames. --- modules/integration-pilot/app/main.py | 48 +++++++++++++++++---------- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py index c60eed5..a8e644f 100644 --- a/modules/integration-pilot/app/main.py +++ b/modules/integration-pilot/app/main.py @@ -1,8 +1,6 @@ import hashlib import json import os -import uuid -from datetime import datetime, timezone from pathlib import Path from typing import Any, Dict @@ -15,8 +13,15 @@ ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat() +def canonical_json(value: Any) -> str: + """ + Deterministic JSON serialization used for stable content hashing. 
+ """ + return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + + +def sha256_text(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() @app.get("/health") @@ -30,25 +35,32 @@ def health() -> Dict[str, Any]: @app.post("/execute") def execute(payload: Dict[str, Any]) -> Dict[str, Any]: - execution_id = str(uuid.uuid4()) - timestamp = now_iso() - module_dir = ARTIFACT_ROOT / MODULE_NAME module_dir.mkdir(parents=True, exist_ok=True) - artifact_path = module_dir / f"{execution_id}.json" + stable_input = payload.get("input", payload) + stable_input_hash = sha256_text(canonical_json(stable_input)) + + artifact_filename = f"{stable_input_hash}.json" + artifact_path = module_dir / artifact_filename artifact_content = { - "execution_id": execution_id, - "timestamp": timestamp, - "received_payload": payload, + "input_hash": stable_input_hash, + "received_payload": stable_input, "message": "integration pilot executed successfully", + "module_name": MODULE_NAME, + "module_version": MODULE_VERSION, } - serialized = json.dumps(artifact_content, indent=2) - artifact_path.write_text(serialized, encoding="utf-8") + serialized_artifact = json.dumps( + artifact_content, + sort_keys=True, + indent=2, + ensure_ascii=False, + ) + artifact_path.write_text(serialized_artifact, encoding="utf-8") - sha256_hash = hashlib.sha256(serialized.encode("utf-8")).hexdigest() + artifact_hash = sha256_text(serialized_artifact) return { "module_name": MODULE_NAME, @@ -56,14 +68,16 @@ def execute(payload: Dict[str, Any]) -> Dict[str, Any]: "status": "success", "output": { "message": "pilot module executed successfully", - "execution_id": execution_id, - "received_keys": sorted(list(payload.keys())), + "input_hash": stable_input_hash, + "received_keys": sorted(list(stable_input.keys())) + if isinstance(stable_input, dict) + else [], }, "artifacts": [ { "module_name": MODULE_NAME, "file_path": str(artifact_path), - "hash": sha256_hash, + "hash": 
artifact_hash, "type": "pilot_output", } ], From 1df636caefd09b009556e6f659da81f4ba1c371c Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 19:51:27 +0100 Subject: [PATCH 38/42] Minimal changes for testing E2. --- modules/integration-pilot/app/main.py | 4 +- modules/integration-pilot/app/main.py.bak | 84 +++++++++++++++++++++++ 2 files changed, 87 insertions(+), 1 deletion(-) create mode 100644 modules/integration-pilot/app/main.py.bak diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py index a8e644f..642e627 100644 --- a/modules/integration-pilot/app/main.py +++ b/modules/integration-pilot/app/main.py @@ -9,7 +9,7 @@ app = FastAPI(title="OpenPolicyStack Integration Pilot", version="0.1.0") MODULE_NAME = os.getenv("OPS_MODULE_NAME", "integration-pilot") -MODULE_VERSION = os.getenv("PILOT_MODULE_VERSION", "0.1.0") +MODULE_VERSION = os.getenv("PILOT_MODULE_VERSION", "0.1.1") ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) @@ -50,6 +50,7 @@ def execute(payload: Dict[str, Any]) -> Dict[str, Any]: "message": "integration pilot executed successfully", "module_name": MODULE_NAME, "module_version": MODULE_VERSION, + "processing_profile": "normalized-v2", } serialized_artifact = json.dumps( @@ -72,6 +73,7 @@ def execute(payload: Dict[str, Any]) -> Dict[str, Any]: "received_keys": sorted(list(stable_input.keys())) if isinstance(stable_input, dict) else [], + "processing_profile": "normalized-v2", }, "artifacts": [ { diff --git a/modules/integration-pilot/app/main.py.bak b/modules/integration-pilot/app/main.py.bak new file mode 100644 index 0000000..a8e644f --- /dev/null +++ b/modules/integration-pilot/app/main.py.bak @@ -0,0 +1,84 @@ +import hashlib +import json +import os +from pathlib import Path +from typing import Any, Dict + +from fastapi import FastAPI + +app = FastAPI(title="OpenPolicyStack Integration Pilot", version="0.1.0") + +MODULE_NAME = os.getenv("OPS_MODULE_NAME", 
"integration-pilot") +MODULE_VERSION = os.getenv("PILOT_MODULE_VERSION", "0.1.0") +ARTIFACT_ROOT = Path(os.getenv("OPS_ARTIFACT_ROOT", "/var/openpolicystack/artifacts")) + + +def canonical_json(value: Any) -> str: + """ + Deterministic JSON serialization used for stable content hashing. + """ + return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + + +def sha256_text(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() + + +@app.get("/health") +def health() -> Dict[str, Any]: + return { + "status": "ok", + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + } + + +@app.post("/execute") +def execute(payload: Dict[str, Any]) -> Dict[str, Any]: + module_dir = ARTIFACT_ROOT / MODULE_NAME + module_dir.mkdir(parents=True, exist_ok=True) + + stable_input = payload.get("input", payload) + stable_input_hash = sha256_text(canonical_json(stable_input)) + + artifact_filename = f"{stable_input_hash}.json" + artifact_path = module_dir / artifact_filename + + artifact_content = { + "input_hash": stable_input_hash, + "received_payload": stable_input, + "message": "integration pilot executed successfully", + "module_name": MODULE_NAME, + "module_version": MODULE_VERSION, + } + + serialized_artifact = json.dumps( + artifact_content, + sort_keys=True, + indent=2, + ensure_ascii=False, + ) + artifact_path.write_text(serialized_artifact, encoding="utf-8") + + artifact_hash = sha256_text(serialized_artifact) + + return { + "module_name": MODULE_NAME, + "version": MODULE_VERSION, + "status": "success", + "output": { + "message": "pilot module executed successfully", + "input_hash": stable_input_hash, + "received_keys": sorted(list(stable_input.keys())) + if isinstance(stable_input, dict) + else [], + }, + "artifacts": [ + { + "module_name": MODULE_NAME, + "file_path": str(artifact_path), + "hash": artifact_hash, + "type": "pilot_output", + } + ], + } \ No newline at end of file From 10883696b33c7df73e78b46ad1117ae12c60ad85 Mon 
Sep 17 00:00:00 2001 From: Adrian Con Date: Wed, 18 Mar 2026 19:59:55 +0100 Subject: [PATCH 39/42] Bumped PILOT_MODULE_VERSION to 0.1.1 to prove E2. --- compose.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compose.yaml b/compose.yaml index 546185e..8040aa2 100644 --- a/compose.yaml +++ b/compose.yaml @@ -52,7 +52,7 @@ services: OPS_LOG_LEVEL: INFO OPS_ARTIFACT_ROOT: /var/openpolicystack/artifacts OPS_ORCHESTRATOR_URL: http://orchestrator:8080 - PILOT_MODULE_VERSION: 0.1.0 + PILOT_MODULE_VERSION: 0.1.1 ports: - "8101:8080" volumes: From bd42c1884805a9aa7e60c72a2475b915c23832d1 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Thu, 19 Mar 2026 14:43:14 +0100 Subject: [PATCH 40/42] Added a controlled failure trigger to test E5. --- modules/integration-pilot/app/main.py | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py index 642e627..85bf89c 100644 --- a/modules/integration-pilot/app/main.py +++ b/modules/integration-pilot/app/main.py @@ -39,6 +39,11 @@ def execute(payload: Dict[str, Any]) -> Dict[str, Any]: module_dir.mkdir(parents=True, exist_ok=True) stable_input = payload.get("input", payload) + + # Controlled failure trigger for E5 + if isinstance(stable_input, dict) and stable_input.get("force_error") is True: + raise ValueError("Controlled E5 failure: forced error triggered") + stable_input_hash = sha256_text(canonical_json(stable_input)) artifact_filename = f"{stable_input_hash}.json" From a3e3132aeb76a72a2da62ad197f269c315fcf2b0 Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Fri, 10 Apr 2026 12:07:45 +0200 Subject: [PATCH 41/42] Commented out the failure trigger previously added for E5.
--- modules/integration-pilot/app/main.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/modules/integration-pilot/app/main.py b/modules/integration-pilot/app/main.py index 85bf89c..09c6d63 100644 --- a/modules/integration-pilot/app/main.py +++ b/modules/integration-pilot/app/main.py @@ -41,8 +41,8 @@ def execute(payload: Dict[str, Any]) -> Dict[str, Any]: stable_input = payload.get("input", payload) # Controlled failure trigger for E5 - if isinstance(stable_input, dict) and stable_input.get("force_error") is True: - raise ValueError("Controlled E5 failure: forced error triggered") + # if isinstance(stable_input, dict) and stable_input.get("force_error") is True: + # raise ValueError("Controlled E5 failure: forced error triggered") stable_input_hash = sha256_text(canonical_json(stable_input)) From 4d8bb98354bc44ed1cd63313416309c2cfa35d1f Mon Sep 17 00:00:00 2001 From: Adrian Con Date: Fri, 10 Apr 2026 12:34:40 +0200 Subject: [PATCH 42/42] Added md file to the root of the branch. --- ADRIAN_CON_THESIS_SCOPE.md | 46 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 ADRIAN_CON_THESIS_SCOPE.md diff --git a/ADRIAN_CON_THESIS_SCOPE.md b/ADRIAN_CON_THESIS_SCOPE.md new file mode 100644 index 0000000..9d57b53 --- /dev/null +++ b/ADRIAN_CON_THESIS_SCOPE.md @@ -0,0 +1,46 @@ +# Thesis Scope – Adrian Con García + +This repository is part of the broader OpenPolicyStack project. +This document clarifies the scope of the work developed and evaluated in the corresponding thesis. + +## Scope of Contribution + +The thesis focuses specifically on the design, implementation, and evaluation of the **orchestration layer** and its integration interface. 
+ +The evaluated software artifact consists of: + +- `modules/orchestrator/` → central orchestration service (primary contribution) +- `modules/integration-pilot/` → controlled validation module used to test integration and evaluation conditions +- `compose.yaml` → minimal deployment configuration used to run the system + +Other modules present in the repository are part of the wider collaborative project and are **not part of the evaluated contribution**. + +## Evaluated System State + +The version contained in this branch corresponds to the **instrumented evaluation state** of the system. + +Starting from a working end-to-end orchestration prototype (baseline MVP), the system was incrementally extended to enable empirical evaluation of the following properties: + +- reproducibility +- traceability +- artifact integrity +- execution trace reconstruction +- failure handling robustness + +These properties were evaluated through a structured experimental framework (E1–E5) as described in the thesis. + +## Important Notes + +- The orchestrator was extended with structured metadata capture and hashing mechanisms to support empirical validation. +- The integration-pilot module was intentionally used as a controlled environment to isolate and test orchestration behavior before integrating external modules. +- A controlled failure trigger used exclusively for evaluation purposes has been disabled in this version. + +## How to Navigate + +For reviewers interested in the evaluated artifact: + +1. Start with: `modules/orchestrator/` +2. See integration behavior in: `modules/integration-pilot/` +3. Use `compose.yaml` to understand how services are connected + +This subset of the repository corresponds to the system evaluated in the thesis. \ No newline at end of file
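A reviewer checking the reproducibility property (E1/E2) can recompute the integration pilot's deterministic artifact filename offline and verify a stored artifact's hash. The sketch below copies `canonical_json` and `sha256_text` exactly as they appear in the pilot's `main.py`. It assumes the caller passes the same payload the orchestrator forwards (`{"run_id": ..., "input": ...}`); in that case the pilot hashes only the `input` field. The helpers `expected_artifact_name` and `file_matches_hash` are hypothetical names introduced here for illustration and do not exist in the repository.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def canonical_json(value: Any) -> str:
    # Same settings as the pilot: sorted keys, compact separators, non-ASCII kept.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def expected_artifact_name(payload: Dict[str, Any]) -> str:
    # Hypothetical helper: mirrors `payload.get("input", payload)` in the pilot,
    # so the per-run `run_id` never influences the artifact filename.
    stable_input = payload.get("input", payload)
    return f"{sha256_text(canonical_json(stable_input))}.json"


def file_matches_hash(path: Path, expected_hash: str) -> bool:
    # Hypothetical helper: SHA-256 over the raw file bytes, the same digest
    # the orchestrator's hash_file() computes for the run summary.
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected_hash


if __name__ == "__main__":
    first = expected_artifact_name(
        {"run_id": "run-a", "input": {"country": "DR", "time_horizon_years": 5}}
    )
    # Same inputs with a different run_id and key order yield the same name.
    second = expected_artifact_name(
        {"input": {"time_horizon_years": 5, "country": "DR"}, "run_id": "run-b"}
    )
    assert first == second

    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / first
        body = json.dumps({"input_hash": first[:-5]}, sort_keys=True, indent=2)
        # Write bytes directly to avoid platform newline translation.
        artifact.write_bytes(body.encode("utf-8"))
        assert file_matches_hash(artifact, sha256_text(body))

    print(first)
```

Because the filename depends only on the canonicalized input, two runs with identical inputs should point at the same pilot artifact. The orchestrator's per-run summary, by contrast, is keyed by `run_id` and is expected to differ between runs.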