155 changes: 155 additions & 0 deletions .github/skills/create-model-pack/SKILL.md
@@ -0,0 +1,155 @@
---
name: create-model-pack
description: Create or update a CodeQL model pack of `.model.yml` data extension files for an unmodeled (or under-modeled) library or framework, including local repo-scoped extensions under `.github/codeql/extensions/` and reusable model packs under `languages/<language>/custom/src/`. Use when a user asks to "model a library", "add a data extension", "add sources/sinks/summaries/barriers/barrier-guards for <library>", "create a model pack", or wants CodeQL to recognize calls in a third-party package that currently produce no findings.
---

# Create a CodeQL Model Pack

This skill describes the end-to-end procedure for authoring a CodeQL data extension (a `.model.yml` file) and packaging it either as a repo-local extension or as a reusable model pack. It complements the reference documentation in [`.github/prompts/data_extensions_development.prompt.md`](../../prompts/data_extensions_development.prompt.md) and the language-specific data extension prompts (e.g. [`python_data_extension_development.prompt.md`](../../prompts/python_data_extension_development.prompt.md), [`java_data_extension_development.prompt.md`](../../prompts/java_data_extension_development.prompt.md)).

Once the model pack is ready to ship to other repositories or to org-wide Default Setup, follow up with the [`publish-model-pack`](../publish-model-pack/SKILL.md) skill.

## When to use this skill

Trigger this skill when the user wants to:

- Add CodeQL coverage for a library/framework that produces no findings today.
- Add or correct sources, sinks, summaries, barriers (sanitizers), or barrier guards (validators) for a specific package.
- Bootstrap a new `.model.yml` file under `.github/codeql/extensions/` (single-repo) or under `languages/<language>/custom/src/` (reusable pack).

If the user instead wants to write a custom CodeQL `.ql` query, use the query development prompts rather than this skill.

## Prerequisites

- The `codeql` CLI is available (preinstalled in this template's environment via [`.github/workflows/copilot-setup-steps.yml`](../../workflows/copilot-setup-steps.yml)).
- A CodeQL database for the target language is available, or sample code from which one can be built with `codeql database create`.
- Familiarity with the two tuple formats:
  - **API Graph format**: Python, Ruby, JavaScript/TypeScript (3–5 columns).
  - **MaD format**: Java/Kotlin, C#, Go, C/C++ (9–10 columns; includes `subtypes` and `provenance`).

See the "Two Model Formats" and "Quick reference" tables in [`data_extensions_development.prompt.md`](../../prompts/data_extensions_development.prompt.md) for the canonical column layouts and examples.

## Procedure

### 1. Identify the target library and language

- Confirm the library name, version, and the CodeQL language it targets (`python`, `ruby`, `javascript`, `java`, `csharp`, `go`, `cpp`, `actions`).
- Confirm whether the language uses **API Graph** or **MaD** tuples; pick the wrong format and the extension will silently fail to load.
- Skim the library's public API surface (docs, type stubs, or source) so you can classify methods in the next step.
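
The language-to-format split listed in the prerequisites can be captured in a small lookup, which makes the "wrong format" mistake harder to make. This is a minimal sketch, not part of the CodeQL tooling: `TUPLE_FORMAT` and `tuple_format` are hypothetical names, and the mapping only restates the languages this skill assigns to each format.

```python
# Hypothetical lookup table; the assignments restate the prerequisites list.
API_GRAPH = "API Graph"
MAD = "MaD"

TUPLE_FORMAT = {
    "python": API_GRAPH,
    "ruby": API_GRAPH,
    "javascript": API_GRAPH,  # covers JavaScript/TypeScript
    "java": MAD,              # covers Java/Kotlin
    "csharp": MAD,
    "go": MAD,
    "cpp": MAD,               # covers C/C++
}


def tuple_format(language: str) -> str:
    """Return the tuple format recorded for a CodeQL language name."""
    key = language.strip().lower()
    if key not in TUPLE_FORMAT:
        # This skill does not assign a format to e.g. "actions", so do not guess.
        raise ValueError(f"no tuple format recorded for language {language!r}")
    return TUPLE_FORMAT[key]
```

Raising on an unlisted language is a deliberate choice here: a guessed format would reproduce the silent load failure described above.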

### 2. Classify the API surface

For each public method, function, or class in the library, ask:

1. Does it return data from outside the program (network, file, env, stdin)? → **sourceModel** (pick a `kind` in the appropriate threat model, usually `remote`).
2. Does it consume data in a security-sensitive operation (SQL, exec, path, redirect, eval, deserialize)? → **sinkModel** (pick a `kind` matching the vulnerability class, e.g. `sql-injection`, `command-injection`, `path-injection`).
3. Does it pass data through opaque library code (encode, decode, wrap, copy, iterate)? → **summaryModel** with `kind: taint` (derived) or `kind: value` (identity).
4. Does it sanitize data so its output is safe for a specific sink kind? → **barrierModel** (`kind` must match the sink kind it neutralizes).
5. Does it return a boolean indicating whether data is safe? → **barrierGuardModel** with the appropriate `acceptingValue` (`"true"` or `"false"`) and matching `kind`.
6. Is the type a subclass of something already modeled? → **typeModel** (API Graph languages only) or set the `subtypes` column to `true` in the MaD tuple.
7. Did the auto-generated model assign a wrong summary? → **neutralModel** to suppress it.

A complete chain of source → (summary\*) → sink is required for end-to-end findings; missing a single hop will cause false negatives.
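
The seven questions above can be treated as an ordered checklist that maps observed traits of an API to extensible predicates. The sketch below is illustrative only: the trait names and `predicates_for` are hypothetical, while the predicate names are the ones used in the list.

```python
# Hypothetical trait names paired with the predicates from the checklist above,
# kept in the same order as the questions.
QUESTIONS = [
    ("returns_external_data", "sourceModel"),
    ("consumes_in_sensitive_operation", "sinkModel"),
    ("passes_data_through", "summaryModel"),
    ("sanitizes_for_sink_kind", "barrierModel"),
    ("returns_safety_boolean", "barrierGuardModel"),
    # typeModel applies to API Graph languages; MaD uses the subtypes column instead.
    ("subclass_of_modeled_type", "typeModel"),
    ("wrong_generated_summary", "neutralModel"),
]


def predicates_for(traits):
    """Return the predicates to author for an API with the given traits."""
    return [predicate for trait, predicate in QUESTIONS if trait in traits]
```

For example, a method that reads from the network and also wraps its input would be passed `{"returns_external_data", "passes_data_through"}` and would need both a source row and a summary row.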

### 3. Choose the deployment scope

Choose one of two paths; the directory layout follows from the choice:

- **Single-repo shortcut**: drop `.model.yml` files directly under `.github/codeql/extensions/<pack-name>/` in the consuming repo. **No `qlpack.yml` is required**; Code Scanning auto-loads extensions from this directory. Use this when the models only need to apply to one repo and you do not want to version-publish them.
- **Reusable model pack**: create the files under a pack directory in this template (e.g. `languages/<language>/custom/src/models/`) with a `qlpack.yml` declaring `extensionTargets` and `dataExtensions`. Use this when the models will be consumed by multiple repos or by org-wide Default Setup. Publishing is handled by the [`publish-model-pack`](../publish-model-pack/SKILL.md) skill.

### 4. Author the `.model.yml` file(s)

- Use the naming convention `<library>-<module>.model.yml` (lowercase, hyphen-separated). Split per logical module rather than putting an entire ecosystem in one file, e.g. `databricks-sql.model.yml`, `databricks-sdk.model.yml`.
- Begin each file with the standard header and the extensible predicates that apply, for example:

```yaml
extensions:
  - addsTo:
      pack: codeql/<language>-all
      extensible: sinkModel
    data:
      # API Graph (Python/Ruby/JS): [type, path, kind]
      - ['mylib', 'Member[connect].ReturnValue.Member[execute].Argument[0]', 'sql-injection']
      # MaD (Java/C#/Go/C++): [package, type, subtypes, name, signature, ext, input, kind, provenance]
      # - ['java.sql', 'Statement', true, 'execute', '(String)', '', 'Argument[0]', 'sql-injection', 'manual']
  - addsTo:
      pack: codeql/<language>-all
      extensible: summaryModel
    data: []
```

- Every row must have the exact column count for that extensible predicate; see the "Two Model Formats" tables in [`data_extensions_development.prompt.md`](../../prompts/data_extensions_development.prompt.md). An invalid row will cause the engine to fail.
- Use `provenance: 'manual'` (MaD) for hand-written rows; reserve `'df-generated'` for output of the model generator.
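
Both the file-naming convention and the column-count rule are easy to check mechanically before running any analysis. The following is a rough sketch under stated assumptions: `is_conventional_name` and `rows_with_wrong_width` are hypothetical helpers, the regex allows digits (the convention above does not mention them), and the expected column count must be supplied by the caller from the "Two Model Formats" tables.

```python
import re

# "<library>-<module>.model.yml": lowercase, hyphen-separated.
# Allowing digits is an assumption, not something the convention states.
MODEL_FILE_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+\.model\.yml")


def is_conventional_name(filename: str) -> bool:
    """Check a bare file name against the naming convention."""
    return MODEL_FILE_RE.fullmatch(filename) is not None


def rows_with_wrong_width(rows, expected_columns):
    """Return (row_index, actual_width) for every row of the wrong width."""
    return [
        (index, len(row))
        for index, row in enumerate(rows)
        if len(row) != expected_columns
    ]
```

With the sink examples from the template above, an API Graph sink row would be checked against 3 columns and a Java MaD sink row against 9.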

### 5. Configure `qlpack.yml` (model-pack path only)

Skip this step if you chose the `.github/codeql/extensions/` shortcut in step 3.

For a reusable pack (e.g. `languages/<language>/custom/src/qlpack.yml`), add or confirm:

```yaml
name: <org>/<language>-<pack-name>
version: 0.0.1
library: true
extensionTargets:
  codeql/<language>-all: '*'
dataExtensions:
  - models/**/*.yml
```

- `library: true`: model packs are always libraries, never queries.
- `extensionTargets`: names the upstream pack (and version range) the extensions extend.
- `dataExtensions`: a glob that picks up every `.model.yml` you author in step 4.
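
A quick way to see which files a `dataExtensions` pattern would pick up is to expand it locally. This sketch uses Python's `pathlib` globbing, which is assumed (not guaranteed) to agree with how the CodeQL CLI expands the same pattern; `matched_extension_files` is a hypothetical helper.

```python
from pathlib import Path


def matched_extension_files(pack_dir, patterns=("models/**/*.yml",)):
    """List files under pack_dir matched by the given glob patterns.

    Paths are returned relative to pack_dir, POSIX-style and sorted.
    """
    root = Path(pack_dir)
    found = set()
    for pattern in patterns:
        # pathlib's "**" matches the directory itself and all subdirectories.
        found.update(path for path in root.glob(pattern) if path.is_file())
    return sorted(path.relative_to(root).as_posix() for path in found)
```

An empty result for a pack that is supposed to contain models is a strong hint that the glob or the directory layout is off.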

### 6. Test locally with `codeql query run`

Validate the model pack against a real database before relying on it:

```bash
codeql query run \
  --database=/path/to/db \
  --additional-packs=<path-to-pack-dir> \
  --output=/tmp/results.bqrs \
  -- <path-to-relevant-query>.ql

codeql bqrs decode --format=text /tmp/results.bqrs
```

- For published packs, swap `--additional-packs=<dir>` for `--model-packs=<org>/<pack>@<range>`.
- Pick a query whose sink kind matches what you modeled (e.g. a `sql-injection` query when adding SQL sinks). See [`codeql query run`](../../../resources/cli/codeql/codeql_query_run.prompt.md).
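
When the same check is repeated for several databases or queries, it can help to assemble the command lines programmatically. The sketch below only builds argument lists from the flags shown above and does not invoke the CLI; `query_run_command`, `decode_command`, and `render` are hypothetical helper names.

```python
import shlex


def query_run_command(database, pack_dir, query, output="/tmp/results.bqrs"):
    """Build the argument list for the `codeql query run` call shown above."""
    return [
        "codeql", "query", "run",
        f"--database={database}",
        f"--additional-packs={pack_dir}",
        f"--output={output}",
        "--",
        query,
    ]


def decode_command(bqrs="/tmp/results.bqrs"):
    """Build the argument list for decoding the results as text."""
    return ["codeql", "bqrs", "decode", "--format=text", bqrs]


def render(command):
    """Return a shell-quoted string for logging or copy-pasting."""
    return shlex.join(command)
```

On a machine with the `codeql` CLI installed, each list could be passed to `subprocess.run(command, check=True)`.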

### 7. Run unit tests with `codeql test run`

`codeql test run` does **not** accept `--model-packs`; data extensions are wired in via `qlpack.yml`. The test pack must depend on the model pack, then:

```bash
codeql test run \
  --additional-packs=<path-to-model-pack-dir> \
  --keep-databases \
  --show-extractor-output \
  -- languages/<language>/<pack-basename>/test/<QueryBasename>/
```

Add a small test case under `languages/<language>/custom/test/` (or your project's equivalent) that exercises the new source/sink/summary chain end-to-end and accept its `.expected` output once you have confirmed it is correct. See [`codeql test run`](../../../resources/cli/codeql/codeql_test_run.prompt.md).

### 8. Decide on next steps

- If the `.model.yml` lives under `.github/codeql/extensions/` of the consuming repo, you are done; Code Scanning will load it on the next analysis.
- If you authored a reusable model pack and want it to apply across an organization, continue with the [`publish-model-pack`](../publish-model-pack/SKILL.md) skill.

## Validation checklist

- [ ] Correct tuple format for the language (API Graph vs MaD).
- [ ] Every row has the exact column count for its extensible predicate.
- [ ] Sink/barrier `kind` values match across the chain (e.g. a `sql-injection` barrier must guard a `sql-injection` sink).
- [ ] At least one end-to-end test exercises the new model and produces the expected finding.
- [ ] `qlpack.yml` `dataExtensions` glob actually matches the new files (verify by running `codeql resolve library-path`).
- [ ] No regressions in pre-existing tests under the same pack.
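
The kind-matching item in the checklist can be approximated with a set difference. This is a sketch only: `barriers_without_sink` is a hypothetical helper, and the list of sink kinds has to be supplied by you, since a barrier may legitimately target a sink kind defined in the upstream pack rather than in your own files.

```python
def barriers_without_sink(known_sink_kinds, barrier_kinds):
    """Return barrier kinds that match none of the known sink kinds, sorted."""
    return sorted(set(barrier_kinds) - set(known_sink_kinds))
```

A non-empty result usually points at a typo in a `kind` value, such as `sql_injection` instead of `sql-injection`.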

## Related resources

- [`data_extensions_development.prompt.md`](../../prompts/data_extensions_development.prompt.md): reference for tuple formats, threat models, and access path syntax.
- Language-specific data extension prompts in [`.github/prompts/`](../../prompts/) (one per supported language).
- [`publish-model-pack`](../publish-model-pack/SKILL.md): follow-up skill for shipping the pack to GHCR and Default Setup.
- [`codeql query run`](../../../resources/cli/codeql/codeql_query_run.prompt.md) and [`codeql test run`](../../../resources/cli/codeql/codeql_test_run.prompt.md): CLI references used in steps 6 and 7.
131 changes: 131 additions & 0 deletions .github/skills/publish-model-pack/SKILL.md
@@ -0,0 +1,131 @@
---
name: publish-model-pack
description: Publish an existing CodeQL model pack to GitHub Container Registry (GHCR) with `codeql pack create` / `codeql pack publish`, and configure it for org-wide use under Code Scanning Default Setup. Use when a user asks to "publish a model pack", "push a model pack to GHCR", "release a new version of <pack>", "add a model pack to Default Setup", or "make my custom data extensions apply across the organization".
---

# Publish a CodeQL Model Pack

This skill describes the procedure for shipping an existing CodeQL model pack (built with the [`create-model-pack`](../create-model-pack/SKILL.md) skill or already present under `languages/<language>/custom/src/`) to GHCR and wiring it into org-wide Code Scanning Default Setup.

This is the right skill **only when the consumers must include other repositories** in your organization. If the data extensions are needed only by one repository, prefer the `.github/codeql/extensions/` shortcut described in the [`create-model-pack`](../create-model-pack/SKILL.md) skill; no publish step is required.

## When to use this skill

Trigger this skill when the user wants to:

- Push a new or updated model pack to GHCR.
- Release a new semver version of an existing model pack.
- Configure an org so Default Setup automatically picks up a custom model pack.
- Diagnose why a published model pack is not being applied during Code Scanning analyses.

## Prerequisites

- The model pack already exists locally and has at least one valid `.model.yml`. If not, run the [`create-model-pack`](../create-model-pack/SKILL.md) skill first.
- The `codeql` CLI is available and authenticated to GHCR. On agent runners, the standard `GITHUB_TOKEN` (with `packages: write`) is sufficient; locally you may need `gh auth login` or a PAT exported as `CODEQL_REGISTRIES_AUTH` / `GITHUB_TOKEN`.
- You have write access (`packages: write`) to the GHCR namespace named in the pack's `name` field (e.g. `<org>/<language>-<pack-name>`).
- For the org-wide configuration step, you must have organization-owner or "Manage Code Security settings" permission for the target org.

## Procedure

### 1. Verify `qlpack.yml` is publish-ready

Open the pack's `qlpack.yml` (typically `languages/<language>/custom/src/qlpack.yml`) and confirm:

```yaml
name: <org>/<language>-<pack-name> # must match the GHCR org/repo namespace you can publish to
version: 1.0.0 # semver; see step 5 for version bumps
library: true # model packs are always libraries
extensionTargets:
  codeql/<language>-all: '*' # or a tighter range like ^1.0.0
dataExtensions:
  - models/**/*.yml # glob must actually match your .model.yml files
```

Sanity checks:

- `name` is fully qualified (`<scope>/<pack>`); the scope must be a GHCR namespace you can push to.
- `version` is a valid semver string and is **strictly greater** than the latest version already on GHCR (publishing the same version will fail).
- `extensionTargets` references the upstream pack the extensions extend (`codeql/<language>-all`). The version range determines which CodeQL releases the pack is compatible with.
- `dataExtensions` glob resolves to the expected file list β€” confirm with:

```bash
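# `find` recurses on its own; a bare `**` in `ls` only recurses when bash's globstar is enabled
find "$(dirname <path-to-qlpack.yml>)/models" -type f -name '*.yml'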
```
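
The "strictly greater" rule for `version` can be checked before attempting a publish. This is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release or build suffixes; `parse_version` and `is_publishable` are hypothetical helpers, and the latest published version has to be looked up separately.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(p) for p in parts)


def is_publishable(candidate, latest_published):
    """True when candidate is strictly greater than the latest published version."""
    return parse_version(candidate) > parse_version(latest_published)
```

Comparing integer tuples rather than strings matters here: as strings, `"1.10.0"` would sort before `"1.9.0"`.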

### 2. Build the pack with `codeql pack create`

From the directory containing `qlpack.yml`:

```bash
codeql pack create \
  --output=/tmp/codeql-pack-out \
  .
```

- The output directory will contain a versioned subtree (`<scope>/<pack>/<version>/`) ready for upload.
- `codeql pack create` will fail fast on malformed `.model.yml` rows or unresolved `extensionTargets`. Fix any reported errors before proceeding. Run `codeql pack create -h -vv` for full help.

### 3. Publish to GHCR with `codeql pack publish`

```bash
codeql pack publish .
```

- `codeql pack publish` re-runs the build then pushes the resulting OCI artifact to `ghcr.io/<scope>/<pack>:<version>` (and updates the `latest` tag).
- Authentication: ensure `GITHUB_TOKEN` (or a PAT with `write:packages`) is exported. On a workflow runner, set `permissions: { packages: write }` on the job. Run `codeql pack publish -h -vv` for full help.
- Confirm the push by either checking the package under `https://github.com/orgs/<scope>/packages` or running:

```bash
codeql pack download <scope>/<pack>@<version>
```

### 4. Configure org-wide Default Setup

To apply the published model pack to every Default Setup analysis in the org:

1. Navigate to the org settings: **Code security → Global settings → CodeQL analysis** (also accessible via **Security → Advanced Security → Global settings → Expand CodeQL analysis** depending on the UI version).
2. Under **Model packs**, click **Add model pack** and enter `<scope>/<pack>` (optionally pinned to a version range, e.g. `<scope>/<pack>@^1.0.0`).
3. Save. Default Setup will pick up the pack on the next scheduled or push-triggered analysis for repos that target the relevant language.

References:

- [Configure organization-level CodeQL model packs](https://github.blog/changelog/2024-04-16-configure-organization-level-codeql-model-packs-for-github-code-scanning/)
- [Extending CodeQL coverage with model packs in Default Setup](https://docs.github.com/en/code-security/how-tos/find-and-fix-code-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-codeql-coverage-with-codeql-model-packs-in-default-setup)
- [Configuring Default Setup at scale](https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/configure-specific-tools/configuring-default-setup-for-code-scanning-at-scale)

### 5. Version management

- Use **semver** for `version`. Bump `patch` for additive rows that don't change semantics, `minor` for new model categories or substantial new coverage, `major` for breaking changes (renames, removals, format changes).
- For each release, bump `version` in `qlpack.yml` **before** running `codeql pack publish`; re-publishing the same version fails.
- If the org-level configuration uses a range (e.g. `@^1.0.0` or no pin at all), Default Setup automatically resolves the **latest matching** version on every run; consumers do not need to take any action to receive a new minor/patch release.
- If the org-level configuration is pinned to an exact version, you must update it after each release.
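
The bump rules above are mechanical enough to script. The sketch below assumes plain `MAJOR.MINOR.PATCH` versions and uses a hypothetical `bump` helper; deciding which part to bump still follows the patch/minor/major guidance in the first bullet.

```python
def bump(version, part):
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

For example, adding rows that do not change semantics to a pack at `1.2.3` would call `bump("1.2.3", "patch")`, and the result is what goes into `qlpack.yml` before publishing.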

### 6. Validate the published pack is being applied

Pick a repository covered by Default Setup that contains code exercising the new models, then:

1. Trigger a Code Scanning run (push to the default branch or click **Re-run all jobs** on the latest CodeQL workflow).
2. Open the workflow logs for the CodeQL Analyze job and look for log lines confirming the pack was downloaded and its data extensions were loaded, typically lines containing `<scope>/<pack>` and the resolved version, alongside extension counts.
3. Confirm that new alerts attributable to the new sources/sinks/summaries appear in the Code Scanning alerts view (or, if you intentionally added barriers/neutrals, that previously-flagged false-positive alerts are now suppressed).

If the pack does not appear in the logs:

- Re-check that `name` in `qlpack.yml` matches exactly what is configured in the org settings.
- Verify the version range in org settings (or `extensionTargets` in the pack) is satisfiable by what's published.
- Confirm the consumer repo's language is included in the pack's `extensionTargets` (e.g. a `codeql/python-all` extension only fires for Python repos).
- Pull the pack manually with `codeql pack download <scope>/<pack>@<version>` to rule out access/visibility problems.

## Validation checklist

- [ ] `qlpack.yml` `version` strictly greater than the previously published version.
- [ ] `codeql pack create` succeeds with no errors or warnings about unknown rows.
- [ ] `codeql pack publish` reports a successful push and the package is visible under the org's GHCR packages.
- [ ] The pack is listed under the org's Default Setup model packs configuration.
- [ ] A subsequent CodeQL workflow run logs the pack as loaded and surfaces the expected new alerts (or suppressions).

## Related resources

- [`create-model-pack`](../create-model-pack/SKILL.md): upstream skill that produces the model pack consumed here.
- [`data_extensions_development.prompt.md`](../../prompts/data_extensions_development.prompt.md): reference for `qlpack.yml` shape (`extensionTargets`, `dataExtensions`) and the workflow context.
- [`codeql pack install`](../../../resources/cli/codeql/codeql_pack_install.prompt.md): companion CLI reference; for `pack create`, `pack publish`, and `pack download` use `codeql <subcommand> -h -vv`.
- [CodeQL now supports sanitizers and validators in models-as-data](https://github.blog/changelog/2026-04-21-codeql-now-supports-sanitizers-and-validators-in-models-as-data/): recent capability that may motivate a pack version bump.
32 changes: 32 additions & 0 deletions languages/actions/custom/src/codeql-pack.lock.yml
@@ -0,0 +1,32 @@
---
lockVersion: 1.0.0
dependencies:
  codeql/actions-all:
    version: 0.4.33
  codeql/concepts:
    version: 0.0.21
  codeql/controlflow:
    version: 2.0.31
  codeql/dataflow:
    version: 2.1.3
  codeql/javascript-all:
    version: 2.6.27
  codeql/mad:
    version: 1.0.47
  codeql/regex:
    version: 1.0.47
  codeql/ssa:
    version: 2.0.23
  codeql/threat-models:
    version: 1.0.47
  codeql/tutorial:
    version: 1.0.47
  codeql/typetracking:
    version: 2.0.31
  codeql/util:
    version: 2.0.34
  codeql/xml:
    version: 1.0.47
  codeql/yaml:
    version: 1.0.47
compiled: false