@yahyaghani yahyaghani commented Oct 27, 2025

Description

This PR implements the ProjectDiscovery Cloud integration with the changelogs and export data streams.

Current Status:

  • ✅ CEL input with offset-based pagination
  • ✅ ECS-compliant field mapping
  • ✅ Request tracing support for debugging
  • ✅ Docker mock server for testing
  • ✅ Pipeline and system tests
  • ✅ Rally benchmark configuration

Implementation Details:

  • Uses CEL (Common Expression Language) input for efficient API polling
  • Pulls vulnerability status change events from /v1/scans/vuln/changelogs
  • Supports configurable batch size, time windows, and collection intervals
  • Publisher pipeline host enrichment disabled (agentless-ready)

Relates to #15061

TODO:

  • Implement second datastream for /v1/scans/results/export endpoint (all vulnerability results)
  • Final documentation and field mapping review

@yahyaghani yahyaghani requested a review from a team as a code owner October 27, 2025 00:19
@yahyaghani yahyaghani marked this pull request as draft October 27, 2025 00:20
@andrewkroh andrewkroh added the documentation and New Integration labels Oct 27, 2025

@clement-fouque clement-fouque left a comment

I haven't been able to ingest data into the stack. I have the following error:

"error": {
      "message": [
        "failed eval: ERROR: <input>:26:32: no such overload\n |       (resp.StatusCode == 200) ?\n | ...............................^",
        "Processor json with tag json_event_original in pipeline logs-projectdiscovery_cloud.changelogs-0.1.1 failed with message: field [original] not present as part of path [event.original]"
      ]
    },

Comment on lines 10 to 32
- name: base_url
  type: text
  title: ProjectDiscovery Cloud API Base URL
  description: The base URL for the ProjectDiscovery Cloud API (e.g., https://api.projectdiscovery.io)
  multi: false
  required: true
  show_user: true
  default: https://api.projectdiscovery.io
- name: api_key
  type: password
  title: API Key
  description: The API key for authenticating to ProjectDiscovery Cloud.
  multi: false
  required: true
  show_user: true
  secret: true
- name: team_id
  type: text
  title: Team ID
  description: The Team ID for your ProjectDiscovery Cloud account.
  multi: false
  required: true
  show_user: true

Suggested change: remove the lines quoted above (the base_url, api_key, and team_id variables).

Those properties are already defined at the “parent” level. Since they are the same for every data stream, they should be defined only there.

Image

**Minimum versions:**
- Kibana: `^9.1.0`
- Elasticsearch: Compatible with Kibana version
- Elastic subscription: `platinum`

I believe `platinum` is wrong here, since the integration should be available in the basic/open-source subscription.
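
For reference, the subscription level is normally declared in the package manifest.yml under conditions, and the README should match it. A minimal sketch, assuming the package keeps the Kibana constraint quoted above and that `basic` is the intended level (both values are assumptions for illustration, not the PR's final manifest):

```yaml
# manifest.yml (sketch; values are assumptions for illustration)
conditions:
  kibana:
    version: "^9.1.0"
  elastic:
    subscription: basic
```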

@@ -0,0 +1,499 @@
# ProjectDiscovery Cloud Integration

@clement-fouque clement-fouque Nov 26, 2025

Documentation is automatically generated if you define it in packages/XXX/_dev/build/docs/README.md

@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch from 5d9cfc9 to 3c4a36e Compare December 9, 2025 01:13
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport

@qcorporation qcorporation left a comment

I'll want to modify

  • integrations/.github/ISSUE_TEMPLATE/integration_bug.yml
  • integrations/.github/ISSUE_TEMPLATE/integration_feature_request.yml

warning: incomplete review - we'll need to come back to this

@@ -0,0 +1,21 @@
# newer versions go on top

You'll want to collapse this into one entry, since only one PR will be referenced when this merges into main.

@@ -0,0 +1,191 @@
# ProjectDiscovery Cloud Integration

You'll most likely want to follow the documentation template that's been created by our team: https://www.elastic.co/docs/extend/integrations/documentation-guidelines

Reach out to @mjwolf and the Docs II team as he'll have the ability to auto-generate this documentation for you.

  - security
conditions:
  kibana:
    version: "^9.1.0"

Do we want to offer this integration for the 8.x stack as well?
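
If 8.x support is wanted, the Kibana constraint can list both major lines. A sketch, assuming 8.18.0 is the lowest 8.x release that ships the CEL features this package relies on (that floor is an assumption and would need to be checked against the CEL input's release notes):

```yaml
# manifest.yml (sketch; the 8.x floor is an assumption to verify)
conditions:
  kibana:
    version: "^8.18.0 || ^9.0.0"
```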


type: integration
categories:
- security

You might want to consider adding `- cloud` here.

title: ProjectDiscovery Cloud
description: Collect vulnerability changelogs and export results from ProjectDiscovery Cloud
inputs:
- type: cel

Does this CEL input require SSL configuration?
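
If SSL settings turn out to be needed (custom CA, TLS-inspecting proxy), the pattern other CEL-based packages tend to use is an optional `ssl` variable that is passed through to `resource.ssl`. A rough sketch of both halves, not taken from this PR; the title, description, and example path are hypothetical:

```yaml
# 1) Variable in the data stream manifest.yml (sketch; title, description
#    and the example CA path are hypothetical).
- name: ssl
  type: yaml
  title: SSL Configuration
  description: Optional SSL settings, for example a custom certificate authority.
  multi: false
  required: false
  show_user: false
  default: |
    #certificate_authorities:
    #  - /path/to/ca.pem

# 2) Wiring in the CEL stream template (agent/stream/*.yml.hbs).
#    The block is only rendered when the user supplies a value.
{{#if ssl}}
resource.ssl: {{ssl}}
{{/if}}
```

With the default left commented out, nothing is rendered and standard HTTPS verification is unchanged.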

  type: keyword
- name: remediation
  type: text
- name: reference

can we use vulnerability.reference?

  type: keyword
- name: category
  type: keyword
- name: request

can we use any of the ecs http.request.* fields?

  type: keyword
- name: request
  type: keyword
- name: response

can we use any of the http.response.* fields?

- name: projectdiscovery
  type: group
  fields:
    - name: target

could you use destination.* within ecs, possibly destination.address

  type: keyword
- name: vuln_hash
  type: keyword
- name: scan_id

can you use vulnerability.report_id
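
To make the ECS suggestions in this part of the review concrete, here is a sketch of how the two vendor fields could be mapped in the ingest pipeline with the `set` processor's `copy_from` option. The `json.*` source paths are assumed from the field names under discussion and may not match the actual pipeline:

```yaml
# Ingest pipeline fragment (sketch; json.* source paths are assumptions).
# copy_from leaves the source field in place; ignore_empty_value skips
# the processor when the source is missing or empty.
- set:
    field: vulnerability.report_id
    copy_from: json.scan_id
    ignore_empty_value: true
- set:
    field: destination.address
    copy_from: json.target
    ignore_empty_value: true
```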

@clement-fouque clement-fouque left a comment

I completed my review by adding multiple comments. Please let me know if you need clarification.

Comment on lines 22 to 28
# Custom vulnerability fields not in ECS 9.2.0
- name: vulnerability.status
  type: keyword
  description: The state of the vulnerability (e.g., open, closed, resolved).
- name: vulnerability.scanner.type
  type: keyword
  description: The type of vulnerability scanner used (e.g., nuclei).

As mentioned by Dan here:

If you can avoid adding capitalised fields to published products, that would be ideal

I believe they should be removed.

@@ -0,0 +1,96 @@
config_version: 2
interval: {{interval}}
resource.max_executions: 1000

max_executions already defaults to 1000. I think it should either be removed or exposed as a configuration field (example).

Comment on lines 81 to 84
- rename:
    field: json.vuln_status
    target_field: vulnerability.status
    ignore_missing: true

This field doesn't exist in the ECS vulnerability field set. It must be removed.
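
If the status value is still worth keeping, one option is to store it under the vendor namespace instead of under `vulnerability.*`. A sketch, assuming a `projectdiscovery.vuln_status` keyword field is added to the data stream's field definitions (the field name here is a suggestion, not something the quoted diff defines):

```yaml
# Ingest pipeline fragment (sketch; target field name is an assumption).
- set:
    field: projectdiscovery.vuln_status
    copy_from: json.vuln_status
    ignore_empty_value: true
```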

Comment on lines 100 to 102
- set:
    field: vulnerability.scanner.type
    value: nuclei

This field doesn't exist in the ECS vulnerability field set. It must be removed.
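
ECS does define `vulnerability.scanner.vendor`, so a possible replacement is to record the vendor rather than the scanner type. A sketch; the literal value is an assumption about how the vendor should be named:

```yaml
# Ingest pipeline fragment (sketch; the value string is an assumption).
- set:
    field: vulnerability.scanner.vendor
    value: ProjectDiscovery
```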

state.with(
  {
    "base": state.url.trim_right("/") + "/v1/scans/vuln/changelogs",
    "q": {

For readability, consider renaming p to post and q to query.

@@ -0,0 +1,33 @@
- external: ecs

I'm not sure it's required as ECS fields could be inherited (to be confirmed). For example, in the Qualys GAV, cloud ECS fields are not manually defined: https://github.com/elastic/integrations/tree/main/packages/qualys_gav/data_stream/asset/fields

multi: false
required: false
show_user: true
default: low

I believe default should be removed (or set to low,medium,high,critical).

Comment on lines 65 to 84
- date:
    field: json.created_at
    target_field: '@timestamp'
    formats:
      - ISO8601
    if: ctx.json?.created_at != null && ctx.json.created_at != ''
    on_failure:
      - append:
          field: error.message
          value: 'Processor {{{_ingest.on_failure_processor_type}}} with tag {{{_ingest.on_failure_processor_tag}}} in pipeline {{{_ingest.on_failure_pipeline}}} failed with message: {{{_ingest.on_failure_message}}}'
- date:
    field: json.updated_at
    target_field: '@timestamp'
    formats:
      - ISO8601
    if: ctx.json?.created_at == null && ctx.json?.updated_at != null && ctx.json.updated_at != ''
    on_failure:
      - append:
          field: error.message
          value: 'Processor {{{_ingest.on_failure_processor_type}}} with tag {{{_ingest.on_failure_processor_tag}}} in pipeline {{{_ingest.on_failure_pipeline}}} failed with message: {{{_ingest.on_failure_message}}}'

I don't like the behaviour of modifying @timestamp based on the created_at or updated_at fields. It makes trending impossible, unless we create a transform that stores daily values.

Image

I would be in favour of deleting them so that the full export is stored at each interval.
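
A sketch of what that could look like: the vendor timestamps are still parsed, but into vendor fields, so @timestamp keeps the time the event was collected. The `projectdiscovery.created_at` and `projectdiscovery.updated_at` targets are hypothetical names and would need matching `date` field definitions:

```yaml
# Ingest pipeline fragment (sketch; target field names are assumptions).
# @timestamp is left untouched so each interval's export can be trended.
- date:
    field: json.created_at
    target_field: projectdiscovery.created_at
    formats:
      - ISO8601
    if: ctx.json?.created_at != null && ctx.json.created_at != ''
- date:
    field: json.updated_at
    target_field: projectdiscovery.updated_at
    formats:
      - ISO8601
    if: ctx.json?.updated_at != null && ctx.json.updated_at != ''
```

The existing on_failure handlers from the quoted processors could be kept as they are.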

Comment on lines +34 to +39
- set:
    field: event.module
    value: projectdiscovery_cloud
- set:
    field: event.dataset
    value: projectdiscovery_cloud.changelogs

Can you align the field names, dataset name, and tags to either projectdiscovery or projectdiscovery_cloud?

@@ -0,0 +1,89 @@
title: Collect Vulnerability Results from ProjectDiscovery Cloud

I don't know the root cause, but field nesting is not working on the export data stream. It works on changelogs, though.

Image

Changelogs

Image

@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch 5 times, most recently from a68ca88 to ca7cc91 Compare December 18, 2025 09:49
@yahyaghani yahyaghani changed the title [DRAFT] Add ProjectDiscovery Cloud integration with changelog datastream Add ProjectDiscovery Cloud integration with changelog datastream Dec 18, 2025
@yahyaghani yahyaghani marked this pull request as ready for review December 18, 2025 10:27
- Implements changelogs datastream for incremental vulnerability updates
- Includes optional target field sanitizer (gated by 'sanitize_target' tag) to clean malformed API responses
- Pipeline tests for both pass-through and sanitized cases
- Full ECS mapping with vulnerability fields
- Supports configurable interval, batch_size, and time_window
- Auto-generated documentation in _dev/build/docs/README.md
yahyaghani added a commit to yahyaghani/integrations that referenced this pull request Jan 2, 2026
…ngelog per Quan’s review (elastic#15760)

Summary
- Map HTTP bodies to ECS:
  - http.request.body.content
  - http.response.body.content
- Map target → destination.address with IP/domain derivation
- Map scan_id → vulnerability.report_id
- Use vulnerability.reference exclusively (remove vendor duplicate)
- Use host.hostname exclusively (remove vendor duplicate)
- Map vendor tags → ECS tags; remove json.tags
- Set vulnerability.scanner.vendor=ProjectDiscovery; remove non-ECS scanner.type
- Remove vendor-specific duplicate fields (projectdiscovery.*)
- Add cloud category to package manifest
- Widen Kibana version support to ^8.18.0 || ^9.0.0
- Collapse changelog into a single 0.1.1 entry (per package guidelines)

Deferred (intentional)
- SSL/TLS configuration: optional; defer until a concrete need (custom CA, TLS-inspecting proxy, self-signed certs). If required, follow standard CEL pattern (manifest var + resource.ssl wiring).
- Documentation template: defer until @mjwolf returns to coordinate auto-generation and template adoption (align badges/structure).
- Issue templates: repo-wide infra; Quan indicated these will be handled centrally.

Impact
- Improves ECS compliance and consistency across data streams
- Reduces noise by removing non-ECS and vendor-duplicate fields
- Improves package discoverability (cloud category) and broadens compatibility (8.18+)

References
- Addresses Quan’s requested changes
- Changelog entry linked to PR elastic#15760
@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch from ca7cc91 to e6ac7cd Compare January 2, 2026 06:30
yahyaghani added a commit to yahyaghani/integrations that referenced this pull request Jan 2, 2026
…ngelog per Quan’s review (elastic#15760)

@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch from e6ac7cd to e3f83fc Compare January 2, 2026 06:57
…ngelog per Quan’s review (elastic#15760)

@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch from e3f83fc to c3b370b Compare January 3, 2026 04:34
yahyaghani and others added 2 commits January 3, 2026 06:45
Implement Clement's requested changes for changelogs data stream:

- Switch vulnerability field mappings from `rename` to `set/copy_from` for better transparency
- Remove non-ECS `vulnerability.status` field mapping
- Add vendor-specific `projectdiscovery.vuln_status` field to preserve status information
- Update event message template to use `{{projectdiscovery.vuln_status}}`
- Remove `vulnerability.status` field definition from schema (not in ECS spec)
- Add `projectdiscovery.vuln_status` keyword field to schema

This change ensures better ECS compliance while preserving all vendor-specific
data in the projectdiscovery namespace, aligning with the preserve_duplicate_custom_fields pattern.
Address review feedback from Clement Fouque, including critical system test fixes and four enhancement areas.

## System Test Fixes

Fixed CEL syntax errors blocking system tests:
- Added missing closing parentheses in state.with() calls
- Removed trailing commas in CEL object literals
- Removed publisher_pipeline.disable_host setting that blocked data routing

System tests now pass: changelogs (4 hits), export (1 hit)

## SSL/TLS Configuration Support

Added optional SSL configuration for enterprise environments:
- New `ssl` variable in both data stream manifests
- Supports verification_mode, certificate_authorities, ca_trusted_fingerprint
- Default: all commented out, doesn't affect standard HTTPS

## Export Timestamp Semantics

Changed export data stream to use ingestion time for @timestamp:
- Removed date processors that toggled between created_at/updated_at
- Vendor timestamps preserved in projectdiscovery.created_at/updated_at
- Enables proper trending for snapshot-style exports

## Field Organization

- Removed redundant ecs.yml files (inherited via index templates)
- Removed subobjects: false from export manifest (fixes field nesting)
- Removed default: low from severity filter (matches "export all" description)

## Test Results

- ✅ Package check: Pass
- ✅ Pipeline tests: Pass (changelogs 4/4, export 2/2)
- ✅ System tests: Pass (4 hits, 1 hit)

Fixes elastic#15061

Co-authored-by: Clement Fouque
@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch from 7d39066 to 88d1688 Compare January 4, 2026 23:10
@yahyaghani yahyaghani force-pushed the feat/projectdiscovery-cloud-integration branch from 2e45cee to d0dc964 Compare January 5, 2026 03:10
@elasticmachine

💚 Build Succeeded

7 participants