Skip to content

Conversation

@shige
Copy link
Member

@shige shige commented Nov 27, 2025

User description

Summary

  • Add shared buildAiGatewayHeaders function in vector-stores/shared/ for consistent header construction
  • Add headers parameter to RAG ingest pipeline options
  • Pass stripe-customer-id headers through GitHub blob, PR, and issue ingest functions
  • Refactor document ingest to use the shared buildAiGatewayHeaders function

This ensures AI Gateway requests during GitHub repository ingestion include Stripe customer ID for billing attribution, completing the header propagation across all vector store ingest flows.

Related Issue

Part of #2233

Testing

#2289 (comment)


PR Type

Enhancement


Description

  • Create shared buildAiGatewayHeaders utility function for consistent header construction across vector store ingestion flows

  • Add headers parameter to RAG ingest pipeline options for AI Gateway request propagation

  • Pass stripe-customer-id headers through GitHub blob, PR, and issue ingest functions

  • Refactor document ingest to use shared utility function, improving code reusability and maintainability


Diagram Walkthrough

flowchart LR
  A["Shared buildAiGatewayHeaders"] --> B["Document Ingest"]
  A --> C["GitHub Blob Ingest"]
  A --> D["GitHub PR Ingest"]
  A --> E["GitHub Issue Ingest"]
  C --> F["RAG Pipeline"]
  D --> F
  E --> F
  B --> F
  F --> G["AI Gateway Requests with Billing Attribution"]
Loading

File Walkthrough

Relevant files
Enhancement
ai-gateway-headers.ts
Create shared AI Gateway headers utility function               

apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts

  • New shared utility function buildAiGatewayHeaders for building AI
    Gateway headers with Stripe customer ID
  • Extracts common header construction logic previously duplicated in
    document ingest
  • Handles billing attribution through stripe-customer-id and
    stripe-restricted-access-key headers
  • Logs warning for pro/team plans without customer ID
+29/-0   
process-repository.ts
Add header generation to GitHub content processors             

apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts

  • Import shared buildAiGatewayHeaders function
  • Call buildAiGatewayHeaders in blob, PR, and issue content processors
  • Pass generated headers to respective ingest functions for billing
    attribution
  • Ensure consistent header propagation across all GitHub content types
+10/-0   
ingest-github-blobs.ts
Add headers parameter to blob ingest function                       

apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts

  • Add optional headers parameter to function signature
  • Pass headers through to RAG ingest pipeline configuration
  • Enable AI Gateway billing attribution for blob ingestion
+2/-0     
ingest-github-pull-requests.ts
Add headers parameter to PR ingest function                           

apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts

  • Add optional headers parameter to function signature
  • Pass headers through to RAG ingest pipeline configuration
  • Enable AI Gateway billing attribution for pull request ingestion
+2/-0     
ingest-github-issues.ts
Add headers parameter to issue ingest function                     

apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts

  • Add optional headers parameter to function signature
  • Pass headers through to RAG ingest pipeline configuration
  • Enable AI Gateway billing attribution for issue ingestion
+2/-0     
pipeline.ts
Add headers support to RAG ingest pipeline                             

packages/rag/src/ingest/pipeline.ts

  • Add optional headers property to IngestPipelineOptions interface
  • Pass headers to embedder creation when using AI Gateway transport
  • Enable header propagation through RAG pipeline to AI Gateway requests
+4/-0     
Refactoring
ingest-document.ts
Refactor document ingest to use shared headers utility     

apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts

  • Remove local buildAiGatewayHeaders function implementation
  • Import and use shared buildAiGatewayHeaders from vector-stores/shared
  • Update function call to pass team object instead of telemetryContext
  • Maintain existing billing attribution functionality with refactored
    code
+4/-25   

Summary by CodeRabbit

  • Refactor

    • Consolidated header handling across document and GitHub ingestion pipelines for improved consistency and maintainability.
    • Introduced a shared header builder utility to standardize AI Gateway request headers.
  • New Features

    • GitHub ingestion APIs (issues, pull requests, blobs) and the ingestion pipeline now accept and forward optional per-team request headers.

✏️ Tip: You can customize this high-level summary in your review settings.


Note

Introduces a shared header builder and threads Stripe billing headers through document, GitHub, and RAG ingest paths to AI Gateway.

  • Vector Stores:
    • Shared utility: Add buildAiGatewayHeaders in vector-stores/shared for billing attribution (Stripe headers, referer/title; warns when missing ID).
    • Document ingest: Remove local header builder; use shared buildAiGatewayHeaders(team) and pass headers to generateEmbeddings.
  • GitHub Ingest:
    • Add optional headers to ingestGitHubBlobs, ingestGitHubPullRequests, ingestGitHubIssues; construct via buildAiGatewayHeaders(team) in process-repository and pass through to pipelines.
  • RAG Pipeline:
    • Extend IngestPipelineOptions with headers; forward to embedder when using gateway transport.

Written by Cursor Bugbot for commit baed4dd. This will update automatically on new commits. Configure here.

@shige shige self-assigned this Nov 27, 2025
Copilot AI review requested due to automatic review settings November 27, 2025 14:21
@changeset-bot
Copy link

changeset-bot bot commented Nov 27, 2025

⚠️ No Changeset found

Latest commit: baed4dd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

💥 An error occurred when fetching the changed packages and changesets in this PR
Some errors occurred when validating the changesets config:
The package or glob expression "giselles-ai" is specified in the `ignore` option but it is not found in the project. You may have misspelled the package name or provided an invalid glob expression. Note that glob expressions must be defined according to https://www.npmjs.com/package/micromatch.

@giselles-ai
Copy link

giselles-ai bot commented Nov 27, 2025

Finished running flow.

Step 1
🟢
On Pull Request OpenedStatus: Success Updated: Nov 27, 2025 2:21pm
Step 2
🟢
Manual QAStatus: Success Updated: Nov 27, 2025 2:24pm
🟢
Prompt for AI AgentsStatus: Success Updated: Nov 27, 2025 2:24pm
Step 3
🟢
Create a Comment for PRStatus: Success Updated: Nov 27, 2025 2:26pm
Step 4
🟢
Create Pull Request CommentStatus: Success Updated: Nov 27, 2025 2:27pm

@vercel
Copy link

vercel bot commented Nov 27, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
giselle Ready Ready Preview Comment Dec 2, 2025 0:42am
ui Ready Ready Preview Comment Dec 2, 2025 0:42am

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 27, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a shared buildAiGatewayHeaders utility and threads optional headers?: Record<string,string> through document and GitHub ingestion entry points and the ingest pipeline so AI Gateway headers are constructed and propagated into embedder configuration.

Changes

Cohort / File(s) Summary
Shared header utility
apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
New `buildAiGatewayHeaders(team: TeamWithSubscription
Document ingestion
apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
Removed local buildAiGatewayHeaders implementation and imports the shared buildAiGatewayHeaders; passes telemetryContext?.team ?? null to it.
GitHub ingestion entry points
apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts, .../issues/ingest-github-issues.ts, .../pull-requests/ingest-github-pull-requests.ts
Each public ingest function signature gains an optional headers?: Record<string,string> parameter and forwards headers: params.headers into createPipeline options.
GitHub orchestration
apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
Imports buildAiGatewayHeaders, constructs per-team headers via buildAiGatewayHeaders(team), and supplies those headers to calls to ingest blobs, issues, and pull requests.
Pipeline infra
packages/rag/src/ingest/pipeline.ts
IngestPipelineOptions gains headers?: Record<string,string> and headers are propagated into embedder options when configured to use the gateway.

Sequence Diagram

sequenceDiagram
    participant Orch as process-repository
    participant Builder as buildAiGatewayHeaders
    participant Blobs as ingestGitHubBlobs
    participant Issues as ingestGitHubIssues
    participant PRs as ingestGitHubPullRequests
    participant Pipeline as IngestPipeline
    participant Embed as EmbedderConfig

    Orch->>Builder: buildAiGatewayHeaders(team)
    Builder-->>Orch: headers

    Orch->>Blobs: ingestGitHubBlobs(..., headers)
    Blobs->>Pipeline: createPipeline({..., headers})
    Pipeline->>Embed: propagate headers when gateway enabled

    Orch->>Issues: ingestGitHubIssues(..., headers)
    Issues->>Pipeline: createPipeline({..., headers})
    Pipeline->>Embed: propagate headers when gateway enabled

    Orch->>PRs: ingestGitHubPullRequests(..., headers)
    PRs->>Pipeline: createPipeline({..., headers})
    Pipeline->>Embed: propagate headers when gateway enabled
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review buildAiGatewayHeaders conditional logic around Stripe/customer ID and warning path.
  • Verify each ingestion entry point forwards headers consistently into createPipeline.
  • Confirm IngestPipelineOptions.headers is only applied to embedder config when gateway usage is detected.

Poem

🐰 I nibble headers, stitch them tight,
From team to pipeline, feather-light,
Blobs and issues, PRs in row,
Through gateway doors the secrets flow,
A little hop, ingestion aglow. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: adding stripe-customer-id header support to the GitHub ingest flow, which is the primary change across multiple ingest functions.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description comprehensively covers all required template sections with clear context, objectives, and implementation details.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/add-ai-gateway-headers-to-github-ingest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-merge-for-open-source
Copy link

qodo-merge-for-open-source bot commented Nov 27, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Missing API key validation

Description: The stripe-restricted-access-key is retrieved from environment variable and could be an
empty string if not set, potentially allowing unauthorized access to Stripe billing
endpoints.
ai-gateway-headers.ts [20-21]

Referred Code
headers["stripe-restricted-access-key"] =
	process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY ?? "";
Ticket Compliance
🟡
🎫 #2233
🟢 Accurately track token consumption for each customer
Use Vercel AI Gateway as LLM proxy to track token usage
Monitor API requests and provide usage data (token counts) to report to Stripe
Successfully report usage data to the corresponding Stripe subscription item
Ensure Stripe invoices are generated correctly based on the reported token usage
Implement usage-based billing system based on LLM token consumption
Use Stripe's Usage-Based Billing for handling billing and invoicing
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Sensitive Data Logging: The warning log at line 23-25 exposes the team ID which may be considered sensitive
information that should not be logged.

Referred Code
console.warn(
	`Stripe customer ID not found for vector store ingest (team: ${team?.id})`,
);

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unvalidated Headers Propagation: Headers from external input are passed through without validation or sanitization, which
could potentially allow header injection attacks.

Referred Code
	headers: useGateway ? options.headers : undefined,
},

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-merge-for-open-source
Copy link

qodo-merge-for-open-source bot commented Nov 27, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent sending invalid billing headers

To prevent invalid billing headers, ensure both stripeCustomerId and
stripeRestrictedAccessKey are present before adding them to the request, and add
more specific warnings if either is missing.

apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts [17-26]

-const stripeCustomerId = team?.activeCustomerId ?? undefined;
-if (stripeCustomerId !== undefined) {
+const stripeCustomerId = team?.activeCustomerId;
+const stripeRestrictedAccessKey =
+	process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY;
+
+if (stripeCustomerId && stripeRestrictedAccessKey) {
 	headers["stripe-customer-id"] = stripeCustomerId;
-	headers["stripe-restricted-access-key"] =
-		process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY ?? "";
+	headers["stripe-restricted-access-key"] = stripeRestrictedAccessKey;
 } else if (team?.plan === "pro" || team?.plan === "team") {
-	console.warn(
-		`Stripe customer ID not found for vector store ingest (team: ${team?.id})`,
-	);
+	if (!stripeCustomerId) {
+		console.warn(
+			`Stripe customer ID not found for vector store ingest (team: ${team?.id})`,
+		);
+	}
+	if (!stripeRestrictedAccessKey) {
+		console.warn(
+			`STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY is not set. Billing attribution will be skipped for team ${team?.id}.`,
+		);
+	}
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that sending an empty stripe-restricted-access-key is risky and improves robustness by ensuring both customer ID and key are present before adding billing headers, which is crucial for correct billing attribution.

Medium
Learned
best practice
Use proper logging library

Replace console.warn with a proper logging library that supports configurable
log levels. This ensures diagnostic output can be controlled in different
environments.

apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts [22-26]

 } else if (team?.plan === "pro" || team?.plan === "team") {
-    console.warn(
+    logger.warn(
         `Stripe customer ID not found for vector store ingest (team: ${team?.id})`,
     );
 }
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Remove debugging console.log statements from production code before merging. If diagnostic logging is needed, use proper logging libraries with configurable log levels or feature flags to control output in different environments.

Low
  • Update

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Stripe customer ID header propagation for GitHub repository ingestion flows to enable proper billing attribution through the AI Gateway. The implementation creates a shared header building function and integrates it across all GitHub content type ingestion paths (blobs, PRs, issues) while refactoring document ingestion to use the same shared utility.

  • Adds shared buildAiGatewayHeaders function for consistent header construction across all vector store ingest flows
  • Extends ingest pipeline to support optional headers parameter for AI Gateway requests
  • Integrates Stripe customer ID headers into GitHub blob, PR, and issue ingest functions

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts New shared utility function for building AI Gateway headers with Stripe customer ID
packages/rag/src/ingest/pipeline.ts Added optional headers parameter to pipeline options and passes them to embedder when using gateway
apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts Builds AI Gateway headers and passes them to blob, PR, and issue ingest functions
apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts Added headers parameter to support AI Gateway billing attribution
apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts Added headers parameter to support AI Gateway billing attribution
apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts Added headers parameter to support AI Gateway billing attribution
apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts Refactored to use shared buildAiGatewayHeaders function instead of local implementation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

);
}

return headers;
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function always returns headers even when empty (only containing http-referer and x-title). For consistency with the original document ingest implementation and to avoid passing unnecessary headers when not using the gateway, consider returning undefined when no Stripe customer ID is present and the team is not on a paid plan.

Suggested change
return headers;
if (stripeCustomerId !== undefined || team?.plan === "pro" || team?.plan === "team") {
return headers;
}
return undefined;

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I'd prefer to keep the current behavior for the following reasons:

http-referer and x-title are useful for request tracking on the AI Gateway side, independent of Stripe billing
attribution. If we return undefined for non-paid users, we'd lose this tracking capability for their requests as well.

Also, looking at generate-embeddings.ts:111, headers are only applied when using the gateway:
headers: useGateway ? headers : undefined,

So there's no concern about passing unnecessary headers when not using the gateway.

},
});

const headers = buildAiGatewayHeaders(team);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers are built three separate times for each content type (blobs, PRs, issues), but they depend only on the team which is the same for all three. Consider building the headers once before the switch statement and reusing them across all content processors to avoid redundant computation.

Copilot uses AI. Check for mistakes.
},
});

const headers = buildAiGatewayHeaders(team);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers are built three separate times for each content type (blobs, PRs, issues), but they depend only on the team which is the same for all three. Consider building the headers once before the switch statement and reusing them across all content processors to avoid redundant computation.

Copilot uses AI. Check for mistakes.
},
});

const headers = buildAiGatewayHeaders(team);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers are built three separate times for each content type (blobs, PRs, issues), but they depend only on the team which is the same for all three. Consider building the headers once before the switch statement and reusing them across all content processors to avoid redundant computation.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

I considered this, but I'd prefer to keep the current structure for a couple of reasons:

  1. buildAiGatewayHeaders is a lightweight function (simple object creation with a conditional), so the performance impact of calling it multiple times is negligible.
  2. Each processor in CONTENT_PROCESSORS is currently self-contained, handling its own dependencies. Passing headers from outside would add coupling between the caller and processors.

That said, if this becomes a concern in the future (e.g., if the function becomes more expensive), we can easily refactor it then.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7ada29 and 56c08a4.

📒 Files selected for processing (7)
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts (2 hunks)
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts (2 hunks)
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts (2 hunks)
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts (4 hunks)
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts (2 hunks)
  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts (1 hunks)
  • packages/rag/src/ingest/pipeline.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Favor clear, descriptive names and type annotations over clever tricks
If you need a multi-paragraph comment, refactor until intent is obvious

**/*.{ts,tsx,js,jsx}: Use async/await and proper error handling
Variables and functions should use camelCase naming
Booleans and functions should use is, has, can, should prefixes
Function names should clearly indicate purpose

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/development-guide.mdc)

**/*.{ts,tsx}: MUST run pnpm biome check --write [filename] after EVERY code modification
All code changes must be formatted using Biome before being committed
Use Biome for formatting with tab indentation and double quotes
Follow organized imports pattern (enabled in biome.json)
Use TypeScript for type safety; avoid any types
Use Next.js patterns for web applications
Use async/await for asynchronous code rather than promises
Error handling: use try/catch blocks and propagate errors appropriately
Use kebab-case for all filenames
Use PascalCase for React components and classes
Use camelCase for variables, functions, and methods
Use prefixes like is, has, can, should for boolean variables (e.g., isEnabled, hasPermission)
Use prefixes like is, has, can, should for boolean functions (e.g., isTriggerRequiringCallsign(), hasActiveSubscription()) instead of imperative verbs
Use verbs or verb phrases for function naming that clearly indicate purpose (e.g., calculateTotalPrice(), not process())

Use PascalCase for React component and class names

Use TypeScript and avoid any

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
**/*.{js,ts,tsx,jsx,py,java,cs,cpp,c,go,rb,php,swift,kt,scala,rs,dart}

📄 CodeRabbit inference engine (.cursor/rules/language-support.mdc)

Write all code comments in English

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
**/*

📄 CodeRabbit inference engine (.cursor/rules/naming-guide.mdc)

Use kebab-case for file names (e.g., user-profile.ts, api-client.tsx)

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (.cursor/rules/naming-guide.mdc)

**/*.{js,ts,jsx,tsx}: Use camelCase for variable names, functions, and methods
Use verbs or verb phrases for function names to clearly indicate what the function does (e.g., calculateTotalPrice(), validateUserInput())
Use nouns or noun phrases for variable names to describe what the variable represents
Use boolean prefixes (is, has, can, should) for boolean variables and functions returning boolean values (e.g., isEnabled, hasPermission, isTriggerRequiringCallsign())

**/*.{js,ts,jsx,tsx}: Run pnpm biome check --write [filename] after every code change
All code must be formatted with Biome before commit

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
**/*.{ts,tsx,js,jsx,css}

📄 CodeRabbit inference engine (AGENTS.md)

Files should use kebab-case naming

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
**/*.{tsx,ts}

📄 CodeRabbit inference engine (AGENTS.md)

Components should use PascalCase naming

Files:

  • apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
  • packages/rag/src/ingest/pipeline.ts
  • apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts
🧠 Learnings (1)
📚 Learning: 2025-11-25T03:07:07.498Z
Learnt from: CR
Repo: giselles-ai/giselle PR: 0
File: internal-packages/workflow-designer-ui/src/editor/properties-panel/trigger-node-properties-panel/providers/github-trigger/AGENTS.md:0-0
Timestamp: 2025-11-25T03:07:07.498Z
Learning: Applies to internal-packages/workflow-designer-ui/src/editor/properties-panel/trigger-node-properties-panel/providers/github-trigger/**/*.tsx : Build `GitHubFlowTriggerEvent` with optional callsign or labels properties and derive `outputs` from event payload keys before calling `client.configureTrigger()`.

Applied to files:

  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts
  • apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts
🧬 Code graph analysis (3)
apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts (2)
apps/studio.giselles.ai/services/teams/types.ts (1)
  • TeamWithSubscription (18-18)
apps/studio.giselles.ai/next.config.ts (1)
  • headers (77-105)
apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts (3)
apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts (1)
  • buildAiGatewayHeaders (8-29)
apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts (1)
  • ingestGitHubBlobs (21-69)
apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts (1)
  • ingestGitHubPullRequests (23-73)
apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts (1)
apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts (1)
  • buildAiGatewayHeaders (8-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: check
🔇 Additional comments (9)
apps/studio.giselles.ai/lib/vector-stores/github/ingest/issues/ingest-github-issues.ts (1)

24-24: LGTM! Headers parameter properly propagated.

The optional headers parameter is correctly added to the function signature and properly propagated to the ingestion pipeline. This aligns with the pattern used in blob and pull request ingestion.

Also applies to: 84-84

apps/studio.giselles.ai/lib/vector-stores/document/ingest/ingest-document.ts (1)

14-14: LGTM! Successfully refactored to use shared header builder.

The refactoring consolidates header construction logic into the shared buildAiGatewayHeaders function, eliminating code duplication. The call correctly passes the team from the telemetry context.

Also applies to: 269-271

packages/rag/src/ingest/pipeline.ts (2)

49-50: LGTM! Optional headers parameter added to pipeline options.

The addition of the optional headers parameter to IngestPipelineOptions enables header propagation through the ingestion pipeline.


119-119: LGTM! Headers correctly passed only when using gateway.

The conditional logic ensures headers are only passed to the embedder when using the AI Gateway transport, which is the correct behavior since headers are only needed for gateway requests.

apps/studio.giselles.ai/lib/vector-stores/github/ingest/process-repository.ts (3)

18-18: LGTM! Headers properly constructed and passed for blob ingestion.

The headers are built using the shared buildAiGatewayHeaders function with the team context and correctly passed to the ingestGitHubBlobs function.

Also applies to: 80-80, 88-88


115-115: LGTM! Headers properly constructed and passed for pull request ingestion.

The headers are built using the shared buildAiGatewayHeaders function with the team context and correctly passed to the ingestGitHubPullRequests function.

Also applies to: 123-123


154-154: LGTM! Headers properly constructed and passed for issue ingestion.

The headers are built using the shared buildAiGatewayHeaders function with the team context and correctly passed to the ingestGitHubIssues function.

Also applies to: 162-162

apps/studio.giselles.ai/lib/vector-stores/github/ingest/pull-requests/ingest-github-pull-requests.ts (1)

29-29: LGTM! Headers parameter properly propagated.

The optional headers parameter is correctly added to the function signature and properly propagated to the ingestion pipeline, consistent with the pattern used in blob and issue ingestion.

Also applies to: 63-63

apps/studio.giselles.ai/lib/vector-stores/github/ingest/blobs/ingest-github-blobs.ts (1)

27-27: LGTM! Headers parameter properly propagated.

The optional headers parameter is correctly added to the function signature and properly propagated to the ingestion pipeline, consistent with the pattern used in pull request and issue ingestion.

Also applies to: 59-59

Comment on lines +17 to +21
const stripeCustomerId = team?.activeCustomerId ?? undefined;
if (stripeCustomerId !== undefined) {
headers["stripe-customer-id"] = stripeCustomerId;
headers["stripe-restricted-access-key"] =
process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY ?? "";
Copy link
Contributor

@coderabbitai coderabbitai bot Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Empty string fallback for restricted access key may cause silent failures.

Line 21 falls back to an empty string when STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY is not set. This could lead to authentication failures that are difficult to debug. Consider either requiring the environment variable when a customer ID is present, or logging a warning about the missing key.

Apply this diff to add a warning:

 	const stripeCustomerId = team?.activeCustomerId ?? undefined;
 	if (stripeCustomerId !== undefined) {
+		const restrictedAccessKey = process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY;
+		if (!restrictedAccessKey) {
+			console.warn(
+				`STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY not set for team with customer ID (team: ${team?.id})`,
+			);
+		}
 		headers["stripe-customer-id"] = stripeCustomerId;
-		headers["stripe-restricted-access-key"] =
-			process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY ?? "";
+		headers["stripe-restricted-access-key"] = restrictedAccessKey ?? "";
 	} else if (team?.plan === "pro" || team?.plan === "team") {
🤖 Prompt for AI Agents
In apps/studio.giselles.ai/lib/vector-stores/shared/ai-gateway-headers.ts around
lines 17 to 21, the code falls back to an empty string for
STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY which can cause silent auth failures;
when stripeCustomerId exists, check
process.env.STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY and if it's missing, either
throw or at minimum log a clear warning including team/stripeCustomerId context,
and only set headers["stripe-restricted-access-key"] when a non-empty env value
exists (do not leave an empty string in the header).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch details of the issue #2338
gh issue view 2338 --repo giselles-ai/giselle --json number,title,body,author,state,url

Length of output: 2376


@shige, thanks for creating issue #2338 to track this! 👍 The follow-up will ensure the missing STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY is properly handled with appropriate warnings.


🧠 Learnings used
Learnt from: shige
Repo: giselles-ai/giselle PR: 2290
File: apps/studio.giselles.ai/db/schema.ts:109-169
Timestamp: 2025-11-28T00:51:04.041Z
Learning: The team is consciously using Stripe API version 2025-11-17.preview for Stripe v2 billing cadence (stripeBillingCadenceHistories) and pricing plan subscription (stripePricingPlanSubscriptionHistories) features in apps/studio.giselles.ai/db/schema.ts with awareness that preview APIs may not be production-safe.

@giselles-ai
Copy link

giselles-ai bot commented Nov 27, 2025

🔍 QA Testing Assistant by Giselle

📋 Manual QA Checklist

Based on the changes in this PR, here are the key areas to test manually:

  • GitHub Ingestion (Files/Blobs) - Billing Header Present: Select a team on a "pro" or "team" plan with a valid Stripe Customer ID. Initiate a new GitHub repository ingestion or re-sync, ensuring "Files" content is processed. Monitor outgoing requests to the AI Gateway for embedding and verify the stripe-customer-id header is present with the correct value.
  • GitHub Ingestion (Pull Requests) - Billing Header Present: Using a "pro" or "team" plan with a Stripe Customer ID, ingest a repository with Pull Requests. Monitor outgoing requests to the AI Gateway during PR content ingestion and verify the stripe-customer-id header is present.
  • GitHub Ingestion (Issues) - Billing Header Present: Using a "pro" or "team" plan with a Stripe Customer ID, ingest a repository with Issues. Monitor outgoing requests to the AI Gateway during Issue content ingestion and verify the stripe-customer-id header is present.
  • Document Ingestion - Billing Header Present: Select a team on a "pro" or "team" plan with a valid Stripe Customer ID. Upload a new document for ingestion. Monitor outgoing requests to the AI Gateway and verify that the refactored code correctly adds the stripe-customer-id header.
  • GitHub Ingestion (Internal Plan) - No Header/Warning: Use a test team on a "Internal" plan. Initiate any GitHub repository ingestion. Monitor outgoing requests and server logs. Verify that the requests do not contain the stripe-customer-id header and that no warning message about a missing customer ID is logged.

✨ Prompt for AI Agents

Use the following prompts with Cursor or Claude Code to automate E2E testing:

📝 E2E Test Generation Prompt
You are an expert QA engineer specializing in automated E2E testing with Playwright. Your task is to generate a comprehensive E2E test suite for a pull request that adds billing attribution headers to our data ingestion flows.

Please analyze the following context and instructions to create a new Playwright test file. The tests must be robust, maintainable, and ready for CI integration.

---

### **1. Context Summary**

*   **PR Goal:** The primary change is to add a `stripe-customer-id` header to all AI Gateway requests initiated during data ingestion. This is crucial for billing and usage attribution.
*   **Technical Implementation:**
    *   A new shared function, `buildAiGatewayHeaders`, has been created to standardize header creation.
    *   This function is now used by the **GitHub ingest flow** (for blobs/files, pull requests, and issues) and has been refactored into the existing **document ingest flow**.
*   **Key User Flows Affected:**
    1.  Connecting a new GitHub repository and ingesting its content.
    2.  Uploading a standard document for ingestion.
*   **Critical Paths for Testing:**
    *   Verify that for users on paid plans, the `stripe-customer-id` header is correctly added to outgoing embedding requests.
    *   Verify the header is *not* added for users on free plans.
    *   Confirm that the refactoring of the document ingest flow did not cause any regressions.
    *   Verify a specific server-side warning is logged when a paid-plan team is missing a `stripe-customer-id`.

---

### **2. Test Scenarios**

Create tests covering the following scenarios. Group them logically using `test.describe()`.

*   **`describe('GitHub Ingestion Billing Attribution')`**
    *   **Happy Path (Pro/Team Plan):**
        *   **Given** a user is logged in and belongs to a 'pro' plan team with a valid `activeCustomerId`.
        *   **When** the user connects a GitHub repository for ingestion.
        *   **Then** all subsequent AI Gateway requests for embedding files, issues, and PRs **must** include the `stripe-customer-id` header with the correct customer ID.
    *   **Negative Path (Free Plan):**
        *   **Given** a user is logged in and belongs to a 'free' plan team.
        *   **When** the user connects a GitHub repository for ingestion.
        *   **Then** the AI Gateway requests **must not** include the `stripe-customer-id` header.
    *   **Edge Case (Paid Plan without Customer ID):**
        *   **Given** a user is logged in and belongs to a 'pro' plan team but `activeCustomerId` is `null`.
        *   **When** the user connects a GitHub repository for ingestion.
        *   **Then** the AI Gateway requests **must not** include the `stripe-customer-id` header.
        *   **And** a warning message `Stripe customer ID not found for vector store ingest` should be logged by the server (verify this by listening to console events or checking logs if possible).

*   **`describe('Document Ingestion Billing Attribution (Regression Test)')`**
    *   **Happy Path (Pro/Team Plan):**
        *   **Given** a user is logged in and belongs to a 'pro' plan team with a valid `activeCustomerId`.
        *   **When** the user uploads a document for ingestion.
        *   **Then** the AI Gateway request for embedding **must** include the `stripe-customer-id` header.
    *   **Negative Path (Free Plan):**
        *   **Given** a user is logged in and belongs to a 'free' plan team.
        *   **When** the user uploads a document for ingestion.
        *   **Then** the AI Gateway request **must not** include the `stripe-customer-id` header.

---

### **3. Playwright Implementation Instructions**

*   **File Structure:** Create a new test file: `tests/e2e/billing-attribution.spec.ts`.
*   **Test Setup (Data Mocking):**
    *   Use a `beforeEach` hook to set up the necessary test state. Assume a helper function `setupTestUser({ plan: 'pro' | 'free', customerId: string | null })` exists to create the user and team in the database with the desired subscription status.
    *   Example setup:
        ```typescript
        import { test, expect } from '@playwright/test';
        import { setupTestUser, cleanupTestUser } from '../test-helpers'; // Fictional helper path

        test.describe('GitHub Ingestion Billing Attribution', () => {
          let user;

          test.afterEach(async () => {
            await cleanupTestUser(user.id);
          });

          test('should include stripe-customer-id for pro plan', async ({ page }) => {
            const customerId = 'cus_test_12345';
            user = await setupTestUser({ plan: 'pro', customerId });
            await loginAs(user, page); // Assume a login helper
            // ... test logic
          });
        });
        ```

*   **Network Interception (The Core of the Test):**
    *   The primary method of verification will be intercepting network requests. Use `page.route()` to intercept calls made to the AI Gateway.
    *   The URL pattern to intercept will likely be a backend route that proxies to the gateway, e.g., `**/api/v1/embeddings/create` or similar. Please use the most appropriate application-specific URL.
    *   Inside the route handler, inspect the request headers.

*   **Assertions:**
    *   **To check for header existence:**
        ```typescript
        // Intercept the network request
        await page.route('**/api/v1/embeddings/create', async (route) => {
          const headers = await route.request().allHeaders();
          
          // Assertion for pro user
          expect(headers).toHaveProperty('stripe-customer-id');
          expect(headers['stripe-customer-id']).toBe('cus_test_12345');
          
          await route.continue();
        });
        
        // Trigger the action that makes the call
        await page.getByTestId('start-github-ingest-button').click();
        ```
    *   **To check for header absence:**
        ```typescript
        await page.route('**/api/v1/embeddings/create', async (route) => {
          const headers = await route.request().allHeaders();
          
          // Assertion for free user
          expect(headers).not.toHaveProperty('stripe-customer-id');
          
          await route.continue();
        });
        ```
    *   **To check for console warnings:**
        *   For the edge case where a paid user has no customer ID, listen for console messages. Since this is a server-side log, you'll need to confirm if our framework bubbles these up to the browser console during development or if we need a separate mechanism. For now, implement it by listening to the page's console events.
        ```typescript
        const messages: string[] = [];
        page.on('console', msg => {
          if (msg.type() === 'warning') {
            messages.push(msg.text());
          }
        });
        
        // ... trigger the ingestion
        
        // Assert that the warning was eventually logged
        await expect.poll(() => messages.some(msg => msg.includes('Stripe customer ID not found'))).toBe(true);
        ```

---

### **4. MCP Integration Guidelines (for context)**

*   **Execution Command:** The tests will be run via a command like `mcp playwright:run --file tests/e2e/billing-attribution.spec.ts`.
*   **Environment:** The test environment must have access to all necessary environment variables, especially `STRIPE_AI_GATEWAY_RESTRICTED_ACCESS_KEY` and database credentials, likely via a `.env` file loaded by MCP. Your test code should not hardcode these secrets.

---

### **5. CI-Ready Code Requirements**

*   **Atomicity & Independence:** Each `test()` should be fully independent. All setup (user creation, login) must happen within `test` or `beforeEach`, and all cleanup (user deletion) within `afterEach` or `afterAll`. Do not rely on state from a previous test.
*   **Selectors:** Use `data-testid` attributes for selecting elements wherever possible to make tests resilient to UI refactoring. Avoid using CSS classes or text content for primary selectors.
*   **Clarity:** Use descriptive test names (e.g., `test('should NOT include stripe-customer-id header for free plan document ingest')`).
*   **Error Handling:** Playwright's hooks (`beforeEach`/`afterEach`) and automatic waiting mechanisms should suffice. No special `try/catch` blocks are needed unless you are handling an expected error condition.
*   **Parallelism:** By making tests atomic, they will be ready for parallel execution in CI, which is Playwright's default behavior.

Please proceed with generating the `tests/e2e/billing-attribution.spec.ts` file based on these instructions.

shige added 4 commits December 2, 2025 21:14
Create a shared utility function in vector-stores/shared/ to build AI Gateway headers with Stripe customer ID for billing attribution.
Replace local buildAiGatewayHeaders implementation with the shared utility function for consistency across vector store ingestion flows.
- Add headers parameter to RAG ingest pipeline options
- Pass headers through GitHub blob, PR, and issue ingest functions
- Call buildAiGatewayHeaders in processRepository

This ensures AI Gateway requests during GitHub repository ingestion include Stripe customer ID for billing attribution.
The function always returns a headers object, never undefined.
Remove `| undefined` from the return type for accuracy.
@shige
Copy link
Member Author

shige commented Dec 3, 2025

Thank you! 🚀

@shige shige merged commit 06d8b2b into main Dec 3, 2025
12 checks passed
@shige shige deleted the feat/add-ai-gateway-headers-to-github-ingest branch December 3, 2025 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants