Skip to content

Conversation

@brendan-kellam
Copy link
Contributor

@brendan-kellam brendan-kellam commented Jan 31, 2026

Changes the /api/repos endpoint to use git show (instead of zoekt) to fetch file contents.

Fixes #574

Summary by CodeRabbit

  • New Features

    • Source endpoint now fetches code for any revision (not limited to previously indexed revisions).
    • Source responses include richer repo metadata, branch info, and web URLs.
  • Improvements

    • Repositories now track and surface default branch information.
    • Automatic language detection added for files based on filename/extension.
  • Chores

    • Added a new runtime language-data dependency.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions

This comment has been minimized.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 31, 2026

Walkthrough

Switches the file source API to fetch files via git (git show) for arbitrary revisions, adds filename-based language detection, records repo defaultBranch in backend and DB, and updates related signatures and error handling.

Changes

Cohort / File(s) Summary
Changelog
CHANGELOG.md
Documents API change: /api/source now supports fetching source for any revision (not limited to zoekt-indexed revisions).
Web dependencies
packages/web/package.json
Adds linguist-languages ^9.3.1.
File source API & language detection
packages/web/src/features/search/fileSourceApi.ts, packages/web/src/lib/languageDetection.ts, packages/web/src/lib/serviceError.ts
Replaces zoekt/IR retrieval with git-based git show flow; resolves repo via Prisma; returns enriched FileSourceResponse (repo metadata, web URLs, branch, detected language); changes fileNotFound to synchronous return type.
Backend repo metadata & defaultBranch
packages/backend/src/repoCompileUtils.ts, packages/backend/src/repoIndexManager.ts, packages/backend/src/github.ts, packages/backend/src/repoCompileUtils.test.ts
Adds defaultBranch to repo records across providers, records local default branch during index completion, extends OctokitRepository type, and updates test mocks to include getLocalDefaultBranch.
Database schema & migration
packages/db/prisma/schema.prisma, packages/db/prisma/migrations/..._add_default_branch_to_repo_table/migration.sql
Adds optional defaultBranch (String?) to Repo model and migration to add defaultBranch TEXT column.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Server as Web API
    participant DB as Prisma
    participant Git as Git (git show)
    participant LD as LangDetect

    Client->>Server: GET /api/source?repo=&path=&ref=
    Server->>DB: find repository by name
    DB-->>Server: repository metadata (incl. defaultBranch)
    Server->>Git: git show "<ref or defaultBranch>:<path>"
    alt file exists
        Git-->>Server: file contents
        Server->>LD: detectLanguageFromFilename(path)
        LD-->>Server: language
        Server-->>Client: FileSourceResponse (contents + repo metadata + language + web URLs)
    else file missing or invalid ref
        Git-->>Server: error
        Server-->>Client: ServiceError (fileNotFound or unexpectedError)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • msukkari
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: switching from zoekt to git show for fetching file contents in the web package.
Linked Issues check ✅ Passed The changes fully address issue #574: git show replaces zoekt for file retrieval, enabling arbitrary commit SHA access and binary file support.
Out of Scope Changes check ✅ Passed Changes include necessary supporting infrastructure (language detection, default branch tracking) directly enabling the core git show functionality, with no unrelated modifications.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bkellam/fix-SOU-89

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude

This comment was marked as off-topic.

@claude

This comment was marked as off-topic.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@packages/web/src/features/search/fileSourceApi.ts`:
- Around line 35-37: The current check in fileSourceApi.ts returns
unexpectedError(...) for invalid git refs, which yields a 500; change this to
return a 4xx client error instead by using an existing bad request error factory
or adding a new invalidReference error in serviceError.ts, then replace the
unexpectedError call in the block that checks errorMessage.includes('unknown
revision') || errorMessage.includes('bad revision') to return the appropriate
client error (e.g., invalidReference(gitRef) or badRequest(...)) so the API
correctly signals a client-side bad request; update any imports/usages in
fileSourceApi.ts to use the new/appropriate error factory.
- Around line 49-54: externalWebUrl is built using gitRef which defaults to
'HEAD' when ref is undefined, producing invalid browse URLs; update the logic
around getCodeHostBrowseFileAtBranchUrl: if ref is undefined, do not pass 'HEAD'
— either resolve the repository's default branch (use repo.default_branch or
repo.defaultBranch if present) and use that as branchName, or explicitly set
externalWebUrl to undefined when no explicit ref is provided; modify the code
that computes gitRef/externalWebUrl to check for ref === undefined and branch
resolution via repo.default_branch before calling
getCodeHostBrowseFileAtBranchUrl (referencing gitRef,
repo.default_branch/repo.defaultBranch, and externalWebUrl).

In `@packages/web/src/lib/languageDetection.ts`:
- Around line 6-13: The extension map population uses raw casing from
linguistLanguages which can mismatch later lookups that lowercase the queried
extension; update the loop that fills extensionToLanguage (the for ... of
Object.entries(linguistLanguages) block) to normalize extension keys (e.g.,
convert ext to lowercase) before calling extensionToLanguage.set so stored keys
match the lowercase lookup, and ensure you still skip falsy/missing extensions
as before.

Comment on lines 35 to 37
if (errorMessage.includes('unknown revision') || errorMessage.includes('bad revision')) {
return unexpectedError(`Invalid git reference: ${gitRef}`);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Invalid git reference should return a client error, not a server error.

An invalid/unknown revision supplied by the client is a bad request (4xx), not an internal server error (5xx). Using unexpectedError returns status 500, which misrepresents the error source.

🐛 Proposed fix to use appropriate error status

Consider adding a dedicated invalidReference error or using a generic bad request error:

             if (errorMessage.includes('unknown revision') || errorMessage.includes('bad revision')) {
-                return unexpectedError(`Invalid git reference: ${gitRef}`);
+                return {
+                    statusCode: 400,
+                    errorCode: ErrorCode.INVALID_REQUEST_BODY,
+                    message: `Invalid git reference: ${gitRef}`,
+                } satisfies ServiceError;
             }

Alternatively, add a new error factory in serviceError.ts for this case.

🤖 Prompt for AI Agents
In `@packages/web/src/features/search/fileSourceApi.ts` around lines 35 - 37, The
current check in fileSourceApi.ts returns unexpectedError(...) for invalid git
refs, which yields a 500; change this to return a 4xx client error instead by
using an existing bad request error factory or adding a new invalidReference
error in serviceError.ts, then replace the unexpectedError call in the block
that checks errorMessage.includes('unknown revision') ||
errorMessage.includes('bad revision') to return the appropriate client error
(e.g., invalidReference(gitRef) or badRequest(...)) so the API correctly signals
a client-side bad request; update any imports/usages in fileSourceApi.ts to use
the new/appropriate error factory.

Comment on lines +49 to +54
const externalWebUrl = getCodeHostBrowseFileAtBranchUrl({
webUrl: repo.webUrl,
codeHostType: repo.external_codeHostType,
branchName: gitRef,
filePath,
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n packages/web/src/features/search/fileSourceApi.ts | head -80

Repository: sourcebot-dev/sourcebot

Length of output: 3321


🏁 Script executed:

# Find where gitRef is defined and how it defaults to 'HEAD'
rg -n "gitRef" packages/web/src/features/search/fileSourceApi.ts -B5 -A2

Repository: sourcebot-dev/sourcebot

Length of output: 1251


🏁 Script executed:

# Find the getCodeHostBrowseFileAtBranchUrl function definition
fd . packages/web packages/backend -name "*.ts" -type f | xargs grep -l "getCodeHostBrowseFileAtBranchUrl" | head -5

Repository: sourcebot-dev/sourcebot

Length of output: 297


🏁 Script executed:

# Search for getCodeHostBrowseFileAtBranchUrl function definition
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/web packages/backend -t ts -B2 -A15 | grep -A15 "function\|export\|const getCodeHostBrowseFileAtBranchUrl"

Repository: sourcebot-dev/sourcebot

Length of output: 2341


🏁 Script executed:

# Also find the exact file containing this function
fd . packages/web packages/backend -name "*.ts" -type f | xargs grep -l "getCodeHostBrowseFileAtBranchUrl"

Repository: sourcebot-dev/sourcebot

Length of output: 299


🏁 Script executed:

sed -n '376,450p' packages/web/src/lib/utils.ts

Repository: sourcebot-dev/sourcebot

Length of output: 2283


🏁 Script executed:

# Check for tests related to this function
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/ --type ts -g "*.test.*"

Repository: sourcebot-dev/sourcebot

Length of output: 49


🏁 Script executed:

# Check if there's any handling of 'HEAD' in external URL generation
rg -n "'HEAD'" packages/web/src/lib/utils.ts -B2 -A2

Repository: sourcebot-dev/sourcebot

Length of output: 49


🏁 Script executed:

# Check if ref being undefined is actually a valid case (check fileSourceApi usage/tests)
fd . packages/ -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "fileSourceApi\|getFileSource" 2>/dev/null | head -3

Repository: sourcebot-dev/sourcebot

Length of output: 297


🏁 Script executed:

# Search for test files containing fileSourceApi
find packages -type f \( -name "*.test.ts" -o -name "*.spec.ts" \)

Repository: sourcebot-dev/sourcebot

Length of output: 746


🏁 Script executed:

# Check how ref parameter is used in FileSourceRequest type
rg -n "FileSourceRequest" packages/web/src/features/search/types.ts -A10

Repository: sourcebot-dev/sourcebot

Length of output: 505


🏁 Script executed:

# Find the fileSourceRequestSchema definition
rg -n "fileSourceRequestSchema" packages/web/src/features/search/types.ts -B5 -A5

Repository: sourcebot-dev/sourcebot

Length of output: 585


External URL will be invalid when ref is undefined and gitRef defaults to 'HEAD'.

When ref is not provided, gitRef becomes 'HEAD', which is then passed directly to getCodeHostBrowseFileAtBranchUrl as branchName. However, major code hosts (GitHub, GitLab, Gitea, Azure DevOps, Bitbucket) do not support 'HEAD' as a valid reference in browse URLs—they require actual branch names or commit SHAs. This results in broken external URLs.

Consider either resolving 'HEAD' to the actual default branch name before passing it to the function, or returning undefined for externalWebUrl when ref is not explicitly provided.

🤖 Prompt for AI Agents
In `@packages/web/src/features/search/fileSourceApi.ts` around lines 49 - 54,
externalWebUrl is built using gitRef which defaults to 'HEAD' when ref is
undefined, producing invalid browse URLs; update the logic around
getCodeHostBrowseFileAtBranchUrl: if ref is undefined, do not pass 'HEAD' —
either resolve the repository's default branch (use repo.default_branch or
repo.defaultBranch if present) and use that as branchName, or explicitly set
externalWebUrl to undefined when no explicit ref is provided; modify the code
that computes gitRef/externalWebUrl to check for ref === undefined and branch
resolution via repo.default_branch before calling
getCodeHostBrowseFileAtBranchUrl (referencing gitRef,
repo.default_branch/repo.defaultBranch, and externalWebUrl).

@brendan-kellam
Copy link
Contributor Author

CC @CyberT17 - this change allows for fetching files on any revision

@brendan-kellam brendan-kellam changed the title chore(web): use git show for fetching file source chore(web): use git show for fetching file contents Jan 31, 2026
@claude
Copy link

claude bot commented Jan 31, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/web/src/features/search/fileSourceApi.ts`:
- Line 29: filePath is passed directly into git.raw which risks directory
traversal; before calling git.raw in fileSourceApi.ts validate the filePath
using the existing isPathValid helper (and import it), reject or throw a clear
error (or return an empty result) if isPathValid(filePath) is false (covering
../ and null bytes), and only call git.raw(['show', `${gitRef}:${filePath}`])
when validation passes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/web/src/features/search/fileSourceApi.ts`:
- Around line 29-31: The endpoint currently uses git.raw(...) to populate source
(variable "source") and assumes text content (schema has source: z.string() and
uses detectLanguageFromFilename()), so make the intent explicit by validating
and rejecting binary files: before calling git.raw or returning source, fetch
the file as a Buffer (or use a binary-safe git plumbing command), detect binary
content (e.g., via a small byte-inspection or an isBinaryFile utility), and if
binary return a 4xx error indicating "binary files not supported"; keep using
git.raw and z.string() only for confirmed text files and ensure
detectLanguageFromFilename() is only applied after that check.

@brendan-kellam brendan-kellam merged commit bd3c915 into main Jan 31, 2026
9 checks passed
@brendan-kellam brendan-kellam deleted the bkellam/fix-SOU-89 branch January 31, 2026 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[chore] Use git for file source

2 participants