-
Notifications
You must be signed in to change notification settings - Fork 216
chore(web): use git show for fetching file contents
#829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
WalkthroughSwitches the file source API to fetch files via git (git show) for arbitrary revisions, adds filename-based language detection, records repo defaultBranch in backend and DB, and updates related signatures and error handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Server as Web API
participant DB as Prisma
participant Git as Git (git show)
participant LD as LangDetect
Client->>Server: GET /api/source?repo=&path=&ref=
Server->>DB: find repository by name
DB-->>Server: repository metadata (incl. defaultBranch)
Server->>Git: git show "<ref or defaultBranch>:<path>"
alt file exists
Git-->>Server: file contents
Server->>LD: detectLanguageFromFilename(path)
LD-->>Server: language
Server-->>Client: FileSourceResponse (contents + repo metadata + language + web URLs)
else file missing or invalid ref
Git-->>Server: error
Server-->>Client: ServiceError (fileNotFound or unexpectedError)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@packages/web/src/features/search/fileSourceApi.ts`:
- Around line 35-37: The current check in fileSourceApi.ts returns
unexpectedError(...) for invalid git refs, which yields a 500; change this to
return a 4xx client error instead by using an existing bad request error factory
or adding a new invalidReference error in serviceError.ts, then replace the
unexpectedError call in the block that checks errorMessage.includes('unknown
revision') || errorMessage.includes('bad revision') to return the appropriate
client error (e.g., invalidReference(gitRef) or badRequest(...)) so the API
correctly signals a client-side bad request; update any imports/usages in
fileSourceApi.ts to use the new/appropriate error factory.
- Around line 49-54: externalWebUrl is built using gitRef which defaults to
'HEAD' when ref is undefined, producing invalid browse URLs; update the logic
around getCodeHostBrowseFileAtBranchUrl: if ref is undefined, do not pass 'HEAD'
— either resolve the repository's default branch (use repo.default_branch or
repo.defaultBranch if present) and use that as branchName, or explicitly set
externalWebUrl to undefined when no explicit ref is provided; modify the code
that computes gitRef/externalWebUrl to check for ref === undefined and branch
resolution via repo.default_branch before calling
getCodeHostBrowseFileAtBranchUrl (referencing gitRef,
repo.default_branch/repo.defaultBranch, and externalWebUrl).
In `@packages/web/src/lib/languageDetection.ts`:
- Around line 6-13: The extension map population uses raw casing from
linguistLanguages which can mismatch later lookups that lowercase the queried
extension; update the loop that fills extensionToLanguage (the for ... of
Object.entries(linguistLanguages) block) to normalize extension keys (e.g.,
convert ext to lowercase) before calling extensionToLanguage.set so stored keys
match the lowercase lookup, and ensure you still skip falsy/missing extensions
as before.
| if (errorMessage.includes('unknown revision') || errorMessage.includes('bad revision')) { | ||
| return unexpectedError(`Invalid git reference: ${gitRef}`); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid git reference should return a client error, not a server error.
An invalid/unknown revision supplied by the client is a bad request (4xx), not an internal server error (5xx). Using unexpectedError returns status 500, which misrepresents the error source.
🐛 Proposed fix to use appropriate error status
Consider adding a dedicated invalidReference error or using a generic bad request error:
if (errorMessage.includes('unknown revision') || errorMessage.includes('bad revision')) {
- return unexpectedError(`Invalid git reference: ${gitRef}`);
+ return {
+ statusCode: 400,
+ errorCode: ErrorCode.INVALID_REQUEST_BODY,
+ message: `Invalid git reference: ${gitRef}`,
+ } satisfies ServiceError;
}Alternatively, add a new error factory in serviceError.ts for this case.
🤖 Prompt for AI Agents
In `@packages/web/src/features/search/fileSourceApi.ts` around lines 35 - 37, The
current check in fileSourceApi.ts returns unexpectedError(...) for invalid git
refs, which yields a 500; change this to return a 4xx client error instead by
using an existing bad request error factory or adding a new invalidReference
error in serviceError.ts, then replace the unexpectedError call in the block
that checks errorMessage.includes('unknown revision') ||
errorMessage.includes('bad revision') to return the appropriate client error
(e.g., invalidReference(gitRef) or badRequest(...)) so the API correctly signals
a client-side bad request; update any imports/usages in fileSourceApi.ts to use
the new/appropriate error factory.
| const externalWebUrl = getCodeHostBrowseFileAtBranchUrl({ | ||
| webUrl: repo.webUrl, | ||
| codeHostType: repo.external_codeHostType, | ||
| branchName: gitRef, | ||
| filePath, | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n packages/web/src/features/search/fileSourceApi.ts | head -80Repository: sourcebot-dev/sourcebot
Length of output: 3321
🏁 Script executed:
# Find where gitRef is defined and how it defaults to 'HEAD'
rg -n "gitRef" packages/web/src/features/search/fileSourceApi.ts -B5 -A2Repository: sourcebot-dev/sourcebot
Length of output: 1251
🏁 Script executed:
# Find the getCodeHostBrowseFileAtBranchUrl function definition
fd . packages/web packages/backend -name "*.ts" -type f | xargs grep -l "getCodeHostBrowseFileAtBranchUrl" | head -5Repository: sourcebot-dev/sourcebot
Length of output: 297
🏁 Script executed:
# Search for getCodeHostBrowseFileAtBranchUrl function definition
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/web packages/backend -t ts -B2 -A15 | grep -A15 "function\|export\|const getCodeHostBrowseFileAtBranchUrl"Repository: sourcebot-dev/sourcebot
Length of output: 2341
🏁 Script executed:
# Also find the exact file containing this function
fd . packages/web packages/backend -name "*.ts" -type f | xargs grep -l "getCodeHostBrowseFileAtBranchUrl"Repository: sourcebot-dev/sourcebot
Length of output: 299
🏁 Script executed:
sed -n '376,450p' packages/web/src/lib/utils.tsRepository: sourcebot-dev/sourcebot
Length of output: 2283
🏁 Script executed:
# Check for tests related to this function
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/ --type ts -g "*.test.*"Repository: sourcebot-dev/sourcebot
Length of output: 49
🏁 Script executed:
# Check if there's any handling of 'HEAD' in external URL generation
rg -n "'HEAD'" packages/web/src/lib/utils.ts -B2 -A2Repository: sourcebot-dev/sourcebot
Length of output: 49
🏁 Script executed:
# Check if ref being undefined is actually a valid case (check fileSourceApi usage/tests)
fd . packages/ -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "fileSourceApi\|getFileSource" 2>/dev/null | head -3Repository: sourcebot-dev/sourcebot
Length of output: 297
🏁 Script executed:
# Search for test files containing fileSourceApi
find packages -type f \( -name "*.test.ts" -o -name "*.spec.ts" \)Repository: sourcebot-dev/sourcebot
Length of output: 746
🏁 Script executed:
# Check how ref parameter is used in FileSourceRequest type
rg -n "FileSourceRequest" packages/web/src/features/search/types.ts -A10Repository: sourcebot-dev/sourcebot
Length of output: 505
🏁 Script executed:
# Find the fileSourceRequestSchema definition
rg -n "fileSourceRequestSchema" packages/web/src/features/search/types.ts -B5 -A5Repository: sourcebot-dev/sourcebot
Length of output: 585
External URL will be invalid when ref is undefined and gitRef defaults to 'HEAD'.
When ref is not provided, gitRef becomes 'HEAD', which is then passed directly to getCodeHostBrowseFileAtBranchUrl as branchName. However, major code hosts (GitHub, GitLab, Gitea, Azure DevOps, Bitbucket) do not support 'HEAD' as a valid reference in browse URLs—they require actual branch names or commit SHAs. This results in broken external URLs.
Consider either resolving 'HEAD' to the actual default branch name before passing it to the function, or returning undefined for externalWebUrl when ref is not explicitly provided.
🤖 Prompt for AI Agents
In `@packages/web/src/features/search/fileSourceApi.ts` around lines 49 - 54,
externalWebUrl is built using gitRef which defaults to 'HEAD' when ref is
undefined, producing invalid browse URLs; update the logic around
getCodeHostBrowseFileAtBranchUrl: if ref is undefined, do not pass 'HEAD' —
either resolve the repository's default branch (use repo.default_branch or
repo.defaultBranch if present) and use that as branchName, or explicitly set
externalWebUrl to undefined when no explicit ref is provided; modify the code
that computes gitRef/externalWebUrl to check for ref === undefined and branch
resolution via repo.default_branch before calling
getCodeHostBrowseFileAtBranchUrl (referencing gitRef,
repo.default_branch/repo.defaultBranch, and externalWebUrl).
|
CC @CyberT17 - this change allows for fetching files on any revision |
git show for fetching file sourcegit show for fetching file contents
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/web/src/features/search/fileSourceApi.ts`:
- Line 29: filePath is passed directly into git.raw which risks directory
traversal; before calling git.raw in fileSourceApi.ts validate the filePath
using the existing isPathValid helper (and import it), reject or throw a clear
error (or return an empty result) if isPathValid(filePath) is false (covering
../ and null bytes), and only call git.raw(['show', `${gitRef}:${filePath}`])
when validation passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/web/src/features/search/fileSourceApi.ts`:
- Around line 29-31: The endpoint currently uses git.raw(...) to populate source
(variable "source") and assumes text content (schema has source: z.string() and
uses detectLanguageFromFilename()), so make the intent explicit by validating
and rejecting binary files: before calling git.raw or returning source, fetch
the file as a Buffer (or use a binary-safe git plumbing command), detect binary
content (e.g., via a small byte-inspection or an isBinaryFile utility), and if
binary return a 4xx error indicating "binary files not supported"; keep using
git.raw and z.string() only for confirmed text files and ensure
detectLanguageFromFilename() is only applied after that check.
Changes the
/api/reposendpoint to usegit show(instead of zoekt) to fetch file contents.Fixes #574
Summary by CodeRabbit
New Features
Improvements
Chores
✏️ Tip: You can customize this high-level summary in your review settings.