
Conversation

@juanpablosalas (Collaborator)

As suggested by Jason, I included the created and last-updated dates for files retrieved by the git scraper. This is done with the GitPython library, taking the dates of the first and last commits that touch each file.
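
A minimal sketch of that approach, assuming a GitPython Repo handle is already available (the helper name and return shape are illustrative, not the PR's actual code):

    from git import Repo  # GitPython

    def file_dates(repo: Repo, file_path: str):
        # iter_commits yields commits newest-first; restricting to paths=
        # keeps only commits that touch this file.
        commits = list(repo.iter_commits(paths=file_path))
        if not commits:
            return None, None
        created_at = commits[-1].committed_datetime      # first commit
        last_updated_at = commits[0].committed_datetime  # most recent commit
        return created_at, last_updated_at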

@lucalavezzo (Collaborator) left a comment

Thanks @juanpablosalas, minor comments, but then we can merge. I would like to pull this into #377 and build some things on top of it.

        clone_from_url = url.replace("gitlab", f"{self.git_username}:{self.git_token}@gitlab")
    elif "github" in url:
-       clone_from_url = url.replace("github", f"{self.git_username}:{self.git_token}@github")
+       clone_from_url = url  # .replace("github", f"{self.git_username}:{self.git_token}@github")

A collaborator commented:

This is a non-trivial change in behavior, I think: didn't this allow you to pull from private repos?
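
For reference, the pre-change behavior rewrote the HTTPS URL to embed credentials, which is what enabled cloning private repos. A sketch of that pattern, assuming git_username and git_token hold a user name and a personal access token (standalone names here are illustrative):

    from git import Repo  # GitPython

    def clone_repo(url: str, dest: str, git_username: str, git_token: str) -> Repo:
        # Embed credentials in the HTTPS host so private repos clone
        # without an interactive prompt, e.g.
        # https://github.com/org/repo.git -> https://user:token@github.com/org/repo.git
        if "gitlab" in url:
            clone_from_url = url.replace("gitlab", f"{git_username}:{git_token}@gitlab")
        elif "github" in url:
            clone_from_url = url.replace("github", f"{git_username}:{git_token}@github")
        else:
            clone_from_url = url
        return Repo.clone_from(clone_from_url, dest)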

+   last_updated_at = self._get_last_updated_date(repo=repo, file_path=markdown_path)
    resource = ScrapedResource(
        url=current_url,
        content=text_content,

A collaborator commented:

We could think about putting the file name, date, repo, etc. in the text_content to make it easier for things like BM25 to find files via query. What do you think?
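
One sketch of that idea, prepending a small metadata header to the scraped text before it is stored (helper and field names are illustrative, not part of the PR):

    def with_metadata_header(text_content: str, file_name: str, repo_name: str,
                             last_updated_at: str) -> str:
        # Prepend searchable metadata so lexical retrievers like BM25 can
        # match queries against the file name, repo, and date.
        header = (
            f"file: {file_name}\n"
            f"repo: {repo_name}\n"
            f"last updated: {last_updated_at}\n\n"
        )
        return header + text_content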
