
Add cache bundle bootstrap and improve abstract ingestion #12

Open
StevenZ904 wants to merge 4 commits into Kyle-Kyle:master from StevenZ904:feature/cache-bundle-bootstrap

@StevenZ904

Summary

  • add cached multi-source abstract ingestion and stronger DB rebuild behavior
  • add cache bundle export/install support for bootstrapping from a published snapshot
  • update README with bundle bootstrap instructions and the published Drive link

Validation

  • pytest -q
  • clean-environment install, bundle download, bundle install, query, and incremental build smoke test

@Kyle-Kyle
Owner

Hmm. So the main purpose of the bundle is to avoid duplicate requests when updating the database? That seems reasonable to me, but I don't like the implementation.

I have a few comments and questions:

  • I don't think putting the bundle on Google Drive is a good idea: people downloading from the same link may get different content if the file is edited. This is especially problematic because we unzip this untrusted data and implement the file extraction logic ourselves. A better approach would be hosting the bundle on the GitHub release page.
  • Is the code generated by AI? Some of the logic seems unnecessary and fragile. Personally, I don't like huge chunks of AI-generated code (AI-generated patches that humans can verify are fine). I have had terrible experiences with it: it can bury landmines in the code that are difficult to debug.
  • I'd also suggest implementing one logical change per commit.
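To make the concern about unpacking untrusted data concrete, here is a minimal sketch of the two checks such a bootstrap step would typically need. It assumes the bundle is a zip archive and that a SHA-256 digest is published alongside it (for example, in the GitHub release notes); `verify_sha256` and `safe_extract` are hypothetical helpers, not code from this PR.

```python
import hashlib
import zipfile
from pathlib import Path


def verify_sha256(archive_path, expected_hex):
    """Return True if the file's SHA-256 digest matches the pinned value.

    Assumption: the expected digest is published next to the release asset.
    """
    digest = hashlib.sha256()
    with open(archive_path, "rb") as f:
        # Hash in 64 KiB chunks so large bundles are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


def safe_extract(archive_path, dest_dir):
    """Extract a zip archive, refusing any member that would land outside dest_dir."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        # Validate every member before writing anything to disk.
        for info in zf.infolist():
            target = (dest / info.filename).resolve()
            # Reject absolute paths and "../" traversal ("zip slip").
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
        zf.extractall(dest)
    return dest
```

Pinning a digest only helps if the published file is immutable, which is one practical argument for GitHub release assets over an editable Drive link.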

Anyway, I'll carefully review and rewrite the code so that it is human-readable when I have more cycles.
