feat(core): New siteConfig future.experimental_vcs API + future.experimental_faster.gitEagerVcs flag
#11512
…on + config validation tests
Size Change: +23.4 kB (+0.2%) Total Size: 11.8 MB
Size Change: +24 kB (+0.2%) Total Size: 12.3 MB
@slorber do you mind if I suggest a test case?

    my-docusaurus-site/
    ├── .git/
    ├── package.json
    ├── docusaurus.config.js
    └── external-docs/   <-- an inner repository, could be a git submodule
        ├── .git/
        ├── README.md
        └── docs/
            ├── intro.md
            ├── guide.md
            └── tutorials/
                └── how-to.md

When scanning for the
PS: I have not read your implementation entirely, maybe you already account for it. |
@felipecrs I have implemented something that should work for your case. It assumes that you are using git submodules, which afaik is your case. The logic looks like this:

    async function loadAllGitFilesInfoMap(cwd: string): Promise<GitFileInfoMap> {
      const roots = await PerfLogger.async('Reading Git root dirs', () =>
        getGitAllRepoRoots(cwd),
      );
      const allMaps: GitFileInfoMap[] = await Promise.all(
        roots.map(async (root) => {
          const map = await PerfLogger.async(
            `Reading Git history for repo ${logger.path(basename(root))}`,
            () => getGitRepositoryFilesInfo(root),
          );
          return resolveFileInfoMapPaths(root, map);
        }),
      );
      return mergeFileMaps(allMaps);
    }

We have tests covering that. However, if you use nested Git repositories without submodules, we do not support that yet, but we could improve it later to support more cases. It seems not so easy to implement a
Note that you can provide your own implementation. If the "git-eager" one doesn't work for you, it's now possible to implement in userland, and show us your working implementation that we could eventually add to Docusaurus core later. What I mean is: it doesn't have to be perfect, and we can ship this PR in its current state:
I'm going to merge this PR asap, but I am willing to keep improving from the initial "git-eager" strategy implementation! I have used this Git superproject as a test case: https://github.com/slorber/docusaurus-git-submodules-tests It contains 3 large submodule repositories: Docusaurus, React Native, React Native Website. The current implementation is able to read over 50k commits and retrieve a map of 45k file infos over the 3 repositories in ~3s on my Mac M3 (ran in parallel with other Docusaurus things loading). |
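For readers following along: the bodies of `resolveFileInfoMapPaths` and `mergeFileMaps` are not shown in this thread. The sketch below is only a guess at what such helpers could do, assuming each per-repo map is keyed by repo-relative paths. The `FileInfo` and `FileInfoMap` shapes are hypothetical stand-ins for the real `GitFileInfoMap` type.

```typescript
import * as path from 'node:path';

// Hypothetical shapes: the PR's real GitFileInfoMap type is not shown here.
type FileInfo = {lastUpdate: {timestamp: number; author: string}};
type FileInfoMap = Map<string, FileInfo>;

// Turn repo-relative keys into absolute paths, so that maps coming from
// different repositories cannot collide on the same relative path.
function resolveFileInfoMapPaths(root: string, map: FileInfoMap): FileInfoMap {
  const resolved: FileInfoMap = new Map();
  map.forEach((info, relativePath) => {
    resolved.set(path.resolve(root, relativePath), info);
  });
  return resolved;
}

// Merge all per-repo maps into one. Later maps win on duplicate keys.
function mergeFileMaps(maps: FileInfoMap[]): FileInfoMap {
  const merged: FileInfoMap = new Map();
  maps.forEach((map) => {
    map.forEach((info, filePath) => merged.set(filePath, info));
  });
  return merged;
}
```

Resolving to absolute paths before merging is what lets a superproject and its submodules both contain a `docs/intro.md` without one overwriting the other.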
|
That's cool, thanks for considering it upfront. I don't use git submodules though, I just have automation to clone the repositories at their latest revisions in a git-ignored sub-directory of my website repository. I also have automation to initialize multiple branches of the same repository in different sub-directories, as the source for my versioned docs comes from those branches (you may recall from #11288). But maybe I can suggest a simpler approach (not in TS, sorry):

    $ git clone https://gerrit.googlesource.com/gerrit --recurse-submodules
    $ find . -name .git \( -type d -o -type f \) -prune -exec sh -c 'git -C "$(dirname "$1")" rev-parse --show-toplevel 2>/dev/null' -- {} \;
    /home/felipecrs/repos/gerrit/plugins/delete-project
    /home/felipecrs/repos/gerrit/plugins/replication
    /home/felipecrs/repos/gerrit/plugins/gitiles
    /home/felipecrs/repos/gerrit/plugins/singleusergroup
    /home/felipecrs/repos/gerrit/plugins/commit-message-length-validator
    /home/felipecrs/repos/gerrit/plugins/webhooks
    /home/felipecrs/repos/gerrit/plugins/download-commands
    /home/felipecrs/repos/gerrit/plugins/reviewnotes
    /home/felipecrs/repos/gerrit/plugins/plugin-manager
    /home/felipecrs/repos/gerrit/plugins/codemirror-editor
    /home/felipecrs/repos/gerrit/plugins/hooks
    /home/felipecrs/repos/gerrit/polymer-bridges
    /home/felipecrs/repos/gerrit
    /home/felipecrs/repos/gerrit/modules/jgit
    /home/felipecrs/repos/gerrit/modules/java-prettify

This should work in any case, with or without submodules or worktrees. It would even work for recursive submodules (which I suppose you have not accounted for yet). And it's probably fast enough. :) |
|
I'm open to improving the implementation, so if you want to write a PR please do ;) We also need to support Windows and other OS, so we need to ensure the implementation works across all OS. Also, this logic to "find" the repositories should rather be fast: we don't want to slow down the most common scenario (single Git repo) to cover the 1% advanced needs. If we can't find a sensible default implementation, it may be better to offer a "composable" approach that lets you provide a hardcoded list of Git roots instead of detecting those. |
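As a starting point for a cross-platform version of the `find` command above, here is a rough Node.js sketch. It is not what the PR implements, and `findGitRoots` and its `skip` parameter are made-up names. Unlike the shell version, it only checks that a `.git` entry exists and does not ask Git to confirm it with `rev-parse`, so it spawns no processes.

```typescript
import * as fs from 'node:fs/promises';
import * as path from 'node:path';

// Recursively collect every directory that contains a `.git` entry.
// `.git` is a directory in a regular clone and a plain file in submodules
// and linked worktrees, so both kinds are accepted.
// `skip` (hypothetical parameter) lists directory names that are never entered.
async function findGitRoots(
  dir: string,
  skip: string[] = ['node_modules'],
): Promise<string[]> {
  const entries = await fs.readdir(dir, {withFileTypes: true});
  const roots: string[] = [];
  if (entries.some((e) => e.name === '.git' && (e.isDirectory() || e.isFile()))) {
    roots.push(dir);
  }
  // Symlinked directories are not followed: isDirectory() is false for them.
  const children = entries.filter(
    (e) => e.isDirectory() && e.name !== '.git' && skip.indexOf(e.name) === -1,
  );
  for (const child of children) {
    const nested = await findGitRoots(path.join(dir, child.name), skip);
    roots.push(...nested);
  }
  return roots;
}
```

Walking the whole site directory has a cost even for single-repo sites, which is the concern raised above. A hardcoded list of roots would skip this step entirely.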
|
Or maybe some lazy-load approach: if discovering the date for a given file fails, check whether it has its own repository. If it does, scan that repository as well.
Right. Maybe someday. For now anyway there is |
I thought about that kind of implementation, but IMHO it's not ideal because it is likely to lead to a lot of useless commands for untracked files. We have plugins that codegen lots of MDX files (often from OpenAPI schemas, for example), and those files are untracked. We can't run one or more shell commands for each of these files, so the implementation needs to be more clever to skip useless work. |
Docusaurus Faster
This PR is part of the Docusaurus Faster project aiming at reducing production build times
TLDR
Your site uses the `showLastUpdateAuthor` and `showLastUpdateTime` options of the docs/blog/page plugin?
It could build faster if you turn on Docusaurus Faster:
Alternatively, you can also use this option, which enables the same improvements:
The impact on build times can be significant for:
Problem
We want to improve site build performance.
It turns out that our "last updated at" feature is a real bottleneck.
For large sites, it leads to the inefficient execution of thousands of `git log` commands, significantly delaying the plugin `loadContent()` phase.
This performance bottleneck has been described in detail here:
`git log` commands to read last update metadata #11208
In this PR, we'll focus on the solutions we implemented to overcome this perf bottleneck.
Solution
This PR implements:
- `siteConfig.future.experimental_vcs`: an option to let you write your own VCS integration, defaulting to our historical Git-based implementation, with various VCS presets available out of the box
- `siteConfig.future.experimental_faster.gitEagerVcs`: a future flag to turn on a new - much faster - git-based implementation that doesn't perform thousands of `git log` calls
New VCS API
We now have a new `future.experimental_vcs` site config option.
It lets you provide your own implementation to tell Docusaurus how to read the file creation / last update info of a given file.
For example, you could implement an "svn-ad-hoc" strategy yourself:
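A minimal sketch of what an ad-hoc SVN lookup might look like, running `svn info` for each file on demand. The function name, the `ChangeInfo` shape and the injectable `run` parameter are illustrative assumptions, not the Docusaurus API. Check the VCS documentation linked under "Test links" for the exact option shape before wiring this into `future.experimental_vcs`.

```typescript
import {execFile} from 'node:child_process';
import {promisify} from 'node:util';

// Assumed shape for illustration only, see the VCS documentation for the real one.
type ChangeInfo = {timestamp: number; author: string};
type RunSvn = (args: string[]) => Promise<string>;

const execFileAsync = promisify(execFile);
// Default runner: spawns the `svn` CLI and returns its stdout.
const runSvn: RunSvn = async (args) => (await execFileAsync('svn', args)).stdout;

// "Ad-hoc" means two `svn info` calls per file, every time a file is looked up.
// `run` is injectable so the logic can be exercised without an SVN checkout.
async function getSvnLastUpdateInfo(
  filePath: string,
  run: RunSvn = runSvn,
): Promise<ChangeInfo | null> {
  try {
    const [date, author] = await Promise.all([
      run(['info', '--show-item', 'last-changed-date', filePath]),
      run(['info', '--show-item', 'last-changed-author', filePath]),
    ]);
    const timestamp = Date.parse(date.trim());
    return isNaN(timestamp) ? null : {timestamp, author: author.trim()};
  } catch {
    // Unversioned file, or the svn command failed: report no info.
    return null;
  }
}
```

This has the same cost profile as the legacy Git strategy: the number of spawned processes grows with the number of files.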
Or implement an "svn-eager" strategy yourself:
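The eager idea is the same for any VCS: read the history once, build an in-memory map, then answer every file lookup from that map. The sketch below shows the parsing step on Git output rather than SVN, because the `git log` format is simple to show. The `COMMIT:` marker, `parseGitLog` and the `FileInfo` shape are illustrative choices, not the code shipped in this PR.

```typescript
type ChangeInfo = {timestamp: number; author: string};
type FileInfo = {creation: ChangeInfo; lastUpdate: ChangeInfo};

// Parses the output of a single command such as:
//   git log --format=COMMIT:%at,%an --name-only
// Git prints the newest commit first, so the first time a file appears is its
// last update and the last time it appears is its creation.
// Renames are not followed in this simplified version.
function parseGitLog(output: string): Map<string, FileInfo> {
  const files = new Map<string, FileInfo>();
  let current: ChangeInfo | null = null;
  for (const line of output.split('\n')) {
    if (line.indexOf('COMMIT:') === 0) {
      const rest = line.slice('COMMIT:'.length);
      const comma = rest.indexOf(',');
      current = {
        timestamp: Number(rest.slice(0, comma)) * 1000, // %at is in seconds
        author: rest.slice(comma + 1),
      };
    } else if (line !== '' && current) {
      const existing = files.get(line);
      if (existing) {
        existing.creation = current; // an older commit touching the same file
      } else {
        files.set(line, {creation: current, lastUpdate: current});
      }
    }
  }
  return files;
}
```

One process is spawned no matter how many docs the site has, and untracked files are simply absent from the map.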
You can also pass `false` to disable, or `true` (default) to enable the default Git strategy:
The default strategy:
It is also possible to pass VCS presets:
The available presets are:
`git log <filename>` based strategy
We also have default strategies that are "dynamic" and delegate to the strategies above:
We recommend that most users use either `true`, `'default-v1'` or `'default-v2'`
Reading Git file info in dev can be expensive and can significantly slow down your dev server startup time.
Even simpler: just turn Docusaurus Faster on to leverage the new Git-based implementation. See below.
Docusaurus Faster - `gitEagerVcs` flag
The `siteConfig.future.experimental_faster.gitEagerVcs` flag simply swaps the default `'default-v1'` for `'default-v2'`, replacing the `git-ad-hoc` implementation with the `git-eager` strategy in production builds (while keeping `hardcoded` in dev).
The simplest configuration I'd recommend is to turn on Docusaurus Faster globally:
Your site will now use the new `'default-v2'` VCS strategy by default.
If proven successful, this VCS strategy will become the new default in Docusaurus v4.
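Putting the options from this section together, a config fragment could look roughly like the following. It is a sketch based only on the option names used in this PR. The three options are alternatives, and the exact accepted values should be checked against the VCS documentation linked under "Test links". Note that turning on `experimental_faster` also requires the `@docusaurus/faster` package to be installed.

```typescript
// docusaurus.config.ts (fragment, only the `future` key is shown)
const config = {
  future: {
    // Option 1: turn on every Docusaurus Faster flag, including gitEagerVcs.
    experimental_faster: true,

    // Option 2: only enable the flag introduced by this PR.
    // experimental_faster: {gitEagerVcs: true},

    // Option 3: pick the VCS preset explicitly instead of using the flag.
    // experimental_vcs: 'default-v2',
  },
};

export default config;
```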
Benchmark
We are comparing the legacy (git-ad-hoc) vs the new (git-eager) VCS strategies on our own website.
We are using `DOCUSAURUS_RETURN_AFTER_LOADING=true` so that we only measure the site's initial loading phase and skip the bundling/SSG phases. This lets us measure more accurately the perf improvement on the phase we aim to optimize.
`yarn build:website:fast`
Building our website for English only, with a limited subset of doc versions:
The new implementation is ~1.3x faster:
The result is significant, but not outstanding, because:
`git log` commands for a limited number of docs
This is likely the worst-case scenario for the new implementation, and yet it performs faster.
`yarn build:website --locale en`
Building our website for English only:
The new implementation is ~2.8x faster:
From our tests, we believe the gap will likely increase even more with the number of docs and the size of your repository.
`yarn build:website`
Building our website for all locales:
The new implementation is 5.6x+ faster:
This result is even more impressive because we only read the Git repository eagerly once for the first locale. For all subsequent locales, we'll read the git file info data from the repository that we already read!
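The "read once, reuse for every locale" behavior can be pictured as caching the in-flight promise per repository root. This is only a sketch of the general technique, not the PR's code: `createCachedLoader` and the `FileInfoMap` shape are made-up names.

```typescript
// Hypothetical shape, for illustration only.
type FileInfoMap = Map<string, {timestamp: number; author: string}>;

// Wraps an expensive loader so that each repo root is read at most once per
// process. The promise itself is cached, so concurrent callers share one read.
function createCachedLoader(
  load: (root: string) => Promise<FileInfoMap>,
): (root: string) => Promise<FileInfoMap> {
  const cache = new Map<string, Promise<FileInfoMap>>();
  return (root) => {
    let pending = cache.get(root);
    if (!pending) {
      pending = load(root);
      cache.set(root, pending);
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved map matters when several plugins ask for the same repository before the first read has finished.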
The new implementation is expected to significantly speed up i18n sites that build all the locales in a single `docusaurus build` run
(i.e., not using multiple `docusaurus build --locale <currentLocale>` calls).
What's next?
The new git-eager implementation is an initial experimental implementation.
It is possible that it doesn't work well for all sites, or doesn't perform as fast as it does on our own website. Please report any problems you have with this new implementation, and we'll improve it!
Please also note that this is an initial version: it should be possible to make it even more performant. For example, we could implement an incremental mode so that we only have to read the latest commits from your Git history, leading to more performant rebuilds (similar to bundler persistent caching).
e18e/ecosystem-issues#216
Test Plan
CI + unit tests + dogfood
Test links
Preview: https://deploy-preview-11512--docusaurus-2.netlify.app/
VCS documentation: https://deploy-preview-11512--docusaurus-2.netlify.app/docs/api/docusaurus-config#vcs