fix: strip Python bytecode from bundled backend #102

Merged
zouyonghe merged 2 commits into AstrBotDevs:main from zouyonghe:codex/strip-python-bytecode-from-backend-bundle on Mar 25, 2026

@zouyonghe
Member

@zouyonghe zouyonghe commented Mar 25, 2026

Fixes #101.

This change removes Python bytecode artifacts from the packaged backend runtime before Tauri bundles resources/backend into the desktop installer.

On the current packaging path, copyTree() already skips __pycache__, .pyc, and .pyo when copying source files and the standalone CPython runtime. The remaining issue is that build-backend.mjs runs python -m pip install -r requirements.txt inside the bundled runtime, and that step repopulates a large number of bytecode cache files under resources/backend/python. Those files are then shipped verbatim in the installer, which increases file count and slows install-time extraction on affected machines.

This PR fixes the problem in two layers (prevention during install, then cleanup afterwards) and adds regression tests:

  • pass --no-compile to every bundled pip install
  • force PYTHONDONTWRITEBYTECODE=1 for those install subprocesses
  • run a final recursive cleanup pass that removes any remaining __pycache__, .pyc, and .pyo artifacts as a safety net
  • add regression tests covering the install environment helper and recursive bytecode cleanup helper
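
The prevention layer described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names `createNoBytecodeEnv` and `buildPipInstallArgs` are hypothetical, and the real helpers in scripts/backend/runtime-layout-utils.mjs may be shaped differently.

```javascript
// Hypothetical helper names, assumed for illustration only.

// Return a copy of the given environment with bytecode writing disabled,
// so the Python subprocess does not create __pycache__ entries on import.
const createNoBytecodeEnv = (baseEnv = process.env) => ({
  ...baseEnv,
  PYTHONDONTWRITEBYTECODE: '1',
});

// Build the arguments for `python -m pip install -r <file>` with pip's
// --no-compile flag, which skips compiling installed sources to bytecode.
const buildPipInstallArgs = (requirementsPath) => [
  '-m',
  'pip',
  'install',
  '--no-compile',
  '-r',
  requirementsPath,
];
```

A build script could then pass both to `spawnSync(pythonPath, buildPipInstallArgs('requirements.txt'), { env: createNoBytecodeEnv(), stdio: 'inherit' })` from `node:child_process`, where `pythonPath` stands for the bundled interpreter. The two settings overlap on purpose: the flag covers pip's own compile step, while the environment variable covers modules imported while pip runs.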

Validation:

  • node --test scripts/backend/runtime-layout-utils.test.mjs
  • pnpm run test:prepare-resources

Summary by Sourcery

Ensure bundled Python backend excludes bytecode artifacts and add safeguards and tests around runtime dependency installation.

New Features:

  • Introduce utilities to create a Python install environment that suppresses bytecode generation and to recursively prune Python bytecode artifacts from a runtime directory.

Enhancements:

  • Extend backend build script to prevent pip from compiling bytecode during dependency installation and to log summary statistics for any removed bytecode artifacts.

Tests:

  • Add unit tests for the Python runtime layout utilities, including environment setup and bytecode pruning behavior.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly optimizes the backend packaging process by eliminating Python bytecode artifacts from the bundled runtime. By preventing bytecode generation during dependency installation and implementing a robust cleanup pass, the changes aim to reduce the final installer size and improve extraction performance, addressing a known issue with unnecessary files being shipped.

Highlights

  • Bytecode Prevention: Implemented measures to prevent Python bytecode generation during pip install by adding the --no-compile flag and setting the PYTHONDONTWRITEBYTECODE=1 environment variable.
  • Bytecode Cleanup: Introduced a recursive cleanup mechanism to remove any __pycache__ directories and orphaned .pyc or .pyo files that might still exist after installation.
  • New Utility Functions: Added new helper functions in runtime-layout-utils.mjs for creating Python installation environments and pruning bytecode artifacts.
  • Regression Tests: Included comprehensive regression tests for the newly added bytecode prevention and cleanup logic to ensure correctness and prevent future regressions.
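
The cleanup rule in the highlights can be illustrated as a pure classification over relative paths. This is only a sketch under stated assumptions: `classifyBytecodeArtifact` and `summarizeArtifacts` are hypothetical names, and the PR's helper works on the file system rather than on path strings.

```javascript
// Hypothetical helpers, assumed for illustration; not the PR's implementation.

// Classify a relative file path:
//   'cached' -> any file that lives under a __pycache__ directory
//   'orphan' -> a .pyc/.pyo file outside any __pycache__ directory
//   null     -> not a bytecode artifact, keep it
const classifyBytecodeArtifact = (relativePath) => {
  // Accept both POSIX and Windows separators.
  const segments = relativePath.split(/[\\/]/).filter(Boolean);
  if (segments.length === 0) return null;
  const fileName = segments[segments.length - 1];
  const parents = segments.slice(0, -1);
  if (parents.includes('__pycache__')) return 'cached';
  if (fileName.endsWith('.pyc') || fileName.endsWith('.pyo')) return 'orphan';
  return null;
};

// Count how many paths fall into each category.
const summarizeArtifacts = (relativePaths) => {
  const summary = { cached: 0, orphan: 0, kept: 0 };
  for (const relativePath of relativePaths) {
    summary[classifyBytecodeArtifact(relativePath) ?? 'kept'] += 1;
  }
  return summary;
};
```

Keeping the rule separate from the deletion makes it easy to test without touching disk, and the two artifact categories line up with the cached and orphan counters reported by the build log.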

@zouyonghe zouyonghe marked this pull request as ready for review March 25, 2026 00:01
@zouyonghe zouyonghe changed the title from "[codex] strip Python bytecode from bundled backend" to "fix: strip Python bytecode from bundled backend" Mar 25, 2026
@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="scripts/backend/runtime-layout-utils.mjs" line_range="34" />
<code_context>
+  PYTHONDONTWRITEBYTECODE: '1',
+});
+
+const countFilesInDirectory = (directoryPath) => {
+  let total = 0;
+  for (const entry of fs.readdirSync(directoryPath, { withFileTypes: true })) {
</code_context>
<issue_to_address>
**issue (complexity):** Consider rewriting `prunePythonBytecodeArtifacts` as a single recursive traversal that deletes and counts files without using `countFilesInDirectory`.

You can keep the existing stats API and behavior while simplifying `prunePythonBytecodeArtifacts` to a single-pass traversal and removing the extra full recursion in `countFilesInDirectory`.

Key changes:
- Drop `countFilesInDirectory`.
- Use a single recursive walker that both deletes and counts.
- Track whether you’re inside a `__pycache__` directory to decide which counter to increment.

For example:

```js
import fs from 'node:fs';
import path from 'node:path';

const isBytecodeFile = (entryName) => entryName.endsWith('.pyc') || entryName.endsWith('.pyo');

export const prunePythonBytecodeArtifacts = (rootDir) => {
  const stats = {
    removedCacheDirs: 0,
    removedBytecodeFiles: 0,
    removedOrphanBytecodeFiles: 0,
  };

  const visit = (directoryPath, { inPycache = false } = {}) => {
    for (const entry of fs.readdirSync(directoryPath, { withFileTypes: true })) {
      const entryPath = path.join(directoryPath, entry.name);

      if (entry.isDirectory()) {
        if (entry.name === '__pycache__' && !inPycache) {
          stats.removedCacheDirs += 1;
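          // Recurse first so every file inside the __pycache__ tree is counted
          visit(entryPath, { inPycache: true });
          // fs.rmSync needs recursive: true to remove a directory; this also
          // clears any nested subdirectories the walk left behind
          fs.rmSync(entryPath, { recursive: true, force: true });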
        } else {
          visit(entryPath, { inPycache });
        }
        continue;
      }

      if (inPycache) {
        // Match previous behavior: count all files inside __pycache__
        stats.removedBytecodeFiles += 1;
        fs.rmSync(entryPath, { force: true });
      } else if (isBytecodeFile(entry.name)) {
        stats.removedOrphanBytecodeFiles += 1;
        fs.rmSync(entryPath, { force: true });
      }
    }
  };

  if (fs.existsSync(rootDir)) {
    visit(rootDir);
  }

  return stats;
};
```

This preserves:
- `removedCacheDirs`: number of `__pycache__` directories removed.
- `removedBytecodeFiles`: number of files removed from within `__pycache__` trees (all files, as before).
- `removedOrphanBytecodeFiles`: number of `.pyc`/`.pyo` files removed outside `__pycache__`.

The control flow is flatter (no extra helper traversal, fewer `continue`s) and the stats are computed in a single pass.
</issue_to_address>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces functionality to prevent and clean up Python bytecode artifacts during the backend build process. It modifies the pip install command to use --no-compile, adds a utility to set the PYTHONDONTWRITEBYTECODE environment variable, and implements a new function to recursively prune __pycache__ directories and orphan bytecode files. New tests are included for these utility functions. The reviewer suggested improving the readability of a log message by using a single template literal.

Comment on lines +576 to +581
    console.log(
      '[build-backend] removed Python bytecode artifacts ' +
        `(${bytecodeCleanupStats.removedCacheDirs} cache dirs, ` +
        `${bytecodeCleanupStats.removedBytecodeFiles} cached files, ` +
        `${bytecodeCleanupStats.removedOrphanBytecodeFiles} orphan files).`,
    );

medium

For improved readability and consistency, this log message can be constructed using a single template literal instead of mixing string concatenation with template literals.

    console.log(`[build-backend] removed Python bytecode artifacts (${bytecodeCleanupStats.removedCacheDirs} cache dirs, ${bytecodeCleanupStats.removedBytecodeFiles} cached files, ${bytecodeCleanupStats.removedOrphanBytecodeFiles} orphan files).`);

@zouyonghe
Member Author

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey - I've reviewed your changes and they look great!


@zouyonghe zouyonghe merged commit c5e38a8 into AstrBotDevs:main Mar 25, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

[Feature] Delete pycache files before packaging
