Draft
Commits
0c7d08f
nfpm.native_libs.scripts: add aiohttp requirement
cognifloyd Aug 7, 2025
950b563
nfpm.native_libs.scripts: Add deb_search_for_sonames with tests
cognifloyd Aug 8, 2025
0643307
nfpm.native_libs: add deb_search_for_sonames rule
cognifloyd Oct 9, 2025
5908911
nfpm.native_libs: add integration test for scripts.deb_search_for_son…
cognifloyd Oct 9, 2025
7ae7baa
nfpm.native_libs.scripts: Refactor to preserve API response data
cognifloyd Oct 11, 2025
92df36e
nfpm.native_libs.scripts: Add aiohttp-retry requirement
cognifloyd Oct 11, 2025
3c8d6cd
nfpm.native_libs.scripts: Use aiohttp-retry on flaky API
cognifloyd Oct 11, 2025
fb24260
nfpm.native_libs.scripts: Customize User-Agent header
cognifloyd Oct 11, 2025
ff024ec
nfpm.native_libs.scripts: Move TODO about search API errors
cognifloyd Oct 11, 2025
fd8f2d9
nfpm.native_libs.scripts: Add from_best_so_files ld.so-like filtering
cognifloyd Oct 11, 2025
945664b
nfpm.native_libs: rename package scripts->deb
cognifloyd Oct 11, 2025
2e012bb
nfpm.native_libs: move search_for_sonames rule into deb package
cognifloyd Oct 11, 2025
d8ef4e8
nfpm.native_libs: register rules after moving to deb package
cognifloyd Oct 11, 2025
0cf7065
nfpm.native_libs: regenerate user_reqs.lock to include new deps
cognifloyd Nov 3, 2025
70f9619
nfpm.native_libs: Include native_libs.deb in docs/notes
cognifloyd Nov 6, 2025
d775ddb
nfpm.native_libs.deb: add rules to prevent py deps on :scripts
cognifloyd Nov 21, 2025
139e595
Revert "nfpm.native_libs.deb: add rules to prevent py deps on :scripts"
cognifloyd Nov 21, 2025
6789c47
nfpm.native_libs.deb: add usage comments
cognifloyd Nov 21, 2025
74e9cad
Merge branch 'main' into cognifloyd/nfpm-native_libs-deb-scripts
cognifloyd Nov 25, 2025
cc84f68
nfpm.native_libs.deb: drop /usr/local comments
cognifloyd Nov 27, 2025
7fbb73a
nfpm.native_libs.deb: split tests into separate files
cognifloyd Nov 27, 2025
bea35f5
Merge branch 'main' into cognifloyd/nfpm-native_libs-deb-scripts
cognifloyd Dec 1, 2025
ae56961
typo fix
cognifloyd Dec 1, 2025
78dd63c
nfpm.native_libs.deb: refactor script error handling
cognifloyd Dec 1, 2025
cfb4fa7
nfpm.native_libs.deb: new deb subsystem for configurable search URLs
cognifloyd Dec 2, 2025
3d5de60
Merge branch 'main' into cognifloyd/nfpm-native_libs-deb-scripts
cognifloyd Dec 2, 2025
606163c
nfpm.native_libs.deb: protect against "Client Challenge" response
cognifloyd Dec 2, 2025
38b122c
nfpm.native_libs.deb: drop unused dict (moved to subsystem)
cognifloyd Dec 3, 2025
57e0845
nfpm.native_libs.deb: comment out tests w/ blocked API requests
cognifloyd Dec 3, 2025
7c34a10
Merge branch 'main' into cognifloyd/nfpm-native_libs-deb-scripts
cognifloyd Dec 3, 2025
2 changes: 2 additions & 0 deletions 3rdparty/python/requirements.txt
@@ -33,6 +33,8 @@ node-semver==0.9.0


# These dependencies are for scripts that rules run in an external process (and for script tests).
aiohttp==3.12.15 # see: pants.backends.nfpm.native_libs.deb
aiohttp-retry==2.9.1 # see: pants.backends.nfpm.native_libs.deb
elfdeps==0.2.0 # see: pants.backends.nfpm.native_libs.elfdeps
# These dependencies are only for debugging Pants itself (in VSCode/PyCharm respectively),
# and should never be imported.
522 changes: 522 additions & 0 deletions 3rdparty/python/user_reqs.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions 3rdparty/python/user_reqs.lock.metadata
@@ -6,6 +6,8 @@
"generated_with_requirements": [
"PyGithub==2.8.1",
"PyYAML<7.0,>=6.0",
"aiohttp-retry==2.9.1",
"aiohttp==3.12.15",
"ansicolors==1.1.8",
"beautifulsoup4==4.11.1",
"chevron==0.14.0",
1 change: 1 addition & 0 deletions docs/notes/2.31.x.md
@@ -74,6 +74,7 @@ So, this backend provides a simplified set of features from these native packagi
- `rpm`: `elfdeps` analyzes ELF metadata and `rpmdeps` adds the requirements to the `.rpm` file.

This backend should be platform-agnostic, allowing it to run wherever `pants` can run. To do this, it relies on the [`elfdeps`](https://github.com/python-wheel-build/elfdeps) [📦](https://pypi.org/project/elfdeps/) pure-python package (a "Python implementation of RPM `elfdeps`" with its pure-python dep [`pyelftools`](https://github.com/eliben/pyelftools) [📦](https://pypi.org/project/pyelftools/)) for analyzing `ELF` libraries.
Then, for `deb` packages, this backend queries the official [`debian`](https://packages.debian.org/search) or [`ubuntu`](https://packages.ubuntu.com/search) package search API (over HTTPS) to look up which package(s) contain the required libraries. Using the package search API avoids relying on local package metadata stores, which can only find dependency packages that are already installed (such stores are not just distribution-specific, they are specific to a single release of a distribution).

Please provide feedback on this backend [here](https://github.com/pantsbuild/pants/discussions/22396).

11 changes: 11 additions & 0 deletions src/python/pants/backend/nfpm/native_libs/deb/BUILD
@@ -0,0 +1,11 @@
# Copyright 2025 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

python_sources(
    overrides={"rules.py": dict(dependencies=["./search_for_sonames.py"])},
)

python_tests(
    name="tests",
    overrides={"search_for_sonames_integration_test.py": dict(timeout=150)},
)
Empty file.
225 changes: 225 additions & 0 deletions src/python/pants/backend/nfpm/native_libs/deb/rules.py
@@ -0,0 +1,225 @@
# Copyright 2025 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

from __future__ import annotations

import importlib.metadata
import json
import logging
import sys
from collections.abc import Iterable, Mapping
from dataclasses import dataclass, replace
from pathlib import PurePath

from pants.backend.python.util_rules.pex import PexRequest, VenvPexProcess, create_venv_pex
from pants.backend.python.util_rules.pex_environment import PythonExecutable
from pants.backend.python.util_rules.pex_requirements import PexRequirements
from pants.engine.fs import CreateDigest, FileContent
from pants.engine.internals.native_engine import UnionRule
from pants.engine.internals.selectors import concurrently
from pants.engine.intrinsics import create_digest, execute_process
from pants.engine.process import FallibleProcessResult
from pants.engine.rules import Rule, collect_rules, implicitly, rule
from pants.init.import_util import find_matching_distributions
from pants.util.logging import LogLevel
from pants.util.resources import read_resource
from pants.version import VERSION

logger = logging.getLogger(__name__)

_NATIVE_LIBS_DEB_PACKAGE = "pants.backend.nfpm.native_libs.deb"
_SEARCH_FOR_SONAMES_SCRIPT = "search_for_sonames.py"
_PEX_NAME = "native_libs_deb.pex"


@dataclass(frozen=True)
class DebSearchForSonamesRequest:
    distro: str
    distro_codename: str
    debian_arch: str
    sonames: tuple[str, ...]
    from_best_so_files: bool

    def __init__(
        self,
        distro: str,
        distro_codename: str,
        debian_arch: str,
        sonames: Iterable[str],
        *,
        from_best_so_files: bool = False,
    ):
        object.__setattr__(self, "distro", distro)
        object.__setattr__(self, "distro_codename", distro_codename)
        object.__setattr__(self, "debian_arch", debian_arch)
        object.__setattr__(self, "sonames", tuple(sorted(sonames)))
        object.__setattr__(self, "from_best_so_files", from_best_so_files)


@dataclass(frozen=True)
class DebPackagesPerSoFile:
    so_file: str
    packages: tuple[str, ...]

    def __init__(self, so_file: str, packages: Iterable[str]):
        object.__setattr__(self, "so_file", so_file)
        object.__setattr__(self, "packages", tuple(sorted(packages)))


_TYPICAL_LD_PATH_PATTERNS = (
    # platform specific system libs (like libc) get selected first
    # "/usr/local/lib/*-linux-*/",
    "/lib/*-linux-*/",
    "/usr/lib/*-linux-*/",
    # Then look for a generic system libs
    # "/usr/local/lib/",
    "/lib/",
    "/usr/lib/",
    # Anything else has to be added manually to dependencies.
    # These rules cannot use symbols or shlibs metadata to inform package selection.
)


@dataclass(frozen=True)
class DebPackagesForSoname:
    soname: str
    packages_per_so_files: tuple[DebPackagesPerSoFile, ...]

    def __init__(self, soname: str, packages_per_so_files: Iterable[DebPackagesPerSoFile]):
        object.__setattr__(self, "soname", soname)
        object.__setattr__(self, "packages_per_so_files", tuple(packages_per_so_files))

    @property
    def from_best_so_files(self) -> DebPackagesForSoname:
        """Pick best so_files from packages_per_so_files using a simplified ld.so-like algorithm.

        The most preferred is first. This is NOT a recursive match; only match if direct child of
        ld_path_patt dir. Anything that uses a subdir like /usr/lib/<app>/lib*.so.* uses rpath to
        prefer the app's libs over system libs. If this vastly simplified form of ld.so-style
        matching does not select the correct libs, then the package(s) that provide the shared lib
        should be added manually to the nfpm requires field.
        """
        if len(self.packages_per_so_files) <= 1:  # shortcut; no filtering required for 0-1 results.
            return self

        remaining = list(self.packages_per_so_files)

        packages_per_so_files = []
        for ld_path_patt in _TYPICAL_LD_PATH_PATTERNS:
            for packages_per_so_file in remaining[:]:
                if PurePath(packages_per_so_file.so_file).parent.match(ld_path_patt):
                    packages_per_so_files.append(packages_per_so_file)
                    remaining.remove(packages_per_so_file)

        return replace(self, packages_per_so_files=tuple(packages_per_so_files))


@dataclass(frozen=True)
class DebPackagesForSonames:
    packages_for_sonames: tuple[DebPackagesForSoname, ...]

    @classmethod
    def from_dict(cls, raw: Mapping[str, Mapping[str, Iterable[str]]]) -> DebPackagesForSonames:
        return cls(
            tuple(
                DebPackagesForSoname(
                    soname,
                    (
                        DebPackagesPerSoFile(so_file, packages)
                        for so_file, packages in files_to_packages.items()
                    ),
                )
                for soname, files_to_packages in raw.items()
            )
        )

    @property
    def from_best_so_files(self) -> DebPackagesForSonames:
        packages = []
        for packages_for_soname in self.packages_for_sonames:
            packages.append(packages_for_soname.from_best_so_files)
        return DebPackagesForSonames(tuple(packages))


@rule
async def deb_search_for_sonames(
    request: DebSearchForSonamesRequest,
) -> DebPackagesForSonames:
    script = read_resource(_NATIVE_LIBS_DEB_PACKAGE, _SEARCH_FOR_SONAMES_SCRIPT)
    if not script:
        raise ValueError(
            f"Unable to find source of {_SEARCH_FOR_SONAMES_SCRIPT!r} in {_NATIVE_LIBS_DEB_PACKAGE}"
        )

    script_content = FileContent(
        path=_SEARCH_FOR_SONAMES_SCRIPT, content=script, is_executable=True
    )

    # Pull python and requirements versions from the pants venv since that is what the script is tested with.
    pants_python = PythonExecutable.fingerprinted(
        sys.executable, ".".join(map(str, sys.version_info[:3])).encode("utf8")
    )
    distributions_in_pants_venv: list[importlib.metadata.Distribution] = list(
        find_matching_distributions(None)
    )
    constraints = tuple(f"{dist.name}=={dist.version}" for dist in distributions_in_pants_venv)
    requirements = {  # requirements (and transitive deps) are constrained to the versions in the pants venv
        "aiohttp",
        "aiohttp-retry",
        "beautifulsoup4",
    }
Comment on lines +158 to +170
Member Author

This requests the same python executable that pants is using, and constrains deps to the packages that are in the pants venv.

I searched the pants codebase for anything that did something similar, landing on this solution, which is based on the plugin dependency resolution code:

  • python: PythonExecutable | None = None
    if not request.interpreter_constraints:
        python = PythonExecutable.fingerprinted(
            sys.executable, ".".join(map(str, sys.version_info[:3])).encode("utf8")
        )
  • def to_requirement(d):
        return f"{d.name}=={d.version}"

    distributions: list[importlib.metadata.Distribution] = []
    if self._inherit_existing_constraints:
        distributions = list(find_matching_distributions(None))
    request = PluginsRequest(
        self._interpreter_constraints,
        tuple(to_requirement(dist) for dist in distributions),
        tuple(requirements),
    )

I had some hackier approaches to doing this earlier, like skipping ICs and package versions altogether and letting pex just figure it out. A less hacky approach might be creating a subsystem and lockfile, but giving the user knobs to change deps feels like a footgun in this case. For most scripts in other backends, the script is a very lightweight wrapper around some external library, but this script has too much logic to be considered a "wrapper".

So, logically, the script really should run in the pants venv, or at least that feels less smelly.

I think this should support remote execution, right? I don't have access to an REAPI server, so I can't test how this behaves when, for example, the REAPI server doesn't have the same version of python that pants-itself is running under.

Are there any other gotchas I missed here?

Contributor

This isn't REAPI friendly, for the reason you state (the python interpreter is an external dependency). You might want to set the process_execution_environment arg to execute_process() to force it to run locally?

Member Author

> This isn't REAPI friendly, for the reason you state (the python interpreter is an external dependency).

I was afraid of that. I wonder if we can use the PBS provider backend without forcing it to be enabled for all python rules?

> You might want to set the process_execution_environment arg to execute_process() to force it to run locally?

I hate to lose the potential benefits of REAPI. But, making that explicit would probably be wise if I stick with this approach.

Member Author

I guess I really just need to add yet another lockfile for this. Avoiding the complexity of lockfiles (or of exposing too many knobs) just moved the complexity elsewhere.


    script_digest, venv_pex = await concurrently(
        create_digest(CreateDigest([script_content])),
        create_venv_pex(
            **implicitly(
                PexRequest(
                    output_filename=_PEX_NAME,
                    internal_only=True,
                    python=pants_python,
                    requirements=PexRequirements(
                        requirements,
                        constraints_strings=constraints,
                        description_of_origin=f"Requirements for {_PEX_NAME}:{_SEARCH_FOR_SONAMES_SCRIPT}",
                    ),
                )
            )
        ),
    )

    result: FallibleProcessResult = await execute_process(
        **implicitly(
            VenvPexProcess(
                venv_pex,
                argv=(
                    script_content.path,
                    f"--user-agent-suffix=pants/{VERSION}",
                    f"--distro={request.distro}",
                    f"--distro-codename={request.distro_codename}",
                    f"--arch={request.debian_arch}",
                    *request.sonames,
                ),
                input_digest=script_digest,
                description=f"Search deb packages for sonames: {request.sonames}",
                level=LogLevel.DEBUG,
            )
        )
    )

    if result.exit_code == 0:
        packages = json.loads(result.stdout)
    else:
        # The search API returns 200 even if no results were found.
        # A 4xx or 5xx error means we gave up retrying because the server is unavailable.
        # TODO: Should this raise an error instead of just a warning?
        logger.warning(result.stderr.decode("utf-8"))
        packages = {}
Member Author

Any thoughts here? So far, I've just treated an error as if it returned an empty response. The script already handles retrying with exponential+jitter backoff, so a failure here means we already hit multiple server errors by this point. Should this actually fail instead?
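
For readers unfamiliar with the retry strategy mentioned here, exponential backoff with jitter can be illustrated without any HTTP library. This sketch is not how `aiohttp-retry` or the script implements it; the function names, parameters, and default values are assumptions chosen for illustration.

```python
from __future__ import annotations

import random


def backoff_ceilings(
    attempts: int, base: float = 0.5, factor: float = 2.0, max_delay: float = 30.0
) -> list[float]:
    """Upper bound of the wait before each retry: base * factor**attempt, capped at max_delay."""
    return [min(max_delay, base * factor**attempt) for attempt in range(attempts)]


def backoff_delays(attempts: int, rng: random.Random | None = None) -> list[float]:
    """Apply "full jitter": each wait is drawn uniformly from [0, ceiling]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, ceiling) for ceiling in backoff_ceilings(attempts)]


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(5), start=1):
        print(f"retry {attempt}: sleep {delay:.2f}s")
```

The jitter spreads retries from many clients over time, so a script exiting non-zero after several such attempts suggests the server was unavailable for the whole window.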

Contributor

It should probably fail if the error is server-side (5xx) as those are ephemeral, so the user can retry later. But if it is client side (4xx) I guess this means... what exactly? a bug in this code, presumably. So yeah, should probably fail in that case too...? I don't see that artificially returning an empty set (and caching that fact!) is going to be good.

Member Author

I'll update this to be an error.
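
One possible shape for that change, sketched as a standalone function so it runs outside the Pants engine. The exception class, the function name, and the sample payload are hypothetical; the PR does not show the final code. The JSON shape (soname to so_file to package names) follows the `from_dict` signature in the diff.

```python
from __future__ import annotations

import json


class SonameSearchError(Exception):
    """Hypothetical error type; the PR does not name one."""


def parse_search_output(
    exit_code: int, stdout: bytes, stderr: bytes
) -> dict[str, dict[str, list[str]]]:
    """Return the parsed {soname: {so_file: [packages]}} mapping, or raise on failure.

    Raising (instead of returning {}) keeps a transient server failure from
    being treated, and cached, as "no packages found".
    """
    if exit_code != 0:
        message = stderr.decode("utf-8", errors="replace").strip()
        raise SonameSearchError(f"soname search failed (exit code {exit_code}): {message}")
    return json.loads(stdout)


if __name__ == "__main__":
    # Made-up sample payload for illustration only.
    sample = b'{"libc.so.6": {"/lib/x86_64-linux-gnu/libc.so.6": ["libc6"]}}'
    print(parse_search_output(0, sample, b""))
```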

Member Author

I want to add a test that covers the error case, so I'm marking this as draft while I figure that out.


    deb_packages_for_sonames = DebPackagesForSonames.from_dict(packages)
    if request.from_best_so_files:
        return deb_packages_for_sonames.from_best_so_files
    return deb_packages_for_sonames


def rules() -> Iterable[Rule | UnionRule]:
    return collect_rules()
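
The ld.so-like ordering used by `from_best_so_files` in this file can be tried in isolation. The sketch below reuses the four active path patterns from the diff but works on plain strings; the function name and sample paths are made up, and it uses `PurePosixPath` (rather than `PurePath`) so the result does not depend on the host OS.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Same active patterns as _TYPICAL_LD_PATH_PATTERNS in the diff, most preferred first.
LD_PATH_PATTERNS = (
    "/lib/*-linux-*/",
    "/usr/lib/*-linux-*/",
    "/lib/",
    "/usr/lib/",
)


def order_by_ld_path(so_files: list[str]) -> list[str]:
    """Keep only so_files directly inside a known ld path, ordered by pattern priority."""
    remaining = list(so_files)
    ordered = []
    for pattern in LD_PATH_PATTERNS:
        for so_file in remaining[:]:  # iterate over a copy so removal is safe
            # The patterns are absolute, so match() compares the whole parent directory:
            # subdirectories such as /usr/lib/<app>/lib never match.
            if PurePosixPath(so_file).parent.match(pattern):
                ordered.append(so_file)
                remaining.remove(so_file)
    return ordered


if __name__ == "__main__":
    candidates = [
        "/usr/lib/libfoo.so.1",
        "/usr/lib/app/lib/libfoo.so.1",
        "/usr/lib/x86_64-linux-gnu/libfoo.so.1",
    ]
    print(order_by_ld_path(candidates))
```

Unlike the property in the diff, this sketch omits the shortcut that returns 0 or 1 results unfiltered, so a lone candidate outside the known paths is dropped here but would be kept by the rule.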