The Python package currently has two separate tracks:
- Offline CLI generation via ph create and ph bulk-create
- In-process runtime creation via build_table()
That split is deliberate.
ph create stays aligned with the existing offline PerfectHashCreate
workflow. ph bulk-create stays aligned with the existing offline
PerfectHashBulkCreate workflow. build_table() is the start of the
programmatic Python runtime API.
The ph create command is the Python-facing wrapper around the existing
offline C create contract.
Conceptually:
- Parse a modern Python CLI
- Translate that into the C PerfectHashCreate argv shape
- Execute the offline create binary
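The three steps above can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: the function names parse_args, build_c_argv, and run_create are hypothetical, and the positional layout of the C argv is a placeholder, since the real PerfectHashCreate argument order is not shown here. Only the --hash-function, --create-binary, and --dry-run flags come from the command as documented.

```python
import argparse
import shlex
import subprocess


def parse_args(argv):
    """Step 1: parse a Python-style command line."""
    parser = argparse.ArgumentParser(prog="ph create")
    parser.add_argument("keys_path")
    parser.add_argument("output_dir")
    parser.add_argument("--hash-function", required=True)
    parser.add_argument("--create-binary", default="PerfectHashCreate")
    parser.add_argument("--dry-run", action="store_true")
    return parser.parse_args(argv)


def build_c_argv(args):
    """Step 2: translate parsed options into an argv list for the C binary.

    ASSUMPTION: this ordering is a placeholder, not the real C contract.
    """
    return [
        args.create_binary,
        args.keys_path,
        args.output_dir,
        args.hash_function,
    ]


def run_create(argv):
    """Step 3: run the offline binary, or only print the command on --dry-run."""
    args = parse_args(argv)
    c_argv = build_c_argv(args)
    if args.dry_run:
        # Show the translated command without executing anything.
        print(shlex.join(c_argv))
        return 0
    return subprocess.run(c_argv, check=False).returncode
```

Keeping the translation step as a pure function that returns a list makes it straightforward to unit-test the argv mapping without having the native binary available.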
Current command shape:
ph create KEYS_PATH OUTPUT_DIR --hash-function HASH_FUNCTION [OPTIONS]
ph bulk-create KEYS_DIR OUTPUT_DIR --hash-function HASH_FUNCTION [OPTIONS]

Examples:
env -u PYTHONPATH uv run python -m perfecthash create \
    keys/example.keys \
    out \
    --hash-function MultiplyShiftR \
    --emit-c-argv

env -u PYTHONPATH uv run python -m perfecthash create \
    keys/example.keys \
    out \
    --hash-function MultiplyShiftR \
    --graph-impl 3 \
    --create-binary /home/trentn/src/perfecthash/build/bin/PerfectHashCreate \
    --dry-run

Current ph create supports a small but real subset of the C create surface:
- --hash-function
- --maximum-concurrency
- --compile
- --disable-csv-output-file
- --do-not-try-use-hash16-impl
- --graph-impl
- --max-solve-time-in-seconds
It also supports:
- --emit-c-argv
- --dry-run
- --create-binary
The initial ph bulk-create command supports the same overall execution model,
plus a first useful subset of bulk-specific flags such as:
- --compile
- --skip-test-after-create
- --quiet
- --disable-csv-output-file
- --omit-csv-row-if-table-create-failed
- --omit-csv-row-if-table-create-succeeded
The current programmatic API entry point is:
from perfecthash import build_table

Example:
from perfecthash import build_table

keys = [1, 3, 5, 7, 11, 13, 17, 19]

with build_table(keys, hash_function="MultiplyShiftR") as table:
    print(table.backend)
    print(table.hash_function)
    print(table.key_count)
    print(table.index(13))
    print(table.index_many(keys))

Current behavior:
- The fast path stays native.
- Python is mostly coordinating input normalization and object lifetime.
- The current implementation uses the rawdog_jit online runtime path.
Current Table surface is intentionally small:
- index()
- index_many()
- close()
- context-manager support
- metadata:
  - backend
  - hash_function
  - key_count
  - library_path
The currently curated set of supported hash functions is:
- MultiplyShiftR
- MultiplyShiftRX
- Mulshrolate1RX
- Mulshrolate2RX
- Mulshrolate3RX
- Mulshrolate4RX
These names are preserved exactly as they appear in the C codebase.
Offline CLI:
- Not all C create options are exposed yet.
- Binary/library discovery is better aligned with installed prefixes now, but packaged distribution layout still needs to be finalized.
Programmatic API:
- Current backend is rawdog_jit.
- Current binding path is ABI-level ctypes.
- Current key support is focused on Python integer sequences and 32-bit keys.
- The API is still missing higher-level conveniences such as value binding and richer input normalization.
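Until value binding exists, a caller could layer it on top of the documented index() and index_many() methods. The sketch below is a hypothetical helper, not part of perfecthash. It assumes index_many() yields one integer per key, that the indices are unique, and that lookups are only made for keys from the original set (a perfect hash table typically cannot reject unknown keys by itself).

```python
from typing import Any, Iterable, List, Protocol, Sequence


class TableLike(Protocol):
    """The subset of the Table surface this sketch relies on."""

    def index(self, key: int) -> int: ...

    def index_many(self, keys: Sequence[int]) -> Iterable[int]: ...


class BoundValues:
    """Hypothetical value binding layered over a perfect hash table."""

    def __init__(self, table: TableLike, keys: Sequence[int], values: Sequence[Any]):
        if len(keys) != len(values):
            raise ValueError("keys and values must have the same length")
        indices = [int(i) for i in table.index_many(keys)]
        if len(set(indices)) != len(indices):
            raise ValueError("table did not return unique indices")
        self._table = table
        # Slot array sized to the largest index; unused slots stay None.
        size = max(indices) + 1 if indices else 0
        self._slots: List[Any] = [None] * size
        for idx, value in zip(indices, values):
            self._slots[idx] = value

    def get(self, key: int) -> Any:
        # ASSUMPTION: key is a member of the original key set.
        return self._slots[self._table.index(key)]
```

With a real table, BoundValues(table, keys, values) would be constructed and queried inside the with build_table(...) block, so that every lookup happens before the table is closed.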
Near-term:
- expand the programmatic Table API carefully without pulling table logic into Python
- extend ph create option coverage where it makes sense
- add ph bulk-create
Longer-term:
- decide whether the production native Python path remains ABI-based or grows a compiled extension once wheel packaging is in place
- improve packaged binary discovery instead of relying on development-tree heuristics
The Python package now prefers installation-oriented discovery paths before it falls back to source-tree builds.
For binaries and libraries, the search order is roughly:
- Explicit binary/library env vars
- Package-bundled native dirs under perfecthash/_native/
- Explicit install-prefix env vars such as PERFECTHASH_PREFIX
- The active Python / conda prefix (sys.prefix, CONDA_PREFIX)
- Development-tree fallback paths
That is meant to make conda or wheel-style end-user installs the default case, with source-tree builds as the fallback rather than the primary assumption.
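The search order above could be expressed along these lines. This is a sketch under stated assumptions rather than the package's real discovery code: the function names, the PERFECTHASH_CREATE_BINARY variable, and the bin/ subdirectory under each prefix are hypothetical. PERFECTHASH_PREFIX, CONDA_PREFIX, sys.prefix, and perfecthash/_native/ are the only names taken from the list.

```python
import os
import sys
from pathlib import Path
from typing import Iterator, Mapping, Optional


def candidate_dirs(
    env: Mapping[str, str],
    package_dir: Path,
    source_root: Optional[Path] = None,
) -> Iterator[Path]:
    """Yield directories to probe, highest priority first."""
    # Package-bundled native directory.
    yield package_dir / "_native"
    # Explicit install prefix (bin/ layout is an assumption).
    if env.get("PERFECTHASH_PREFIX"):
        yield Path(env["PERFECTHASH_PREFIX"]) / "bin"
    # Active Python prefix, then conda prefix if set.
    yield Path(sys.prefix) / "bin"
    if env.get("CONDA_PREFIX"):
        yield Path(env["CONDA_PREFIX"]) / "bin"
    # Development-tree fallback comes last.
    if source_root is not None:
        yield source_root / "build" / "bin"


def find_binary(
    name: str,
    env: Mapping[str, str] = os.environ,
    package_dir: Path = Path("perfecthash"),
    source_root: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first existing match, or None if nothing is found."""
    # An explicit override wins (hypothetical variable name).
    explicit = env.get("PERFECTHASH_CREATE_BINARY")
    if explicit and Path(explicit).is_file():
        return Path(explicit)
    for directory in candidate_dirs(env, package_dir, source_root):
        path = directory / name
        if path.is_file():
            return path
    return None
```

Passing the environment in as a mapping, instead of reading os.environ inside the function, keeps the ordering easy to test against temporary directories.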
For developers working from a source checkout, the recommended workflow is:
./scripts/install-python-native-prefix.sh
export PERFECTHASH_PREFIX="$PWD/.perfecthash-prefix"
env -u PYTHONPATH uv sync

That gives you:
- editable Python code from the source tree
- native binaries/libraries installed to a repo-local prefix
- discovery behavior that is much closer to conda/wheel installs than ad hoc build-tree probing