view: opt into htslib dense single-base BED auto-promotion (#2557) #2561
carstenerickson wants to merge 1 commit into
…ls#2557)

Wires bcftools view into the new BCF_SR_AUTO_TARGETS_FROM_REGIONS htslib opt (see samtools/htslib for the underlying sniffer and streaming-targets routing). When -R FILE is given and no -t/-T is set, view sets the opt before calling bcf_sr_set_regions(); htslib's sniffer then decides per-file whether to promote.

Why gate on !targets_list: BCF_SR_AUTO_TARGETS_FROM_REGIONS populates readers->targets, which conflicts with a subsequent bcf_sr_set_targets() call. Always-on would silently break the -R FILE -T OTHER workflow, so the opt is suppressed whenever a -t/-T was given.

The new --no-regions-fastpath suppresses the opt unconditionally for users who hit corner cases the sniffer accepts incorrectly.

End-to-end measurements (synthetic single-base BED, 10bp avg spacing, matching VCF; bcftools 1.23.1-dirty, macOS arm64):

  N=100K:  11.67s -> 0.17s (69x)
  N=1M  : 156.60s -> 0.73s (215x)

The N=1M result puts the production HGDP+1kGP / AADR / PGS panel cases within minutes instead of hours. --no-regions-fastpath, --regions-file without an index, and -R FILE -T OTHER all preserve the existing seek-per-region default.

Regression test test_vcf_view_regions_fastpath asserts byte-identical output across the fastpath, --no-regions-fastpath, and -T paths on a 300-entry single-base fixture (>= htslib SNIFF_LINES=256 so the sniffer accepts).

Tracks samtools#2557.
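The gating rule in the commit message can be read as a small predicate over the parsed command-line state. The following is a minimal sketch of that rule only, not the code in this PR: the struct `view_args_t`, its field names, and the function `want_auto_targets` are hypothetical names chosen for illustration, and the sketch deliberately does not call into htslib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bcftools view arguments that matter here. */
typedef struct {
    const char *regions_list;   /* value of -r/-R, or NULL if neither was given */
    bool regions_is_file;       /* true for -R FILE, false for -r REGION */
    const char *targets_list;   /* value of -t/-T, or NULL if neither was given */
    bool no_regions_fastpath;   /* true when --no-regions-fastpath was given */
} view_args_t;

/*
 * Returns true when view would set BCF_SR_AUTO_TARGETS_FROM_REGIONS before
 * calling bcf_sr_set_regions(). Whether promotion then actually happens is
 * decided by htslib's sniffer, which this sketch does not model.
 */
static bool want_auto_targets(const view_args_t *args)
{
    assert(args != NULL);
    if (args->regions_list == NULL || !args->regions_is_file)
        return false;           /* the opt is only wired for -R FILE */
    if (args->targets_list != NULL)
        return false;           /* opt fills readers->targets; would clash with -t/-T */
    if (args->no_regions_fastpath)
        return false;           /* user-facing escape hatch */
    return true;
}
```

Keeping the decision in one predicate makes the three suppressing conditions easy to audit against the commit message: string regions, any -t/-T, and --no-regions-fastpath each return false on their own.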
b7faa5a to deb6990
CI fixes pushed (force-push, post-rebase):
Summary
Wires bcftools view -R FILE into the new BCF_SR_AUTO_TARGETS_FROM_REGIONS htslib opt (samtools/htslib#2011). When -R FILE is given and no -T is set, view sets the opt before bcf_sr_set_regions(); htslib's sniffer decides per-file whether to promote a dense single-base BED to the streaming-targets path.

Design
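| Invocation | BCF_SR_AUTO_TARGETS_FROM_REGIONS |
| --- | --- |
| -R FILE, no -t/-T | set; htslib's sniffer decides per-file |
| -R FILE -T OTHER | suppressed (the opt populates readers->targets, conflicting with set_targets) |
| -r REGION (string) | not set |
| -R FILE --no-regions-fastpath | suppressed |

--no-regions-fastpath is the user-facing escape hatch for corner cases the sniffer accepts incorrectly.

Measurements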
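End-to-end via bcftools view -R; single-base BED at 10 bp avg spacing matching the VCF positions; macOS arm64:

| N | Before | After | Speedup |
| --- | --- | --- | --- |
| 100K | 11.67 s | 0.17 s | 69x |
| 1M | 156.60 s | 0.73 s | 215x |

The 1M case puts production HGDP+1kGP / AADR / PGS panel intersections into the minutes-not-hours regime via the most common entry point.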
Test plan
test_vcf_view_regions_fastpath (test/test.pl) asserts byte-identical output across three paths over a 300-entry single-base BED fixture:

- view -R FILE (fastpath; opt on)
- view -R FILE --no-regions-fastpath (slow path preserved)
- view -T FILE (control)

Existing tests pass unchanged.
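The fixture size matters because, per the commit message, the sniffer works over SNIFF_LINES=256 lines and a 300-entry file is large enough for it to accept. The sniffer's real acceptance criteria live in htslib and are not spelled out in this PR, so the sketch below is only an assumed approximation: it accepts when at least SNIFF_LINES intervals are present and each of the first SNIFF_LINES is exactly one base wide. The type `bed_interval_t` and the function `sniff_single_base` are hypothetical names, and any density check the real sniffer may apply is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SNIFF_LINES 256   /* value quoted in the commit message */

/* Hypothetical parsed BED record: 0-based start, exclusive end. */
typedef struct {
    long start;
    long end;
} bed_interval_t;

/*
 * Assumed criterion, not htslib's actual logic: accept only if the file has
 * at least SNIFF_LINES records and each of the first SNIFF_LINES records
 * spans exactly one base (end - start == 1).
 */
static bool sniff_single_base(const bed_interval_t *iv, size_t n)
{
    assert(iv != NULL || n == 0);
    if (n < SNIFF_LINES)
        return false;                       /* too short to sniff */
    for (size_t i = 0; i < SNIFF_LINES; i++) {
        if (iv[i].end - iv[i].start != 1)
            return false;                   /* multi-base interval: keep the slow path */
    }
    return true;
}
```

Under these assumptions a 300-entry single-base fixture is accepted while a 100-entry one is not, which is consistent with the fixture being sized above SNIFF_LINES.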
Notes
Depends on samtools/htslib#2011; without that PR, the opt has no effect and view falls back to the slow path with no semantic change.
Tracks #2557.