Use faccessat() and atomic_uint for slight performance improvements#934
Open
Use faccessat() and atomic_uint for slight performance improvements#934
Conversation
…re: 1560223) Change 1: On some systems (i.e. those using glibc) faccessat can be faster than eaccess, so this pull request amends libast to prefer that function instead. Benchmark (glibc 2.42, base commit is 79ad217): $ cat /tmp/foo for ((i=0; i!=300; i++)) do rm $(/opt/ast/bin/mktemp) done $ strace -c ksh /tmp/foo faccessat: 100.00 0.053465 7 7328 1217 total eaccess: 100.00 0.054799 6 8828 1217 total Change 2: The current workaround for the GCC optimizer bug affecting job_lock() and job_unlock() (which is actually triggered by -fstrict-aliasing in particular) uses the aso* functions in libast to portably perform atomic operations on job.in_critical. This works, but the generated assembly is inferior compared to the obsolete __sync_fetch_and_add/sub GCC builtins. Specifically, the aso functions called result in executing a total of 66 extra instructions on x86_64 (when compiling on the dev branch with defaults; use 'objdump -S' to see the resulting assembly for the binary). Only the lock instructions are truly necessary; everything else is superfluous. C11's atomic_uint provides the same functionality the old GCC builtinss, so for efficiency it's a better choice. (I initially thought this change sped up subshells, but with further benchmarking I found results to be too inconsistent when using different compilers versions and flags. In any case the C11 solution produces more efficient assembly than the aso*() solution, so it ought be used instead when available.) File changes: src/cmd/ksh93/include/jobs.h: - Use C11 atomic_uint and regular increments/decrements when _Atomic is available. src/lib/libast/features/{common,eaccess}: - Add a _std_atomic test to detect the presence of atomic_uint. - Add a probe for faccessat(). src/lib/libast/{comp/eaccess.c,features/map.c}: - Remap eaccess calls to _ast_eaccess(), which prefers POSIX faccessat() when it's available.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Change 1:
On some systems (i.e. those using glibc) POSIX
faccessat()can be faster thaneaccess(), so this pull request amends libast to prefer that function instead. Benchmark (glibc 2.42, base commit is 79ad217):Change 2:
The current workaround for the GCC optimizer bug affecting
job_lock()andjob_unlock()(re: 1560223, 52067c3, 595a0a5, 52067c3, 07cc71b) (which is actually triggered by using-fstrict-aliasingin particular) uses theaso*()functions in libast to portably perform atomic operations onjob.in_critical. This works, but the generated assembly is inferior compared to the obsolete__sync_fetch_and_add/subGCC builtins. Specifically, the aso functions called result in executing a total of 66 extra instructions on x86_64 (when compiling on the dev branch with defaults; useobjdump -Sto see the resulting assembly for the binary).Only the lock instructions are truly necessary; everything else is superfluous. C11's
atomic_uintprovides the same functionality the old GCC builtins have, so for efficiency it's a better choice. (I initially thought this change sped up subshells, but with further benchmarking I found results to be too inconsistent when using different compilers versions and flags. In any case the C11 solution produces more efficient assembly than the libast ASO solution, so it ought be used instead when available.)src/cmd/ksh93/include/jobs.h:
atomic_uintand regular increments/decrements when_Atomicis available.src/lib/libast/features/{common,eaccess}:
_std_atomictest to detect the presence ofatomic_uint.faccessat().src/lib/libast/{comp/eaccess.c,features/map.c}:
eaccesscalls to_ast_eaccess(), which now prefers POSIXfaccessat().