Commit 169e98e
[gpu] performance and functionality improvements (#1265)
* [gpu] performance and functionality improvements
* capturing disk usage statistics to help reduce excessive disk space usage
* created exit handler to clean up environment on completion or failure
* created prepare function to prepare for the installation
* when sufficient memory is available, configure a ramdisk
* reduce noise by turning off -x in utility functions
* added descriptive comments before the obscurely coded
compare_versions_lte and compare_versions_lt functions
* removed some intermediate driver versions
* added cuda url for 12.6
* execute_with_retries now logs on failure, captures runtime and
cleans before installing on debian
* saving OS installation and NV .run files and their temp files to ramdisk
* piping source .xz file directly to xz instead of saving to disk first
* new utility function "is_debuntu" checks for the frequently used
condition of whether the running OS is either debian or ubuntu
* added support for specifying an http proxy (thank you प्रकाश)
* moving load of kernel module to later in the code and exercising
modprobe of all modules to avoid regression
* fixed problem with attempting to fetch from incorrect vault
directory when rocky kernel package is not found in primary repo
* using correct cran-r signing key for ubuntu18
* corrected file check condition for /etc/apt/trusted.gpg
* do not update all packages on rocky ; move preparation to prepare function
* increasing memory to make use of ramdisk
* using something a little smaller
* create mount_ramdisk function and call it ; fix up the version comparison functions ; create ge and le comparisons for OSs
* iterating better, caching results of system calls ; renamed to repair_old_backports
* comparing correct version numbers
* rocky uses a tmpfs on /tmp in the base image
* tested on rocky and ubuntu
* tested harder on rocky
* cuda 11 no longer available for debian 12
* cuda v11 no longer supported on debian12
* corrected use of ubuntu regex for rocky version
* re-enabling spark job tests
* correct a couple of edge cases
* added instructions for manually running tests
* open a monitor session by default
* cleaning up cuda and cudnn url generation
* condition better
* cleaned up generation of NVIDIA_CUDA_URL
* updated versions and GPU accelerators in the documentation
* ensure this test is skipped based on cuda version rather than dataproc version alone
* fix for /usr/local/cuda-12.4/bin/nvcc: No such file or directory
* correcting path to run-bazel-tests.sh
* runing variable definition
* cleaned up skip conditions
* order of operations
* works with 2.0-rocky8
* remove redundant conditional check
* supported version limits are tightened up a bit ; clean up rocky vault install code
* corrected syntax errors
* failure to run dnf here should not fail the entire installer
* order matters here
* 2.2-ubuntu22 works with cuda 11, other 2.2 do not
* fixes ubuntu22 kernel version mismatch error
* disabling rocky9 builds due to out of date base dataproc image
* cuda 2.0 not supported in debian12
* some 2.0-rocky8 single instance tests fail
* intended to use <= and not >=
* simplify gpu resource script
* setting default discoveryScript ; testing pyspark in its own function
* remove spark: prefix from property names
* comment out quite a few tests
* new version numbers
* fixed a syntax error with documentation
* mustn't forget the commas
* half as many tasks with twice as much cpu and gpu each
* pause before first ssh ; correct variable name

1 parent da3d8c1 commit 169e98e
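
The commit message names the helpers compare_versions_lte, compare_versions_lt and execute_with_retries but the diff body is not reproduced here, so the following is only a minimal sketch of how such helpers might look in bash. The function bodies, the retry count of 3, and the RETRY_DELAY variable are assumptions made for illustration, not the code from this commit. The version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: only the function names come from the commit message;
# the bodies below are assumptions, not the committed implementation.

# Succeeds when version $1 <= version $2.
# sort -V orders version strings numerically per component, so the smaller
# version is the first line of the sorted output.
compare_versions_lte() {
  [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" ]
}

# Succeeds when version $1 < version $2.
# Equal versions are rejected first, then the <= test decides.
compare_versions_lt() {
  [ "$1" != "$2" ] && compare_versions_lte "$1" "$2"
}

# Runs the given command up to 3 times (assumed count), logging each
# failure to stderr. RETRY_DELAY is a hypothetical knob for the pause
# between attempts, defaulting to 1 second.
execute_with_retries() {
  local attempt
  for attempt in 1 2 3; do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt} failed: $*" >&2
    sleep "${RETRY_DELAY:-1}"
  done
  return 1
}
```

A caller could then gate a step on a version, for example `if compare_versions_lt "${CUDA_VERSION}" "12.0" ; then ... ; fi`, where CUDA_VERSION is again a placeholder name. Plain string comparison would order "12.10" before "12.4", which is why a version-aware sort is used here.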
8 files changed, +735 -271 lines changed (file tree: gpu)