@@ -44,96 +44,3 @@ julia> GPUArrays.unsafe_free!(cache)
 For a more sophisticated real-world example, see how
 [GaussianSplatting.jl](https://github.com/JuliaNeuralGraphics/GaussianSplatting.jl/blob/e4ef1324c187371e336bef875b053023afe7fb2c/src/training.jl#L183)
 handles it.
-
-## Avoid triggering Hostcalls
-
-Some functions in the kernel may cause an exception,
-capturing the original value of the variable that caused it.
-These are usually related to float-to-integer conversion, so functions like
-`Int(1.0), ceil(Int, 1.0), floor(Int, 1.0)` will cause it.
-
-This will perform dynamic memory allocation and launch a `Hostcall` for that,
-which will sit in the background thread until kernel finishes execution and the user synchronizes the `stream`.
-Having a hostcall unnecessarily slows execution down and you can avoid that by using
-"GPU-friendly" version of the function.
-
-!!! info "Hostcalls"
-    Hostcalls should be used mostly for debugging. When performance matters, they should be avoided.
-
-For example, let's see how we may deal with `ceil(Int, x)` and convert it to GPU-friendly version.
-
-Starting with the bad example:
-
-```jldoctest hostcall
-julia> function bad_kernel!(y, x)
-           @inbounds y[1] = ceil(Int, x[1])
-           return
-       end
-bad_kernel! (generic function with 1 method)
-
-julia> x = ROCArray(Float32[1.1f0]);
-
-julia> y = ROCArray(zeros(Int, 1));
-
-julia> @roc bad_kernel!(y, x);
-┌ Info: Global hostcalls detected!
-│ - Source: MethodInstance for bad_kernel!(::AMDGPU.Device.ROCDeviceVector{Int64, 1}, ::AMDGPU.Device.ROCDeviceVector{Float32, 1})
-│ - Hostcalls: [:malloc_hostcall]
-│
-│ Use `AMDGPU.synchronize(; stop_hostcalls=true)` to synchronize and stop them.
-└ Otherwise, performance might degrade if they keep running in the background.
-
-julia> y
-1-element ROCArray{Int64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
- 2
-
-julia> AMDGPU.synchronize(; stop_hostcalls=true)
-[ Info: Stopped global hostcall: `malloc_hostcall`.
-```
-
-Here we can see that using "un-optimized" version of `ceil(Int, x[1])`
-causes a `malloc_hostcall` to be launched.
-Which we then have to stop by passing `stop_hostcalls=true` to the synchronization functions.
-
-We can avoid this by using "unsafe" version that avoids checking for errors under-the-hood.
-
-```jldoctest hostcall
-julia> function good_kernel!(y, x)
-           @inbounds y[1] = unsafe_trunc(Int, ceil(x[1]))
-           return
-       end
-good_kernel! (generic function with 1 method)
-
-julia> fill!(y, 0);
-
-julia> @roc good_kernel!(y, x);
-
-julia> AMDGPU.synchronize(; stop_hostcalls=true) # Nothing is printed, so no hostcall was launched & stopped.
-
-julia> y
-1-element ROCArray{Int64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
- 2
-```
-
-By doing `ceil(x[1])` first, then "unsafely" converting `Float32` to `Int`
-we can avoid error-checking & hostcalls.
-
-We can compare LLVM IR of the function that converts `Float32` to `Int` to see how they differ:
-
-::: tabs
-
-== unsafe_trunc(Int, 1.0)
-
-```@example good-conversion
-using InteractiveUtils
-InteractiveUtils.@code_llvm unsafe_trunc(Int, 1.0)
-```
-
-== Int(1.0)
-
-```@example bad-conversion
-using InteractiveUtils
-InteractiveUtils.@code_llvm Int(1.0)
-```
-
-:::