Conversation
```cpp
cudaStream_t GetCudaStream() const override {
  return g_stream_override.OverrideStream(stream_);
#ifdef K2_WITH_CUDA
  return g_stream_override.OverrideStream(0);
```
Here, in Pytorch_Context.cu, we get the stream via c10::cuda::getCurrentCUDAStream. I read the PyTorch source code: if we never call setCurrentCUDAStream, the returned stream is always the default stream (i.e. stream 0).
So, can we use the default stream 0 here? (A bare 0 might not be good here; we could add a macro, something like kDefaultStream.)
```cpp
#ifdef K2_WITH_CUDA
  DeviceGuard guard(gpu_id_);
  // the default stream is 0
  auto ret = allocator_->DeviceAllocate(&p, bytes);
```
Here, the allocator allocates memory on the default stream (stream 0); we could pass a stream to it as well.
What is the use case? That is, how will we use k2 with CUDA but without PyTorch?
For decoding purposes: if we use onnxruntime or TensorRT as the inference engine, I don't think we should depend on PyTorch at that point. The inference engine might have its own allocator, though I am not sure.
In that case, I would suggest creating another context that wraps the allocator from the given framework.
Yes, we will see; for now this is just to fix the default_context.cu.
The purpose of this PR is to fix default_context.cu so that we can use k2 without PyTorch. There are still some issues to be fixed:

- NvToolExt.h not found error (if K2_ENABLE_NVTX=ON). I found that the CUDA environment was handled by PyTorch before; I am reading FindCUDA.cmake in PyTorch and trying to extract some script from there.
- Should each Context have its own allocator instance? Which one is better?

Even though this is a small fix, the allocator is a fundamental part, so I may have missed something important. I hope for your suggestions.
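For the NvToolExt.h issue, one possible direction (a sketch, not a tested fix): CMake 3.17+ ships a FindCUDAToolkit module that exposes NVTX as the imported target CUDA::nvToolsExt, which would avoid depending on PyTorch's CUDA setup. The target name k2_context below is illustrative.

```cmake
# Sketch: locate NVTX via the standard CUDAToolkit module instead of
# relying on PyTorch's CMake scripts. Requires CMake >= 3.17.
if(K2_ENABLE_NVTX)
  find_package(CUDAToolkit REQUIRED)
  # k2_context is a placeholder for the actual k2 target name.
  target_link_libraries(k2_context PRIVATE CUDA::nvToolsExt)
endif()
```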