cast_transpose_noop.h
Functions
-
void nvte_transpose_with_noop(const NVTETensor input, const NVTETensor noop, NVTETensor output, cudaStream_t stream)
Transposes the input.
- Parameters:
input – [in] Input tensor to be transposed.
noop – [in] Single-element tensor. If it holds a non-zero value, the kernel exits immediately.
output – [inout] Output tensor.
stream – [in] CUDA stream used for the operation.
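The noop behaviour described above can be illustrated with a small CPU reference. This is only a sketch under stated assumptions, not the library's kernel: `transpose_with_noop_ref` is a hypothetical helper, the matrix is assumed to be row-major, and the noop flag is passed as a plain `float` rather than as a single-element `NVTETensor`. It also assumes that an early exit leaves the output untouched, which would be consistent with `output` being marked `[inout]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, NOT the Transformer Engine kernel.
// Transposes a row-major rows x cols matrix `in` into `out` (cols x rows).
// If `noop_flag` is non-zero, returns immediately and leaves `out` as it was
// (assumed behaviour, mirroring the documented early exit).
// Returns true if the transpose was performed, false if it was skipped.
inline bool transpose_with_noop_ref(const std::vector<float>& in,
                                    std::size_t rows, std::size_t cols,
                                    float noop_flag,
                                    std::vector<float>& out) {
  assert(in.size() == rows * cols);
  assert(out.size() == rows * cols);
  if (noop_flag != 0.0f) {
    return false;  // early exit: output keeps its previous contents
  }
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[c * rows + r] = in[r * cols + c];
    }
  }
  return true;
}
```

Calling the helper with a non-zero flag returns `false` and preserves whatever `out` already held. Calling it with a zero flag overwrites `out` with the transpose.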
-
void nvte_cast_transpose_with_noop(const NVTETensor input, const NVTETensor noop, NVTETensor output, cudaStream_t stream)
Casts and transposes the input.
- Parameters:
input – [in] Input tensor to be cast and transposed.
noop – [in] Single-element tensor. If it holds a non-zero value, the kernel exits immediately.
output – [inout] Output quantized tensor.
stream – [in] CUDA stream used for the operation.
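A similar CPU sketch can show how a cast combined with a transpose interacts with the noop flag. Everything here is an assumption made for illustration: `quantize_ref` and `cast_transpose_with_noop_ref` are hypothetical names, the scale-round-saturate conversion to `int8` is a stand-in for whatever quantized format the output tensor actually uses, and the flag is again a plain `float` instead of a single-element `NVTETensor`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, NOT the Transformer Engine kernel.
// Stand-in quantization (an assumption): scale, round to nearest,
// then saturate to the int8 range.
inline std::int8_t quantize_ref(float x, float scale) {
  float q = std::round(x * scale);
  q = std::min(127.0f, std::max(-128.0f, q));
  return static_cast<std::int8_t>(q);
}

// Casts each element of the row-major rows x cols matrix `in` and writes it
// to the transposed position in `out` (cols x rows).
// If `noop_flag` is non-zero, returns immediately and leaves `out` untouched
// (assumed behaviour, mirroring the documented early exit).
inline bool cast_transpose_with_noop_ref(const std::vector<float>& in,
                                         std::size_t rows, std::size_t cols,
                                         float scale, float noop_flag,
                                         std::vector<std::int8_t>& out) {
  assert(in.size() == rows * cols);
  assert(out.size() == rows * cols);
  if (noop_flag != 0.0f) {
    return false;  // early exit: previously quantized output is kept
  }
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[c * rows + r] = quantize_ref(in[r * cols + c], scale);
    }
  }
  return true;
}
```

Fusing the cast into the transpose loop means each input element is read once, which is the usual motivation for offering the two steps as a single operation.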