cast_transpose_noop.h

Functions

void nvte_transpose_with_noop(const NVTETensor input, const NVTETensor noop, NVTETensor output, cudaStream_t stream)

Transposes the input.

Parameters:
  • input[in] Input tensor to be transposed.

  • noop[in] If this single-element tensor has a non-zero value, the kernel exits immediately.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.
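The early-exit behavior of the noop parameter can be illustrated with a small CPU reference model. This is only a sketch under stated assumptions, not the library's implementation: the function name transpose_with_noop_reference is hypothetical, the matrix is assumed to be row-major, the noop flag is modeled as a single float, and the output is assumed to stay untouched when the flag is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the library.
// Transposes a row-major rows x cols matrix into `output` unless the
// noop flag is non-zero. Returns true if the transpose ran, false if
// it was skipped.
bool transpose_with_noop_reference(const std::vector<float>& input,
                                   std::size_t rows, std::size_t cols,
                                   float noop_flag,
                                   std::vector<float>& output) {
  assert(input.size() == rows * cols);
  assert(output.size() == rows * cols);

  // Assumption: a non-zero flag means "skip", leaving output unchanged.
  if (noop_flag != 0.0f) {
    return false;
  }

  // Element (r, c) of the input becomes element (c, r) of the output.
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      output[c * rows + r] = input[r * cols + c];
    }
  }
  return true;
}
```

In this model the caller decides whether to skip by writing the flag rather than by branching around the call, which presumably is what makes the noop tensor useful when a fixed sequence of launches is replayed on a stream.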

void nvte_cast_transpose_with_noop(const NVTETensor input, const NVTETensor noop, NVTETensor output, cudaStream_t stream)

Casts and transposes the input.

Parameters:
  • input[in] Input tensor to be cast.

  • noop[in] If this single-element tensor has a non-zero value, the kernel exits immediately.

  • output[inout] Output quantized tensor.

  • stream[in] CUDA stream used for the operation.
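The combination of casting and transposing can likewise be sketched on the CPU. Everything here is an assumption made for illustration: the names quantize_reference and cast_transpose_with_noop_reference are hypothetical, the scale parameter is not part of the documented signature, and a saturating 8-bit integer stands in for whatever quantized format the output tensor actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the cast step: scale, round to nearest,
// and saturate to the int8 range. The real quantized format may differ.
std::int8_t quantize_reference(float x, float scale) {
  float scaled = std::round(x * scale);
  scaled = std::min(127.0f, std::max(-128.0f, scaled));
  return static_cast<std::int8_t>(scaled);
}

// Hypothetical CPU reference, not part of the library.
// Casts each element of a row-major rows x cols matrix and stores it
// transposed in `output`, unless the noop flag is non-zero.
bool cast_transpose_with_noop_reference(const std::vector<float>& input,
                                        std::size_t rows, std::size_t cols,
                                        float scale, float noop_flag,
                                        std::vector<std::int8_t>& output) {
  assert(input.size() == rows * cols);
  assert(output.size() == rows * cols);

  // Assumption: a non-zero flag means "skip", leaving output unchanged.
  if (noop_flag != 0.0f) {
    return false;
  }

  // Cast and transpose in one pass over the input.
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      output[c * rows + r] = quantize_reference(input[r * cols + c], scale);
    }
  }
  return true;
}
```

Fusing the two steps in this sketch means each input element is read once and written once in its final position, instead of being written to an intermediate buffer and read again.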