Conv GPU Kernel Without Cudnn (#916)
* feat: add UseCudnnOnGpu in Operator and fix conv op
* feat: add WithCudnn and WithoutCudnn in ConvKernel<kGPU, T>
* feat: add CUDA NCDHWIm2ColGpu kernel and compile done
* refine: rename Im2ColNCDHWGpu() to NCDHWIm2ColGpu()
* fix: revert the update from int32_t to int64_t in BlocksNum4ThreadNum()
* feat: add CUDA NCDHWCol2ImGpu kernel
* refactor: extract InitSharedArrays() for device code
* feat: add CUDA NDHWCIm2ColGpu kernel
* feat: add CUDA NDHWCCol2ImGpu kernel and fix typos
* fix: fix the im_offset calculation bug in NDHWCIm2ColGpu
* refactor: extract Im2ColCalcKernelAndOutIndex() and Im2ColCalcImIndex()
* fix: fix format and the missing shared_im[] parameter in Im2ColCalcImIndex()
* refactor: merge NCDHWCol2ImGpu() and NDHWCCol2ImGpu() into Col2ImGpu()
* refactor: merge NCDHWIm2ColGpu() and NDHWCIm2ColGpu() into Im2ColGpu()
* feat: add class ConvKernelImplByIm2Col between ConvKernelIf and ConvKernel; compile done, to be run
* fix: add explicit template instantiation for ConvKernelUtil
* refine: remove unused class function declarations, e.g. KernelInitWithoutCudnn
* fix(operator/conv_op.cpp): make sure UseCudnnOnGpu() == true when inferring the cudnn algo
* refine(kernel/conv_kernel.cu): move the gpu kernel functions inside the anonymous namespace
* refactor: add dim_num as a template parameter of Im2ColGpu() and Col2ImGpu()
* refactor: add is_channel_first as a template parameter of Im2ColGpu() and Col2ImGpu()
* refine(kernel/conv_kernel.cu): add #undef IM2COL_FUNC_CALL
* refine(kernel/conv_kernel.cu): add dim_num as a template parameter of InitSharedMemory()
* fix(kernel/conv_kernel.cu): fix the bug in the use of col_offset in Im2ColGpu()
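For readers unfamiliar with the technique behind Im2ColGpu(), the following is a minimal host-side C++ sketch of im2col with the data layout chosen by an is_channel_first template parameter, as the commit message describes. It is not the code from conv_kernel.cu: the names Conv2dShape, ImOffset and Im2ColCpu are hypothetical, the sketch handles a single 2-D image only (the commit templates on dim_num and targets 3-D NCDHW/NDHWC data), and the column-buffer ordering (c, kh, kw) x (oh, ow) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape descriptor for one 2-D image (no batch dimension).
struct Conv2dShape {
  int64_t channels, height, width;  // input image
  int64_t kernel_h, kernel_w;       // filter window
  int64_t stride_h, stride_w;
  int64_t pad_h, pad_w;             // symmetric zero padding
  int64_t OutH() const { return (height + 2 * pad_h - kernel_h) / stride_h + 1; }
  int64_t OutW() const { return (width + 2 * pad_w - kernel_w) / stride_w + 1; }
};

// Flat offset of element (c, h, w); the layout is a compile-time choice.
template <bool is_channel_first>
int64_t ImOffset(const Conv2dShape& s, int64_t c, int64_t h, int64_t w) {
  return is_channel_first ? (c * s.height + h) * s.width + w    // CHW
                          : (h * s.width + w) * s.channels + c; // HWC
}

// Unfolds the image into a (C*KH*KW) x (OH*OW) row-major column buffer.
// Window positions that fall into the padding stay zero.
template <bool is_channel_first, typename T>
std::vector<T> Im2ColCpu(const Conv2dShape& s, const std::vector<T>& im) {
  assert(static_cast<int64_t>(im.size()) == s.channels * s.height * s.width);
  const int64_t out_h = s.OutH();
  const int64_t out_w = s.OutW();
  const int64_t cols = out_h * out_w;
  std::vector<T> col(s.channels * s.kernel_h * s.kernel_w * cols, T(0));
  for (int64_t c = 0; c < s.channels; ++c) {
    for (int64_t kh = 0; kh < s.kernel_h; ++kh) {
      for (int64_t kw = 0; kw < s.kernel_w; ++kw) {
        const int64_t row = (c * s.kernel_h + kh) * s.kernel_w + kw;
        for (int64_t oh = 0; oh < out_h; ++oh) {
          for (int64_t ow = 0; ow < out_w; ++ow) {
            const int64_t h = oh * s.stride_h - s.pad_h + kh;
            const int64_t w = ow * s.stride_w - s.pad_w + kw;
            if (h < 0 || h >= s.height || w < 0 || w >= s.width) { continue; }
            col[row * cols + oh * out_w + ow] =
                im[ImOffset<is_channel_first>(s, c, h, w)];
          }
        }
      }
    }
  }
  return col;
}
```

With the column buffer in this form, the convolution reduces to a single matrix multiplication between the flattened filters and col, which is the usual motivation for an im2col path when cuDNN is not used. A GPU version would typically assign one thread per output element instead of looping.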
Changed files:
- oneflow/core/kernel/conv_kernel.cpp: 56 additions, 66 deletions
- oneflow/core/kernel/conv_kernel.cu: 390 additions, 3 deletions
- oneflow/core/kernel/conv_kernel.h: 103 additions, 32 deletions
- oneflow/core/kernel/kernel.h: 2 additions, 1 deletion
- oneflow/core/operator/conv_op.cpp: 4 additions, 4 deletions
- oneflow/core/operator/operator.cpp: 0 additions, 4 deletions
- oneflow/core/operator/operator.h: 2 additions, 1 deletion