- Jun 15, 2018
-
-
Li Xinqi authored
* is head regst desc where sharing same mem
* fix typo
-
- Jun 14, 2018
-
-
ShawnXuan authored
-
- Jun 13, 2018
-
-
chengtbf authored
-
Niu Chong authored
* feat: add UseCudnnOnGpu in Operator and fix conv op
* feat: add WithCudnn and WithoutCudnn in ConvKernel<kGPU, T>
* feat: add CUDA NCDHWIm2ColGpu kernel and compile done
* refine: rename Im2ColNCDHWGpu() to NCDHWIm2ColGpu()
* fix: reverse update about int32_t to int64_t in BlocksNum4ThreadNum()
* feat: add CUDA NCDHWCol2ImGpu kernel
* refactor: extract InitSharedArrays() for device code
* feat: add CUDA NDHWCIm2ColGpu kernel
* feat: add CUDA NDHWCCol2ImGpu kernel and fix typos
* fix: fix the bug of calc im_offset when NDHWCIm2ColGpu
* refactor: extract Im2ColCalcKernelAndOutIndex() and Im2ColCalcImIndex()
* fix: fix format and the missing shared_im[] parameter in Im2ColCalcImIndex()
* refactor: merge NCDHWCol2ImGpu() and NDHWCCol2ImGpu() into Col2ImGpu()
* refactor: merge NCDHWIm2ColGpu() and NDHWCIm2ColGpu() into Im2ColGpu()
* feat: add class ConvKernelImplByIm2Col between ConvKernelIf and ConvKernel; compile done, to be run
* fix: add explicit template instantiation for ConvKernelUtil
* refine: remove unused class function declaration: KernelInitWithoutCudnn e.g.
* fix(operator/conv_op.cpp): make sure UseCudnnOnGpu() == true when infer cudnn algo
* refine(kernel/conv_kernel.cu): let the gpu kernel function be inside the anonymous namespace
* refactor: add dim_num as the template parameter of Im2ColGpu() and Col2ImGpu()
* refactor: add is_channel_first as the template parameter of Im2ColGpu() and Col2ImGpu()
* refine(kernel/conv_kernel.cu): add #undef IM2COL_FUNC_CALL
* refine(kernel/conv_kernel.cu): add dim_num as the template parameter of InitSharedMemory()
* fix(kernel/conv_kernel.cu): fix the bug of use col_offset in Im2ColGpu()
-
- Jun 12, 2018
-
-
chengtbf authored
* refine regst manager
* refine regst manager version 2
* remove memInfo
* refine code and reduce rows
* using protobuf message differencer
* explicit
* memory allocator free memory
* add regst desc proto list init for test code
-
qicosmos authored
* remove cplusplus_17.h, it is no longer needed
* improve make_unique
* make macros together
* remove OF_CALL_ONCE macro
* remove macros in ctrl_server
* -Wno-unused-function
* remove useless header
-
Jinhui Yuan authored
-
- Jun 09, 2018
-
-
Jinhui Yuan authored
-
Yi Zhu authored
* refine vgg proto && add report
* refine proto
* add proto for 2 machine
* add proto for 2 machine 4 GPU and measurement of reduce time for 2 machine 8 GPU
* refine protos
* remove comment
-
jiyuan authored
-
jiyuan authored
-
- Jun 08, 2018
-
-
Li Xinqi authored
* out delay regst
* no delay edge to kMdSave task
-
- Jun 05, 2018
-
-
willzhang4a58 authored
-
Yi Zhu authored
-
Niu Chong authored
-
chengtbf authored
-
- Jun 04, 2018
-
-
chengtbf authored
-
Li Xinqi authored
-
willzhang4a58 authored
-
willzhang4a58 authored
-
willzhang4a58 authored
-
- Jun 01, 2018
-
-
willzhang4a58 authored
-
willzhang4a58 authored
-
willzhang4a58 authored
-
- May 30, 2018
-
-
willzhang4a58 authored
-
Yi Zhu authored
-
willzhang4a58 authored
-
- May 29, 2018
-
-
willzhang4a58 authored
-
willzhang4a58 authored
-
Yi Zhu authored
* fix bug when elem cnt in model < split num
* add checker
-
- May 28, 2018
-
-
willzhang4a58 authored
-
willzhang4a58 authored
-
Jinhui Yuan authored
* let opencv depend on libjpeg-turbo
* fix libjpeg-turbo dependency
-
leaves-zwx authored
* remove copy by itor
* use the already existing copy method
* delete forward field override and use CopyField instead
* remove needless
-
Jinhui Yuan authored
* use libjpeg-turbo
* refine format
-
- May 25, 2018
- May 24, 2018
-
-
willzhang4a58 authored
-
willzhang4a58 authored
-
chengtbf authored
* set regst max=min when same stream
* move fix regst num to task node
-