- Jul 19, 2021
liufengwei0103 authored
* modified SetInputArgModifyFn * Delete the CHECK changes in the assign_op.cpp file * Format * Modified the OutputArgModifyFn interface * add return * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * OutputArgModifier return maybe part_1 * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * input_arg_modifier return maybe * gen_bw_fn return maybe * bw_gen_fn return maybe * registry_callback_fn return maybe * fix bug after merge master * fix bug Co-authored-by:
aishangjj <702572275@qq.com>
- Jul 18, 2021
liufengwei0103 authored
* modified SetInputArgModifyFn * Delete the CHECK changes in the assign_op.cpp file * Format * Modified the OutputArgModifyFn interface * add return * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * OutputArgModifier return maybe part_1 * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * input_arg_modifier return maybe * gen_bw_fn return maybe * bw_gen_fn return maybe * fix bug: return Maybe without JUST Co-authored-by:
aishangjj <702572275@qq.com>
- Jul 16, 2021
liufengwei0103 authored
* modified SetInputArgModifyFn * Delete the CHECK changes in the assign_op.cpp file * Format * Modified the OutputArgModifyFn interface * add return * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * OutputArgModifier return maybe part_1 * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function * input_arg_modifier return maybe * change lambda for JUST macro * fix conflicts Co-authored-by:
aishangjj <702572275@qq.com> Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Jul 10, 2021
luqiang guo authored
* Refactor infer ctx input tensordesc and related calls * Modify the missing content * Modify the post-merged error * Add input judgment * Missing modifications * Fix merge errors * format * Modify omission * Fix the crash caused by lack of judgment * Delete comment * Optimize naming * Delete comment * Delete blank lines Co-authored-by:
liufengwei0103 <2472937968@qq.com> Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Jun 18, 2021
Tianyu Zhao authored
* Add interface InferContext::OutputTensorDesc * auto format by CI * Implement 'OutputTensorDesc' for 'UserOpExprInferContext' Co-authored-by:
oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Apr 27, 2021
Hongsheng Wang authored
* Rename InferXXXFn as XXXInferFn * rename interface_name * modify infer_xxx_fn to xxx_infer_fn Co-authored-by:
binbinHan <han_binbin@163.com> Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Apr 22, 2021
binbinHan authored
* remove_user_op_conf_in_kernel_compute_ctx * remove_user_op_conf_in_kernel_infer_ctx * remove_user_op_conf_in_kernel_init_ctx * remove_user_op_conf_in_kernel_create_ctx * remove user_op_conf in UserKernelOpInferContext * solve KernelComputeContext in nvtx_range_kernel.c * remove user_op_conf in InferContext and solve bug after pull master * del useless code * remove attrs_ to derived class * optimize * del useless code * use Attr() instead of attr() * remove attr() in ctx * minor fix Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Apr 08, 2021
Yurui Li authored
* add blob object id and eager blob object interface * use maybe for blob object id * minor fix * add registry for data type, modified dropout as example * revert origin data type infer * add infer data type entry * minor fix infer data type * user op 32~40 * add unchange data type * user op 26~40 * fix inferDataType from 0-25 * remove infer data type from util no change * fix is_dynamic attribute * user op 70~80 * user op 66~70 * remove redundant dtype * minor fix one hot * move data infer interface to user op * review and fix * Zailiang_Make infer data type away from tensor desc (#4544) * same_padding_op.cpp * scalar_add_op.cpp * scalar_by_tensor_op.cpp * scalar_mul_op.cpp * scalar_pow_op.cpp * Sigmoid_cross_entropy_op.cpp * slice_op.cpp * smooth_l1_loss_op.cpp * softmax_cross_entropy_op.cpp * Update scalar_by_tensor_op.cpp * sigmoid_op.cpp * Dev split infer data type from tensor desc (81~95) (#4545) * Split InferDataType from 81 to 95 * Add is_dynamic * fix relu * Fix relu_op * Add Reduce ops InferDataType * start up (#4539) * start up * 121~128 done * refine * edit test_ops.cpp * SetPhysicalTensorDescInferFn * finished * reformat * remove redundant code * upsampling * test_ops Co-authored-by:
poohRui <yuruil@qq.com> Co-authored-by:
Yurui Li <32978179+poohRui@users.noreply.github.com> * Dev refactor split infer data type (#4547) * softmax_op.cpp * 107:sort_op.cpp * 106-115 * 108-109 * 110 * 111 * 112 * 113 * 115 * 106-115 * modify 106-115 * modify assign * zzk fix Co-authored-by:
MARD1NO <359521840@qq.com> Co-authored-by:
Yurui Li <32978179+poohRui@users.noreply.github.com> * fix compile bug * warning fix * user op 41-80 (#4540) * user op 41-50 * user op 51-55 * fix image_object_preprocess ops * user op 56-65 * update * update * update * fix a bug * fix bug * fix bug * fix devconv bug * fix multiple bugs * make tensor buffer to tensor list as todo * add impl for tensor buffer to tensor list * remove data type from infer shape in matmul * add check * fix transpose bug * fix test op bug * add check for add n * fix cast to static shape op bug * fix dynamic loss * remove unnecessary code in flatten op * fix hierarchical parallel * fix leaky relu op * fix math binary broadcast * remove unnecessary data type infer in math binary * fix terrible bug in math binary * format * fix softmax bug * remove unnecessary code for softmax cross entropy * fix conv * remove unnecessary code in maximum * add comments * fix image preprocess * fix layer norm * auto to auto& * fix partial fc sample * fix partial fc sample * fix image preprocess ops * format * user sigmoid op * format * fix partial fc * fix prelu * add newer ops * fix bug in math binary * fix combined margin loss op * fix ofrecord reader * add fused bias add op * format Co-authored-by:
MARD1NO <359521840@qq.com> Co-authored-by:
Zailiang <zailiangyu@gmail.com> Co-authored-by:
Zhenhua <1209435+hengzi@users.noreply.github.com> Co-authored-by:
doombeaker <later@usopp.net> Co-authored-by:
Hongsheng Wang <31394900+wanghongsheng01@users.noreply.github.com> Co-authored-by:
Mosout <mosout@163.com> Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Mar 03, 2021
guo ran authored
* bn_add_relu SetLogicalTensorDescInferFn * fix * refine Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Feb 24, 2021
cheng cheng authored
* Remove batch_axis * rm partial tick * temp fix bug of flow.math.add * fix bug of input_blob_def.split_axis init * [KEY] refine infer sbp order value consider logical shape enable split * ignore op get sbp sign ERROR. * filter and check valid sbp sign by logical shape; rm magic num * fix bug of return * merge rm sigmoid cross entropy op * rm sigmoid batch axis fn * more debug log for check valid in get sbp sign * rm useless check Co-authored-by:
oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
- Oct 27, 2020
OuYang Yu authored
* refactor user_op_registry Attr * Replace the User Op Registry Attr with a new interface (#3714) * replace user op attr Co-authored-by:
lixinqi <lixinqi0703106@163.com>
- Oct 05, 2020
guo ran authored
* bn_add_relu use bit mask * refine * refine
- Sep 04, 2020
Juncheng authored
* FuseAddToOutput * change default value of enable_fuse_add_to_output to false * consider reachability * supported_op_type_name2output_bn=>supported_op_type_name2output_arg * Add necessary comments * refine * fix Co-authored-by:
guo ran <360112263@qq.com>
- Aug 31, 2020
Juncheng authored
* Fused BatchNormAddRelu * skip if cpu * d_addend=>addend_diff
- Aug 03, 2020
- Jul 31, 2020
daquexian authored
- Jul 29, 2020
strint authored
* #3196 refactor user op c++ register draft * #3196 refactor op register c++ api part2 * #3196 refine interface * #3196 user new get * #3196 rename * #3196 finish Op reg * #3196 finish Op grad reg * #3196 refine * #3196 op kernel reg part 1 * #3196 op kernel reg part 2 * #3196 op kernel reg part 3 * #3196 op kernel reg part 4 * #3196 op kernel reg part 5 fix * #3196 compile & test pass * #3196 merge branch develop * #3196 rm useless code * use new api to rewrite batch norm grad graph * use new api to rewrite batch norm grad graph fix typo * use new api to rewrite batch norm grad graph fix typo * use new api to rewrite where * implement new interface * finish BackwardOpConfContext * finish first version bw_op_gen_context & fw_op interface * refine lambda fn & interface * refine lambda var capture in func * check batch norm & add comment * batch norm & where finish & pass test * refine interface of builder fn * refactor relu grad * rm relu grad Co-authored-by:
Li Xinqi <lixinqi2010@gmail.com>
- Jul 23, 2020
Shenghang Tsai authored
* add license at root dir * check in empty files * rm space * check in script * update script * fix bug * add print * fix * add exit * add to of_format * add CI task * fix license * Revert "fix license" This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57. * only add once * quick fix * fix script * dont fmt empty file * fix * quick fix * fix py * add license * fix exit * add license for hpp * add license * license new vm files Co-authored-by:
tsai <caishenghang@oneflow.org>
- Jul 15, 2020
Li Xinqi authored
* rename Async2sync to Await * oneflow.eager_fixed_placement * fix a bug in InstructionBuilder._StatelessCall * do not panic in VirtualMachine::ForEachMutMirroredObject * GetMirroredObject * add instruction ReplaceMirrored * hob.is_current_machine_master * rename ForeignWorkerWatcher to ForeignWorkerCallback (#2935) * EagerJobBuildAndInferCtx * refactor api_oneflow_function * eager_oneflow_global_function * fix a bug in Session.Init() * interpreter.EagerRun * JobBuildAndInferCtx::GetMirroredOpParallelConf * JobBuildAndInferCtx:GetMirroredOpName * change the JobBuildAndInferCtx::Complete function to pure virtual * refactor EagerJobBuildAndInferCtx::Complete() * LazyConsistentBlob/LazyMirroredBlob * EagerRemoteBlob * eager logical blob * rename symbol_dict to symbol_cache; rename object_dict to object_cache * refactor EagerLogicalBlob with instruction ReplaceMirror * test_eager_logical_blob * get_eager_variable * free stashed variable blob before close session * Quick fix eager (#2964) * fix EagerPhysicalBlob * fix lazy Co-authored-by:
tsai <caishenghang@oneflow.org> * DeprecatedStatelessCallOpKernel * python wrapper for DeprecatedStatelessCall * Fix compile error (#2967) * fix compile error * modify by comment * rename unused cuda copy instruction * remove OneflowVM<kMaster> * stream_tag for StatelessCallOpKernel/DeprecatedStatelessCallOpKernel * quick fix is train in hob (#2971) Co-authored-by:
tsai <caishenghang@oneflow.org> * CudaHostRegisterBlob (#2972) * CudaHostRegisterBlob * CudaHostUnRegisterBlob * 1) gpu.copy_h2d.*; 2) gpu.copy_d2h.* * amend CudaHostUnregisterBlob Co-authored-by:
ouyangyu <xuanjiuye@gmail.com> * CudaHostAllocator for CudaCopyD2HStreamType * eager_copy * fix bugs in eager oneflow.copy * oneflow.copy * autograd for CopyOp * move MakeCopyInstructionBuilderFunction from copy.py to eager/vm_util.py * auto copy_hd * oneflow.system.assign * single device version model_init * more assert in CurJobAddMirroredOp * BroadcastReference * rename BroadcastReference to BroadcastObjectReference * InstructionBuilder.BroadcastBlobReference * remove RAIIBlobObject; add class BlobObject * oneflow.python.eager.boxing_util * rename DeprecatedXXX to SystemXXX * vm_util.InstructionBuilder cares nothing about logical_blob_name * refactor eager oneflow.get_variable and eager oneflow.system.assign * OneToManyBroadcastBlobReference * no panic in GenerateBackwardOpConfIf * ConsumedByGradientOp * fix Global<ResourceDesc> bug and gradient_function_not_found panic * implement gradient_util.* * fix misuse bug of CHECK_NOTNULL * add pass AutoTrainStep and AutoLearningRate for eager execution * replace ibn with ibn_prefix in StatelessCallOpKernelInstruction * delete unused bn_in_op index * object_cache.BnInOp2BlobObjectScope * refactor ModelInitOpConf * refactor InstructionBuilder.StatelessCall * refactor InstructionBuilder._SystemStatelessCall * fuse UserStatelessCall and SystemStatelessCall * Operator::GetOpAttributeWithoutOpNameAndLbn * always pass ParallelConf to InstructionBuilder.StatelessCall * refactor eager_get_variable * refactor python functions about OpAttribute * EagerCastToMirrored * OpArgAttribute * refactor interpreter_callback.Interpret * refactor FetchDelegateBlob * eager backward interpreter * BlobRegister * refactor interpret_callback * refactor gradient_util * put eager variable blob object into backward blob register * refactor interpreter_callback.Interpret * gradient_util.ReleaseUnusedBlobObject * Fix opkernel_instruction_type_test and remove machine_id2dev_phy_ids_ (#3047) * EagerRunBackwardOps * eager train demo * refactor 
interpreter_callback * remove unused foreign callback apis * interpret_callback.FindOrCreateVarBlobObject * rename OpArgAttribute to OpArgParallelAttribute * refactor EagerConsistentBlob/EagerMirroredBlob * interpret completed variable op * bugfix: mv oneflow.function to oneflow.global_function * fix the bug about twice called global_function * refactor NormalModelUpdateKernel::Forward to being called by eager execution * refactor Kernel::Forward to the public one * refactor system kernel for being compatible to eager execution * 1) boxing_util.TryBroadcastOneToMany; 2) vm_util.BoxingStatelessCall * 1) refactor class Maybe and class Error; 2) fix boxing_util.TryBroadcastOneToMany * more test case for eager executed model_init * fix CastToMirrored::InferSbpSignature and CastFromMirrored::InferSbpSignature * merge develop * merge develop * boxing_util.TrySingleDeviceBoxing * op_executor * refactor boxing_util with boxing_hob * more boxing methods * GetEnvDefaultParallelConf * boxing_util.NcclAllReduce * OneflowVm::TryReceiveAndRun * Remove class ForeignWorkerCallback (#3097) * Remove class ForeignWorkerCallback * Add test_2d_gpu_variable * framework/register_python_callback.py * rename EagerModelForward to EagerForward * boxing_hob.MasterMachineOnly * boxing_util.ComposeBoxing * HobContextAttr * boxing_util.BroadcastManyToOne * boxing_util.NoBoxing * merge develop * c_api_util.InferOpConf * ReplaceBlobParallelDesc * rename local variables * refine boxing_util.BoxingTo * boxing_util.NcclAllReduce * support non broadcast paralleled variable * FillLogicalBlobDescSignature in InferOpConf * NaiveCpuConcatSplit * fix SystemOpKernelObject::ResetKernel * RwMutexedObject::Get returns Maybe<const T*> instead of const T& * CHECK_OK * 1) boxing_middle; 2) boxing_util.RefBlobObjectWithParallelDesc * more boxing methods composed with NaiveCpuConcatSplit * boxing_util.CpuManyOneToOne * Scope * 1) refactor vm::SymbolStorage; 2) Scope * refactor GetOpAttribute4OpConf * update 
session.Scope when new name_scope constructed * refactor c_util.InferOpConf * OperatorConf.scope_symbol_id * refactor AddAndInferConsistentOp/AddAndInferMirroredOp * fix get_variable * global function input output (#3065) * eager return * update test * update output * global function input base test pass * update test * fix some issues * EagerConsistentBlob return * merge dev_eager * refactor EagerConsistentBlob.numpy(...) * minor update * refactor.ModeScope * refactor GetOpAttribute4OpConf * fix unittest using numpy_mirrored_list Co-authored-by:
lixinqi <lixinqi0703106@163.com> * eager oneflow.watch * Rename symbol_cache.py to symbol_storage.py (#3138) * eager watch_diff * more eager tests * Remove duplicate function GetParallelContext * 1) add FeedContext; 2) remove LocalFixedTensor * add instruction FeedBlob * rename: WatchBlob => FetchBlob * oneflow.env.enable_eager_environment * IsCpuOnly * fix eager push_util bugs * code format * reformat * ArgBlobDef.SetBatchAxisAndSplitAxis * CallOpkernel instruction family add argument SbpSignature * refactor remote eager blob * refactor InitGlobalCudaDeviceProp * fix InterfaceOpUtil * recursive call MakeEagerInputBlobs * copy returned blob in train job * 1) refactor UserStatelessCallOpKernel; 2) replace Global<ThreadMgr>::Get()->compute_thread_pool() with Global<ThreadPool>::Get() * fix gpu argwhere * refactor BlobObject::header_buffer_ * interpret_util.ConsistentInterpret * return more debug messages when encountering mixed consistent/mirrored error * TryMirroredCastTotalLossInstanceNum * replace compile_context.CurJobAddOp with interpret_util.Forward * check backward timeline * add scope to return (#3156) * add scope to return * more elegant * Dev eager merge develop (#3157) * skip empty stream (#3141) * skip empty stream * skip empty stream * add tbs for gdb in docker (#3139) * add tbs for gdb in docker * add more desc * Fix CUDNN_STATUS_NOT_SUPPORTED error for bn (#3147) * Fix CUDNN_STATUS_NOT_SUPPORTED error for bn * always use nchw when training * fix xla cmake arg (#3144) Co-authored-by:
guoran <guoran@oneflow.org> Co-authored-by:
Juncheng <liujuncheng1022@gmail.com> * Autograd use user op (#3151) * Use scalar_mul user op * IndexedSlicesOptimizerRewritePass use scalar_mul user op * scalar_sub_by_tensor * broadcast_div=>scalar_div * fix name * fix int_operand * fix diff_lbi * install python pkgs from dev-requirements.txt when running CI (#3121) * Hotfix multi machine vm panic (#3153) * fix multi machine vm panic * fix compile bug * fix vm unittest bug (#3155) * Fix test cases Co-authored-by:
Li Xinqi <lixinqi2010@gmail.com> Co-authored-by:
Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by:
Juncheng <liujuncheng1022@gmail.com> Co-authored-by:
guo ran <360112263@qq.com> Co-authored-by:
guoran <guoran@oneflow.org> Co-authored-by:
daquexian <daquexian566@gmail.com> * ParallelSignature * fix normalization grad ops timeline * Dev eager fix assign op (#3160) * Fix assign op * Add enable_if to assign api * Dev eager merge develop branch (#3164) * fix vm unittest bug (#3155) * Support BN Ex Operation (#3154) * Hotfix CUDNN_STATUS_NOT_SUPPORTED error (#3162) * xrt support user op (#3152) * xrt support user op * xla add Sole func * tensorrt Sole func * fix Co-authored-by:
Juncheng <liujuncheng1022@gmail.com> Co-authored-by:
Li Xinqi <lixinqi2010@gmail.com> Co-authored-by:
Juncheng <liujuncheng1022@gmail.com> Co-authored-by:
guo ran <360112263@qq.com> * refactor BoxingHobContext * fix FixedTensorDef * refactor boxing_util.BroadcastManyToOne and boxing_util.BroadcastOneToMany * refactor eager boxing * replace compile_context.CurJobAddOp with interpret_util.Forward * boxing verbose * Call TryClearObject4BlobName in EagerConsistentBlob.__del__ * fix instruction CudaHostRegisterBlob * new boxing method B -> S * blob_register.RegisteredBlobAccess * fix the use of enable_if (#3179) * add boxing P->B and P->S * fix eager oneflow.assign * Tensor list input and output of eager global function (#3181) * input tensor list * test input * output tensor list and update test * fix op watch (#3182) * fix op watch * optimized code * Math binary elementwise ops (#3169) (#3184) * math binary elementwise ops * implement of math binary elementwise gpu floating kernel * implement math binary elementwise cpu kernel; add test scripts * rm note Co-authored-by:
guo ran <360112263@qq.com> Co-authored-by:
cheng cheng <472491134@qq.com> Co-authored-by:
guo ran <360112263@qq.com> * rename test_gather* to test_agather* (#3191) * Dev eager merge develop (#3192) * Math binary elementwise ops (#3169) * math binary elementwise ops * implement of math binary elementwise gpu floating kernel * implement math binary elementwise cpu kernel; add test scripts * rm note Co-authored-by:
guo ran <360112263@qq.com> * Remove multiply/gelu/tanh system op (#3183) * Remove multiply system op * Remove gelu/tanh system op * fix * Remove layer_norm/slice system op (#3180) * Remove layer_norm system op * Remove slice system op * Remove scalar_add/scalar_mul system op (#3189) * Remove useless system op (#3190) * Remove axpy system op * Remove print system op * Remove reduce_mean system op * Remove local_response_normalization system op * cleanup kernel.proto * Remove dot system op * Remove maximum system op * cleanup TopKOpConf * cleanup op_conf * remove TryUpdtBnVal4SepcialOpConf * fix xrt print Co-authored-by:
cheng cheng <472491134@qq.com> Co-authored-by:
guo ran <360112263@qq.com> Co-authored-by:
Juncheng <liujuncheng1022@gmail.com> * remove test_eager.py (#3193) Co-authored-by:
ouyangyu <xuanjiuye@gmail.com> Co-authored-by:
OuYang Yu <xuanjiuye@gmail.com> Co-authored-by:
Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by:
tsai <caishenghang@oneflow.org> Co-authored-by:
leaves-zwx <kunta0932@gmail.com> Co-authored-by:
Juncheng <liujuncheng1022@gmail.com> Co-authored-by:
guo ran <360112263@qq.com> Co-authored-by:
guoran <guoran@oneflow.org> Co-authored-by:
daquexian <daquexian566@gmail.com> Co-authored-by:
cheng cheng <472491134@qq.com>
- Jul 02, 2020
Juncheng authored
* Add identity op to bn grad * rename identity to diff_identity * SetTensorDescInferFn use TensorDesc * revert flow.identity api
- Jun 02, 2020
daquexian authored
* support fp16 in bn and add tests * add bn in gray_list * reformat * fix dtype in test * add missing import * set dtype in FixedTensorDef in tests * add nn.batch_normalization, update python code, fix backward in fp16 non-training mode * cast back to fp32 in tests * watch_diff on fp32 blob, relax the atol, add notes on nn.bn * add more tests and handle negative axis in nn.bn * add comments for fp32 params * add fp16 trainable && not training test * skip bn fp16 test in non-user-op mode * register no cast registry for bn * update amp batch_axis check * address comments * reformat Co-authored-by:
guo ran <360112263@qq.com> Co-authored-by:
Juncheng <liujuncheng1022@gmail.com>
- May 19, 2020
OuYang Yu authored
* Unified attr * code format * The blank space * replace GetAttr with Attr * replace Attr with SetAttr *.py * code format Co-authored-by:
Shenghang Tsai <jackalcooper@gmail.com>
- May 17, 2020
daquexian authored
* [WIP] add normalization user op * Fix no input diff problem, thanks to @liujuncheng * stash * migrate scalar_add to user op * (not tested) support is_training=False * unify cpu and gpu kernels * update test_util.py to support non-training test * fix test_util.py * Add trainable op attr and training python param, rename is_training to training, and other minor changes * test bn inference * format * fix getsbpfn * restore test_util.py * update comments * update comments * remove trainable attr in c++ * use constant when not trainable * Add high epsilon test case * Fix wrong sbp * add new method, update the template * use the new test_global_storage and CHECK_OR_RETURN api * rename REGISTER_KERNEL macro * make some vars const * use new sbp builder api * format * use new input modifier api, add inferandtryrun, refine the test * add the missing JUST * remove the unnecessary copy * give grad ops a descriptive name * format * support trainable=True and training=False * format * rename some vars * skip trainable_without_training test for non-user-op mode * add CUDNN_BN_MIN_EPSILON check * wrap CUDNN_BN_MIN_EPSILON check with WITH_CUDA * remove unnecessary dx check and shape assignment * remove c++ attr center and scale * remove duplicated 'import os' * rename in->x, out->y * check tensor_desc is not nullptr in SetParamTensorDesc * set 'trainable' of moving_mean and moving_variance to False, wrap the non-user-op implementation with if-else * fix the inconsistent trainable parameter * rename out_desc->y_desc * skip adding grad_op when not necessary * remove the comments about CHECK_NE_OR_RETURN * reformat * rename out_grad_* => dy_* * move kernel into anonymous namespace Co-authored-by:
Shenghang Tsai <jackalcooper@gmail.com>