  1. Jul 19, 2021
    • registry_callback_fn return maybe (#5456) · 1e351cf8
      liufengwei0103 authored
      
      * modified SetInputArgModifyFn
      
      * Delete the CHECK changes in the assign_op.cpp file
      
      * Format
      
      * Modified the OutputArgModifyFn interface
      
      * add return
      
      * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function
      
      * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function
      
      * OutputArgModifier return maybe part_1
      
      * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function
      
      * input_arg_modifier return maybe
      
      * gen_bw_fn return maybe
      
      * bw_gen_fn return maybe
      
      * registry_callback_fn return maybe
      
      * fix bug after merging master
      
      * fix bug
      
      Co-authored-by: aishangjj <702572275@qq.com>
  2. Jul 18, 2021
    • bw_gen_fn return maybe (#5455) · bd2d3dc2
      liufengwei0103 authored
      
      * modified SetInputArgModifyFn
      
      * Delete the CHECK changes in the assign_op.cpp file
      
      * Format
      
      * Modified the OutputArgModifyFn interface
      
      * add return
      
      * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function
      
      * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function
      
      * OutputArgModifier return maybe part_1
      
      * maybe error stack from CheckAndConstructOp to OutputArgModifier callback function
      
      * input_arg_modifier return maybe
      
      * gen_bw_fn return maybe
      
      * bw_gen_fn return maybe
      
      * fix bug: return Maybe without JUST
      
      Co-authored-by: aishangjj <702572275@qq.com>
  3. Jul 16, 2021
  4. Jul 10, 2021
  5. Jun 18, 2021
  6. Apr 27, 2021
  7. Apr 22, 2021
    • Remove user op conf in kernel init ctx (#4659) · 8f96b06d
      binbinHan authored
      
      * remove_user_op_conf_in_kernel_compute_ctx
      
      * remove_user_op_conf_in_kernel_infer_ctx
      
      * remove_user_op_conf_in_kernel_init_ctx
      
      * remove_user_op_conf_in_kernel_create_ctx
      
      * remove user_op_conf in UserKernelOpInferContext
      
      * solve KernelComputeContext in nvtx_range_kernel.c
      
      * remove user_op_conf in InferContext and solve bug after pulling master
      
      * del useless code
      
      * remove attrs_ to derived class
      
      * optimize
      
      * del useless code
      
      * use Attr() instead of attr()
      
      * remove attr() in ctx
      
      * minor fix
      
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  8. Apr 08, 2021
    • Make infer data type away from tensor desc (#4536) · be319893
      Yurui Li authored
      
      * add blob object id and eager blob object interface
      
      * use maybe for blob object id
      
      * minor fix
      
      * add registry for data type, modified dropout as example
      
      * revert origin data type infer
      
      * add infer data type entry
      
      * minor fix infer data type
      
      * user op 32~40
      
      * add unchange data type
      
      * user op 26~40
      
      * fix inferDataType from 0-25
      
      * remove infer data type from util no change
      
      * fix is_dynamic attribute
      
      * user op 70~80
      
      * user op 66~70
      
      * remove redundant dtype
      
      * minor fix one hot
      
      * move data infer interface to user op
      
      * review and fix
      
      * Zailiang_Make infer data type away from tensor desc (#4544)
      
      * same_padding_op.cpp
      
      * scalar_add_op.cpp
      
      * scalar_by_tensor_op.cpp
      
      * scalar_mul_op.cpp
      
      * scalar_pow_op.cpp
      
      * Sigmoid_cross_entropy_op.cpp
      
      * slice_op.cpp
      
      * smooth_l1_loss_op.cpp
      
      * softmax_cross_entropy_op.cpp
      
      * Update scalar_by_tensor_op.cpp
      
      * sigmoid_op.cpp
      
      * Dev split infer data type from tensor desc (81~95) (#4545)
      
      * Split InferDataType from 81 to 95
      
      * Add is_dynamic
      
      * fix relu
      
      * Fix relu_op
      
      * Add Reduce ops InferDataType
      
      * start up (#4539)
      
      * start up
      
      * 121~128 done
      
      * refine
      
      * edit test_ops.cpp
      
      * SetPhysicalTensorDescInferFn
      
      * finished
      
      * reformat
      
      * remove redundant code
      
      * upsampling
      
      * test_ops
      
      Co-authored-by: poohRui <yuruil@qq.com>
      Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com>
      
      * Dev refactor split infer data type (#4547)
      
      * softmax_op.cpp
      
      * 107:sort_op.cpp
      
      * 106-115
      
      * 108-109
      
      * 110
      
      * 111
      
      * 112
      
      * 113
      
      * 115
      
      * 106-115
      
      * modify 106-115
      
      * modify assign
      
      * zzk fix
      
      Co-authored-by: MARD1NO <359521840@qq.com>
      Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com>
      
      * fix compile bug
      
      * warning fix
      
      * user op 41-80 (#4540)
      
      * user op 41-50
      
      * user op 51-55
      
      * fix image_object_preprocess ops
      
      * user op 56-65
      
      * update
      
      * update
      
      * update
      
      * fix a bug
      
      * fix bug
      
      * fix bug
      
      * fix devconv bug
      
      * fix multiple bugs
      
      * mark tensor buffer to tensor list as TODO
      
      * add impl for tensor buffer to tensor list
      
      * remove data type from infer shape in matmul
      
      * add check
      
      * fix transpose bug
      
      * fix test op bug
      
      * add check for add n
      
      * fix cast to static shape op bug
      
      * fix dynamic loss
      
      * remove unnecessary code in flatten op
      
      * fix hierarchical parallel
      
      * fix leaky relu op
      
      * fix math binary broadcast
      
      * remove unnecessary data type infer in math binary
      
      * fix terrible bug in math binary
      
      * format
      
      * fix softmax bug
      
      * remove unnecessary code for softmax cross entropy
      
      * fix conv
      
      * remove unnecessary code in maximum
      
      * add comments
      
      * fix image preprocess
      
      * fix layer norm
      
      * auto to auto&
      
      * fix partial fc sample
      
      * fix partial fc sample
      
      * fix image preprocess ops
      
      * format
      
      * user sigmoid op
      
      * format
      
      * fix partial fc
      
      * fix prelu
      
      * add newer ops
      
      * fix bug in math binary
      
      * fix combined margin loss op
      
      * fix ofrecord reader
      
      * add fused bias add op
      
      * format
      
      Co-authored-by: MARD1NO <359521840@qq.com>
      Co-authored-by: Zailiang <zailiangyu@gmail.com>
      Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
      Co-authored-by: doombeaker <later@usopp.net>
      Co-authored-by: Hongsheng Wang <31394900+wanghongsheng01@users.noreply.github.com>
      Co-authored-by: Mosout <mosout@163.com>
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  9. Mar 03, 2021
  10. Feb 24, 2021
    • Remove batch_axis (#4238) · d5b2fee2
      cheng cheng authored
      
      * Remove batch_axis
      
      * rm partial tick
      
      * temp fix bug of flow.math.add
      
      * fix bug of input_blob_def.split_axis init
      
      * [KEY] refine infer sbp order value consider logical shape enable split
      
      * ignore op get sbp sign ERROR.
      
      * filter and check valid sbp sign by logical shape; rm magic num
      
      * fix bug of return
      
      * merge rm sigmoid cross entropy op
      
      * rm sigmoid batch axis fn
      
      * more debug log for check valid in get sbp sign
      
      * rm useless check
      
      Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
  11. Oct 27, 2020
  12. Oct 05, 2020
  13. Sep 04, 2020
    • FuseAddToOutput (#3524) · fd2fe579
      Juncheng authored
      
      * FuseAddToOutput
      
      * change default value of enable_fuse_add_to_output to false
      
      * consider reachability
      
      * supported_op_type_name2output_bn=>supported_op_type_name2output_arg
      
      * Add necessary comments
      
      * refine
      
      * fix
      
      Co-authored-by: guo ran <360112263@qq.com>
  14. Aug 31, 2020
  15. Aug 03, 2020
  16. Jul 31, 2020
  17. Jul 29, 2020
    • Refactor user op c++ grad register api (#3321) · d0931f35
      strint authored
      
      * #3196 refactor user op c++ register draft
      
      * #3196 refactor op register c++ api part2
      
      * #3196 refine interface
      
      * #3196 user new get
      
      * #3196 rename
      
      * #3196 finish Op reg
      
      * #3196 finish Op grad reg
      
      * #3196 refine
      
      * #3196 op kernel reg part 1
      
      * #3196 op kernel reg part 2
      
      * #3196 op kernel reg part 3
      
      * #3196 op kernel reg part 4
      
      * #3196 op kernel reg part 5 fix
      
      * #3196 compile & test pass
      
      * #3196 merge branch develop
      
      * #3196 rm useless code
      
      * use new api to rewrite batch norm grad graph
      
      * use new api to rewrite batch norm grad graph fix typo
      
      * use new api to rewrite batch norm grad graph fix typo
      
      * use new api to rewrite where
      
      * implement new interface
      
      * finish BackwardOpConfContext
      
      * finish first version bw_op_gen_context & fw_op interface
      
      * refine lambda fn & interface
      
      * refine lambda var capture in func
      
      * check batch norm & add comment
      
      * batch norm & where finish & pass test
      
      * refine interface of builder fn
      
      * refactor relu grad
      
      * rm relu grad
      
      Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
  18. Jul 23, 2020
    • Dev apache2 license (#3266) · d0bdbd5d
      Shenghang Tsai authored
      
      * add license at root dir
      
      * check in empty files
      
      * rm space
      
      * check in script
      
      * update script
      
      * fix bug
      
      * add print
      
      * fix
      
      * add exit
      
      * add to of_format
      
      * add CI task
      
      * fix license
      
      * Revert "fix license"
      
      This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57.
      
      * only add once
      
      * quick fix
      
      * fix script
      
      * dont fmt empty file
      
      * fix
      
      * quick fix
      
      * fix py
      
      * add license
      
      * fix exit
      
      * add license for hpp
      
      * add license
      
      * license new vm files
      
      Co-authored-by: tsai <caishenghang@oneflow.org>
  19. Jul 15, 2020
    • Dev eager (#2966) · 996ebd45
      Li Xinqi authored
      
      * rename Async2sync to Await
      
      * oneflow.eager_fixed_placement
      
      * fix a bug in InstructionBuilder._StatelessCall
      
      * do not panic in VirtualMachine::ForEachMutMirroredObject
      
      * GetMirroredObject
      
      * add instruction ReplaceMirrored
      
      * hob.is_current_machine_master
      
      * rename ForeignWorkerWatcher to ForeignWorkerCallback (#2935)
      
      * EagerJobBuildAndInferCtx
      
      * refactor api_oneflow_function
      
      * eager_oneflow_global_function
      
      * fix a bug in Session.Init()
      
      * interpreter.EagerRun
      
      * JobBuildAndInferCtx::GetMirroredOpParallelConf
      
      * JobBuildAndInferCtx:GetMirroredOpName
      
      * change the JobBuildAndInferCtx::Complete function to pure virtual
      
      * refactor EagerJobBuildAndInferCtx::Complete()
      
      * LazyConsistentBlob/LazyMirroredBlob
      
      * EagerRemoteBlob
      
      * eager logical blob
      
      * rename symbol_dict to symbol_cache; rename object_dict to object_cache
      
      * refactor EagerLogicalBlob with instruction ReplaceMirror
      
      * test_eager_logical_blob
      
      * get_eager_variable
      
      * free stashed variable blob before close session
      
      * Quick fix eager (#2964)
      
      * fix EagerPhysicalBlob
      
      * fix lazy
      
      Co-authored-by: tsai <caishenghang@oneflow.org>
      
      * DeprecatedStatelessCallOpKernel
      
      * python wrapper for DeprecatedStatelessCall
      
      * Fix compile error (#2967)
      
      * fix compile error
      
      * modify by comment
      
      * rename unused cuda copy instruction
      
      * remove OneflowVM<kMaster>
      
      * stream_tag for StatelessCallOpKernel/DeprecatedStatelessCallOpKernel
      
      * quick fix is train in hob (#2971)
      
      Co-authored-by: tsai <caishenghang@oneflow.org>
      
      * CudaHostRegisterBlob (#2972)
      
      * CudaHostRegisterBlob
      
      * CudaHostUnRegisterBlob
      
      * 1) gpu.copy_h2d.*; 2) gpu.copy_d2h.*
      
      * amend CudaHostUnregisterBlob
      
      Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
      
      * CudaHostAllocator for CudaCopyD2HStreamType
      
      * eager_copy
      
      * fix bugs in eager oneflow.copy
      
      * oneflow.copy
      
      * autograd for CopyOp
      
      * move MakeCopyInstructionBuilderFunction from copy.py to eager/vm_util.py
      
      * auto copy_hd
      
      * oneflow.system.assign
      
      * single device version model_init
      
      * more assert in CurJobAddMirroredOp
      
      * BroadcastReference
      
      * rename BroadcastReference to BroadcastObjectReference
      
      * InstructionBuilder.BroadcastBlobReference
      
      * remove RAIIBlobObject; add class BlobObject
      
      * oneflow.python.eager.boxing_util
      
      * rename DeprecatedXXX to SystemXXX
      
      * vm_util.InstructionBuilder cares nothing about logical_blob_name
      
      * refactor eager oneflow.get_variable and eager oneflow.system.assign
      
      * OneToManyBroadcastBlobReference
      
      * no panic in GenerateBackwardOpConfIf
      
      * ConsumedByGradientOp
      
      * fix Global<ResourceDesc> bug and gradient_function_not_found panic
      
      * implement gradient_util.*
      
      * fix misuse bug of CHECK_NOTNULL
      
      * add pass AutoTrainStep and AutoLearningRate for eager execution
      
      * replace ibn with ibn_prefix in StatelessCallOpKernelInstruction
      
      * delete unused bn_in_op index
      
      * object_cache.BnInOp2BlobObjectScope
      
      * refactor ModelInitOpConf
      
      * refactor InstructionBuilder.StatelessCall
      
      * refactor InstructionBuilder._SystemStatelessCall
      
      * fuse UserStatelessCall and SystemStatelessCall
      
      * Operator::GetOpAttributeWithoutOpNameAndLbn
      
      * always pass ParallelConf to InstructionBuilder.StatelessCall
      
      * refactor eager_get_variable
      
      * refactor python functions about OpAttribute
      
      * EagerCastToMirrored
      
      * OpArgAttribute
      
      * refactor interpreter_callback.Interpret
      
      * refactor FetchDelegateBlob
      
      * eager backward interpreter
      
      * BlobRegister
      
      * refactor interpret_callback
      
      * refactor gradient_util
      
      * put eager variable blob object into backward blob register
      
      * refactor interpreter_callback.Interpret
      
      * gradient_util.ReleaseUnusedBlobObject
      
      * Fix opkernel_instruction_type_test and remove machine_id2dev_phy_ids_ (#3047)
      
      * EagerRunBackwardOps
      
      * eager train demo
      
      * refactor interpreter_callback
      
      * remove unused foreign callback apis
      
      * interpret_callback.FindOrCreateVarBlobObject
      
      * rename OpArgAttribute to OpArgParallelAttribute
      
      * refactor EagerConsistentBlob/EagerMirroredBlob
      
      * interpret completed variable op
      
      * bugfix: mv oneflow.function to oneflow.global_function
      
      * fix the bug about global_function being called twice
      
      * refactor NormalModelUpdateKernel::Forward to being called by eager execution
      
      * refactor Kernel::Forward to the public one
      
      * refactor system kernel for being compatible to eager execution
      
      * 1) boxing_util.TryBroadcastOneToMany; 2) vm_util.BoxingStatelessCall
      
      * 1) refactor class Maybe and class Error; 2) fix boxing_util.TryBroadcastOneToMany
      
      * more test case for eager executed model_init
      
      * fix CastToMirrored::InferSbpSignature and CastFromMirrored::InferSbpSignature
      
      * merge develop
      
      * merge develop
      
      * boxing_util.TrySingleDeviceBoxing
      
      * op_executor
      
      * refactor boxing_util with boxing_hob
      
      * more boxing methods
      
      * GetEnvDefaultParallelConf
      
      * boxing_util.NcclAllReduce
      
      * OneflowVm::TryReceiveAndRun
      
      * Remove class ForeignWorkerCallback (#3097)
      
      * Remove class ForeignWorkerCallback
      
      * Add test_2d_gpu_variable
      
      * framework/register_python_callback.py
      
      * rename EagerModelForward to EagerForward
      
      * boxing_hob.MasterMachineOnly
      
      * boxing_util.ComposeBoxing
      
      * HobContextAttr
      
      * boxing_util.BroadcastManyToOne
      
      * boxing_util.NoBoxing
      
      * merge develop
      
      * c_api_util.InferOpConf
      
      * ReplaceBlobParallelDesc
      
      * rename local variables
      
      * refine boxing_util.BoxingTo
      
      * boxing_util.NcclAllReduce
      
      * support non broadcast paralleled variable
      
      * FillLogicalBlobDescSignature in InferOpConf
      
      * NaiveCpuConcatSplit
      
      * fix SystemOpKernelObject::ResetKernel
      
      * RwMutexedObject::Get returns Maybe<const T*> instead of const T&
      
      * CHECK_OK
      
      * 1) boxing_middle; 2) boxing_util.RefBlobObjectWithParallelDesc
      
      * more boxing methods composed with NaiveCpuConcatSplit
      
      * boxing_util.CpuManyOneToOne
      
      * Scope
      
      * 1) refactor vm::SymbolStorage; 2) Scope
      
      * refactor GetOpAttribute4OpConf
      
      * update session.Scope when new name_scope constructed
      
      * refactor c_util.InferOpConf
      
      * OperatorConf.scope_symbol_id
      
      * refactor AddAndInferConsistentOp/AddAndInferMirroredOp
      
      * fix get_variable
      
      * global function input output (#3065)
      
      * eager return
      
      * update test
      
      * update output
      
      * global function input base test pass
      
      * update test
      
      * fix some issues
      
      * EagerConsistentBlob return
      
      * merge dev_eager
      
      * refactor EagerConsistentBlob.numpy(...)
      
      * minor update
      
      * refactor.ModeScope
      
      * refactor GetOpAttribute4OpConf
      
      * fix unittest using numpy_mirrored_list
      
      Co-authored-by: lixinqi <lixinqi0703106@163.com>
      
      * eager oneflow.watch
      
      * Rename symbol_cache.py to symbol_storage.py (#3138)
      
      * eager watch_diff
      
      * more eager tests
      
      * Remove duplicate function GetParallelContext
      
      * 1) add FeedContext; 2) remove LocalFixedTensor
      
      * add instruction FeedBlob
      
      * rename: WatchBlob => FetchBlob
      
      * oneflow.env.enable_eager_environment
      
      * IsCpuOnly
      
      * fix eager push_util bugs
      
      * code format
      
      * reformat
      
      * ArgBlobDef.SetBatchAxisAndSplitAxis
      
      * CallOpkernel instruction family add argument SbpSignature
      
      * refactor remote eager blob
      
      * refactor InitGlobalCudaDeviceProp
      
      * fix InterfaceOpUtil
      
      * recursively call MakeEagerInputBlobs
      
      * copy returned blob in train job
      
      * 1) refactor UserStatelessCallOpKernel; 2) replace Global<ThreadMgr>::Get()->compute_thread_pool() with Global<ThreadPool>::Get()
      
      * fix gpu argwhere
      
      * refactor BlobObject::header_buffer_
      
      * interpret_util.ConsistentInterpret
      
      * return more debug messages when encountering mixed consistent/mirrored error
      
      * TryMirroredCastTotalLossInstanceNum
      
      * replace compile_context.CurJobAddOp with interpret_util.Forward
      
      * check backward timeline
      
      * add scope to return (#3156)
      
      * add scope to return
      
      * more elegant
      
      * Dev eager merge develop (#3157)
      
      * skip empty stream (#3141)
      
      * skip empty stream
      
      * skip empty stream
      
      * add tbs for gdb in docker (#3139)
      
      * add tbs for gdb in docker
      
      * add more desc
      
      * Fix CUDNN_STATUS_NOT_SUPPORTED error for bn (#3147)
      
      * Fix CUDNN_STATUS_NOT_SUPPORTED error for bn
      
      * always use nchw when training
      
      * fix xla cmake arg (#3144)
      
      Co-authored-by: guoran <guoran@oneflow.org>
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
      
      * Autograd use user op (#3151)
      
      * Use scalar_mul user op
      
      * IndexedSlicesOptimizerRewritePass use scalar_mul user op
      
      * scalar_sub_by_tensor
      
      * broadcast_div=>scalar_div
      
      * fix name
      
      * fix int_operand
      
      * fix diff_lbi
      
      * install python pkgs from dev-requirements.txt when running CI (#3121)
      
      * Hotfix multi machine vm panic (#3153)
      
      * fix multi machine vm panic
      
      * fix compile bug
      
      * fix vm unittest bug (#3155)
      
      * Fix test cases
      
      Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
      Co-authored-by: guo ran <360112263@qq.com>
      Co-authored-by: guoran <guoran@oneflow.org>
      Co-authored-by: daquexian <daquexian566@gmail.com>
      
      * ParallelSignature
      
      * fix normalization grad ops timeline
      
      * Dev eager fix assign op (#3160)
      
      * Fix assign op
      
      * Add enable_if to assign api
      
      * Dev eager merge develop branch (#3164)
      
      * fix vm unittest bug (#3155)
      
      * Support BN Ex Operation (#3154)
      
      * Hotfix  CUDNN_STATUS_NOT_SUPPORTED error (#3162)
      
      * xrt support user op (#3152)
      
      * xrt support user op
      
      * xla add Sole func
      
      * tensorrt Sole func
      
      * fix
      
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
      
      Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
      Co-authored-by: guo ran <360112263@qq.com>
      
      * refactor BoxingHobContext
      
      * fix FixedTensorDef
      
      * refactor boxing_util.BroadcastManyToOne and boxing_util.BroadcastOneToMany
      
      * refactor eager boxing
      
      * replace compile_context.CurJobAddOp with interpret_util.Forward
      
      * boxing verbose
      
      * Call TryClearObject4BlobName in EagerConsistentBlob.__del__
      
      * fix instruction CudaHostRegisterBlob
      
      * new boxing method B -> S
      
      * blob_register.RegisteredBlobAccess
      
      * fix the use of enable_if (#3179)
      
      * add boxing P->B and P->S
      
      * fix eager oneflow.assign
      
      * Tensor list input and output of eager global function (#3181)
      
      * input tensor list
      
      * test input
      
      * output tensor list and update test
      
      * fix op watch (#3182)
      
      * fix op watch
      
      * optimized code
      
      * Math binary elementwise ops (#3169) (#3184)
      
      * math binary elementwise ops
      
      * implement of math binary elementwise gpu floating kernel
      
      * implement math binary elementwise cpu kernel; add test scripts
      
      * rm note
      
      Co-authored-by: guo ran <360112263@qq.com>
      
      Co-authored-by: cheng cheng <472491134@qq.com>
      Co-authored-by: guo ran <360112263@qq.com>
      
      * rename test_gather* to test_agather* (#3191)
      
      * Dev eager merge develop (#3192)
      
      * Math binary elementwise ops (#3169)
      
      * math binary elementwise ops
      
      * implement of math binary elementwise gpu floating kernel
      
      * implement math binary elementwise cpu kernel; add test scripts
      
      * rm note
      
      Co-authored-by: guo ran <360112263@qq.com>
      
      * Remove multiply/gelu/tanh system op (#3183)
      
      * Remove multiply system op
      
      * Remove gelu/tanh system op
      
      * fix
      
      * Remove layer_norm/slice system op (#3180)
      
      * Remove layer_norm system op
      
      * Remove slice system op
      
      * Remove scalar_add/scalar_mul system op (#3189)
      
      * Remove useless system op (#3190)
      
      * Remove axpy system op
      
      * Remove print system op
      
      * Remove reduce_mean system op
      
      * Remove local_response_normalization system op
      
      * cleanup kernel.proto
      
      * Remove dot system op
      
      * Remove maximum system op
      
      * cleanup TopKOpConf
      
      * cleanup op_conf
      
      * remove TryUpdtBnVal4SepcialOpConf
      
      * fix xrt print
      
      Co-authored-by: cheng cheng <472491134@qq.com>
      Co-authored-by: guo ran <360112263@qq.com>
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
      
      * remove test_eager.py (#3193)
      
      Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
      
      Co-authored-by: OuYang Yu <xuanjiuye@gmail.com>
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
      Co-authored-by: tsai <caishenghang@oneflow.org>
      Co-authored-by: leaves-zwx <kunta0932@gmail.com>
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
      Co-authored-by: guo ran <360112263@qq.com>
      Co-authored-by: guoran <guoran@oneflow.org>
      Co-authored-by: daquexian <daquexian566@gmail.com>
      Co-authored-by: cheng cheng <472491134@qq.com>
  20. Jul 02, 2020
  21. Jun 02, 2020
    • add nn.batch_normalization, support fp16 in bn and add tests (#2941) · 2514adea
      daquexian authored
      
      * support fp16 in bn and add tests
      
      * add bn in gray_list
      
      * reformat
      
      * fix dtype in test
      
      * add missing import
      
      * set dtype in FixedTensorDef in tests
      
      * add nn.batch_normalization, update python code, fix backward in fp16 non-training mode
      
      * cast back to fp32 in tests
      
      * watch_diff on fp32 blob, relax the atol, add notes on nn.bn
      
      * add more tests and handle negative axis in nn.bn
      
      * add comments for fp32 params
      
      * add fp16 trainable && not training test
      
      * skip bn fp16 test in non-user-op mode
      
      * register no cast registry for bn
      
      * update amp batch_axis check
      
      * address comments
      
      * reformat
      
      Co-authored-by: guo ran <360112263@qq.com>
      Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
  22. May 19, 2020
  23. May 17, 2020
    • migrate normalization op (#2768) · 34084690
      daquexian authored
      
      * [WIP] add normalization user op
      
      * Fix no input diff problem, thanks to @liujuncheng
      
      * stash
      
      * migrate scalar_add to user op
      
      * (not tested) support is_training=False
      
      * unify cpu and gpu kernels
      
      * update test_util.py to support non-training test
      
      * fix test_util.py
      
      * Add trainable op attr and training python param, rename is_training to training, and other minor changes
      
      * test bn inference
      
      * format
      
      * fix getsbpfn
      
      * restore test_util.py
      
      * update comments
      
      * update comments
      
      * remove trainable attr in c++
      
      * use constant when not trainable
      
      * Add high epsilon test case
      
      * Fix wrong sbp
      
      * add new method, update the template
      
      * use the new test_global_storage and CHECK_OR_RETURN api
      
      * rename REGISTER_KERNEL macro
      
      * make some vars const
      
      * use new sbp builder api
      
      * format
      
      * use new input modifier api, add inferandtryrun, refine the test
      
      * add the missing JUST
      
      * remove the unnecessary copy
      
      * give grad ops a descriptive name
      
      * format
      
      * support trainable=True and training=False
      
      * format
      
      * rename some vars
      
      * skip trainable_without_training test for non-user-op mode
      
      * add CUDNN_BN_MIN_EPSILON check
      
      * wrap CUDNN_BN_MIN_EPSILON check with WITH_CUDA
      
      * remove unnecessary dx check and shape assignment
      
      * remove c++ attr center and scale
      
      * remove duplicated 'import os'
      
      * rename in->x, out->y
      
      * check tensor_desc is not nullptr in SetParamTensorDesc
      
      * set 'trainable' of moving_mean and moving_variance to False, wrap the non-user-op implementation with if-else
      
      * fix the inconsistent trainable parameter
      
      * rename out_desc->y_desc
      
      * skip adding grad_op when not necessary
      
      * remove the comments about CHECK_NE_OR_RETURN
      
      * reformat
      
      * rename out_grad_* => dy_*
      
      * move kernel into anonymous namespace
      
      Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>