Commits · 1e351cf84d2842bc10d1773ab981744609b6412e · Summer2021 / 210130121 · GitLab

Jul 19, 2021

registry_callback_fn return maybe (#5456) · 1e351cf8

liufengwei0103 authored 3 years ago


* modified SetInputArgModifyFn

* Delete the CHECK changes in the assign_op.cpp file

* Format

* Modified the OutputArgModifyFn interface

* add return

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* OutputArgModifier return maybe part_1

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* input_arg_modifier return maybe

* gen_bw_fn return maybe

* bw_gen_fn return maybe

* registry_callback_fn return maybe

* fix bug after merge master

* fix bug

Co-authored-by: aishangjj <702572275@qq.com>

1e351cf8

Jul 18, 2021

bw_gen_fn return maybe (#5455) · bd2d3dc2

liufengwei0103 authored 3 years ago


* modified SetInputArgModifyFn

* Delete the CHECK changes in the assign_op.cpp file

* Format

* Modified the OutputArgModifyFn interface

* add return

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* OutputArgModifier return maybe part_1

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* input_arg_modifier return maybe

* gen_bw_fn return maybe

* bw_gen_fn return maybe

* fix bug: return Maybe without JUST

Co-authored-by: aishangjj <702572275@qq.com>

bd2d3dc2

Jul 16, 2021

Input arg modifier return maybe (#5453) · e39e3c62

liufengwei0103 authored 3 years ago


* modified SetInputArgModifyFn

* Delete the CHECK changes in the assign_op.cpp file

* Format

* Modified the OutputArgModifyFn interface

* add return

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* OutputArgModifier return maybe part_1

* maybe error stack from CheckAndConstructOp to OutputArgModifier callback function

* input_arg_modifier return maybe

* change lambda for JUST macro

* fix conflicts

Co-authored-by: aishangjj <702572275@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

e39e3c62

Jul 10, 2021

Refactor infer ctx input tensordesc (#5226) · 3abfe716

luqiang guo authored 3 years ago


* Refactor infer ctx input tensordesc and related calls

* Modify the missing content

* Modify the post-merged error

* Add input judgment

* Missing modifications

* Fix merge errors

* format

* Modify omission

* Fix the crash caused by lack of judgment

* Delete comment

* Optimize naming

* Delete comment

* Delete blank lines

Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

3abfe716

Jun 18, 2021

Add interface InferContext::OutputTensorDesc (#5219) · 9c1b19ca

Tianyu Zhao authored 3 years ago


* Add interface InferContext::OutputTensorDesc

* auto format by CI

* Implement 'OutputTensorDesc' for 'UserOpExprInferContext'

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

9c1b19ca

Apr 27, 2021

Rename InferXXXFn as XXXInferFn (#4739) · e4cbac0c

Hongsheng Wang authored 3 years ago


* Rename InferXXXFn as XXXInferFn

* rename interface_name

* modify infer_xxx_fn to xxx_infer_fn

Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

e4cbac0c

Apr 22, 2021

Remove user op conf in kernel init ctx (#4659) · 8f96b06d

binbinHan authored 3 years ago


* remove_user_op_conf_in_kernel_compute_ctx

* remove_user_op_conf_in_kernel_infer_ctx

* remove_user_op_conf_in_kernel_init_ctx

* remove_user_op_conf_in_kernel_create_ctx

* remove user_op_conf in UserKernelOpInferContext

* solve KernelComputeContext in nvtx_range_kernel.c

* remove user_op_conf in InferContext and solve bug after pull master

* del useless code

* remove attrs_ to derived class

* optimize

* del useless code

* use Attr() instead of attr()

* remove attr() in ctx

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

8f96b06d

Apr 08, 2021

Make infer data type away from tensor desc (#4536) · be319893

Yurui Li authored 3 years ago


* add blob object id and eager blob object interface

* use maybe for blob object id

* minor fix

* add registry for data type, modified dropout as example

* revert origin data type infer

* add infer data type entry

* minor fix infer data type

* user op 32~40

* add unchange data type

* user op 26~40

* fix inferDataType from 0-25

* remove infer data type from util no change

* fix is_dynamic attribute

* user op 70~80

* user op 66~70

* remove redundant dtype

* minor fix one hot

* move data infer interface to user op

* review and fix

* Zailiang_Make infer data type away from tensor desc (#4544)

* same_padding_op.cpp

* scalar_add_op.cpp

* scalar_by_tensor_op.cpp

* scalar_mul_op.cpp

* scalar_pow_op.cpp

* Sigmoid_cross_entropy_op.cpp

* slice_op.cpp

* smooth_l1_loss_op.cpp

* softmax_cross_entropy_op.cpp

* Update scalar_by_tensor_op.cpp

* sigmoid_op.cpp

* Dev split infer data type from tensor desc (81~95) (#4545)

* Split InferDataType from 81 to 95

* Add is_dynamic

* fix relu

* Fix relu_op

* Add Reduce ops InferDataType

* start up (#4539)

* start up

* 121~128 done

* refine

* edit test_ops.cpp

* SetPhysicalTensorDescInferFn

* finished

* reformat

* remove redundant code

* upsampling

* test_ops

Co-authored-by: poohRui <yuruil@qq.com>
Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com>

* Dev refactor split infer data type (#4547)

* softmax_op.cpp

* 107:sort_op.cpp

* 106-115

* 108-109

* 110

* 111

* 112

* 113

* 115

* 106-115

* modify 106-115

* modify assign

* zzk fix

Co-authored-by: MARD1NO <359521840@qq.com>
Co-authored-by: Yurui Li <32978179+poohRui@users.noreply.github.com>

* fix compile bug

* warning fix

* user op 41-80 (#4540)

* user op 41-50

* user op 51-55

* fix image_object_preprocess ops

* user op 56-65

* update

* update

* update

* fix a bug

* fix bug

* fix bug

* fix devconv bug

* fix multiple bugs

* make tensor buffer to tensor list as todo

* add impl for tensor buffer to tensor list

* remove data type from infer shape in matmul

* add check

* fix transpose bug

* fix test op bug

* add check for add n

* fix cast to static shape op bug

* fix dynamic loss

* remove unnecessary code in flatten op

* fix hierarchical parallel

* fix leaky relu op

* fix math binary broadcast

* remove unnecessary data type infer in math binary

* fix terrible bug in math binary

* format

* fix softmax bug

* remove unnecessary code for softmax cross entropy

* fix conv

* remove unnecessary code in maximum

* add comments

* fix image preprocess

* fix layer norm

* auto to auto&

* fix partial fc sample

* fix partial fc sample

* fix image preprocess ops

* format

* user sigmoid op

* format

* fix partial fc

* fix prelu

* add newer ops

* fix bug in math binary

* fix combined margin loss op

* fix ofrecord reader

* add fused bias add op

* format

Co-authored-by: MARD1NO <359521840@qq.com>
Co-authored-by: Zailiang <zailiangyu@gmail.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: doombeaker <later@usopp.net>
Co-authored-by: Hongsheng Wang <31394900+wanghongsheng01@users.noreply.github.com>
Co-authored-by: Mosout <mosout@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

be319893

Mar 03, 2021

fix bn_add_relu LogicalTensorDescInferFn (#4305) · d456c3b3

guo ran authored 4 years ago


* bn_add_relu SetLogicalTensorDescInferFn

* fix

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

d456c3b3

Feb 24, 2021

Remove batch_axis (#4238) · d5b2fee2

cheng cheng authored 4 years ago


* Remove batch_axis

* rm partial tick

* temp fix bug of flow.math.add

* fix bug of input_blob_def.split_axis init

* [KEY] refine infer sbp order value consider logical shape enable split

* ignore op get sbp sign ERROR.

* filter and check valid sbp sign by logical shape; rm magic num

* fix bug of return

* merge rm sigmoid cross entropy op

* rm sigmoid batch axis fn

* more debug log for check valid in get sbp sign

* rm useless check

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

d5b2fee2

Oct 27, 2020

User op registry attr (#3716) · 856ba4ab

OuYang Yu authored 4 years ago


* refactor user_op_registry Attr

* Replace the User Op Registry Attr with a new interface (#3714)

* replace user op attr

Co-authored-by: lixinqi <lixinqi0703106@163.com>

856ba4ab

Oct 05, 2020

bn_add_relu use bit mask (#3645) · 315d6962

guo ran authored 4 years ago


* bn_add_relu use bit mask

* refine

* refine

315d6962

Sep 04, 2020

FuseAddToOutput (#3524) · fd2fe579

Juncheng authored 4 years ago


* FuseAddToOutput

* change default value of enable_fuse_add_to_output to false

* consider reachability

* supported_op_type_name2output_bn=>supported_op_type_name2output_arg

* Add necessary comments

* refine

* fix

Co-authored-by: guo ran <360112263@qq.com>

fd2fe579

Aug 31, 2020

Fused BatchNormAddRelu (#3519) · 5f7a1ba7

Juncheng authored 4 years ago


* Fused BatchNormAddRelu

* skip if cpu

* d_addend=>addend_diff

5f7a1ba7

Aug 03, 2020

Fix lambda return type deduction (#3388) · 7b31f798

llehtahw authored 4 years ago


* Update const std::string param type

* Fix lambda return type

7b31f798

rename customized to user (#3379) · cd836a54

OuYang Yu authored 4 years ago

cd836a54

Jul 31, 2020

fix name conflict in bn grad subgraph (#3344) · 4b85abcf

daquexian authored 4 years ago

4b85abcf

Jul 29, 2020

Refactor user op c++ grad register api (#3321) · d0931f35

strint authored 4 years ago


* #3196 refactor user op c++ register draft

* #3196 refactor op register c++ api part2

* #3196 refine interface

* #3196 user new get

* #3196 rename

* #3196 finish Op reg

* #3196 finish Op grad reg

* #3196 refine

* #3196 op kernel reg part 1

* #3196 op kernel reg part 2

* #3196 op kernel reg part 3

* #3196 op kernel reg part 4

* #3196 op kernel reg part 5 fix

* #3196 compile & test pass

* #3196 merge branch develop

* #3196 rm useless code

* use new api to rewrite batch norm grad graph

* use new api to rewrite batch norm grad graph fix typo

* use new api to rewrite batch norm grad graph fix typo

* use new api to rewrite where

* implement new interface

* finish BackwardOpConfContext

* finish first version bw_op_gen_context & fw_op interface

* refine lambda fn & interface

* refine lambda var capture in func

* check batch norm & add comment

* batch norm & where finish & pass test

* refine interface of builder fn

* refactor relu grad

* rm relu grad

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>

d0931f35

Jul 23, 2020

Dev apache2 license (#3266) · d0bdbd5d

Shenghang Tsai authored 4 years ago


* add license at root dir

* check in empty files

* rm space

* check in script

* update script

* fix bug

* add print

* fix

* add exit

* add to of_format

* add CI task

* fix license

* Revert "fix license"

This reverts commit 818b6d7691d3a8b4a25dd41a47ff2c5922b8ec57.

* only add once

* quick fix

* fix script

* dont fmt empty file

* fix

* quick fix

* fix py

* add license

* fix exit

* add license for hpp

* add license

* license new vm files

Co-authored-by: tsai <caishenghang@oneflow.org>

d0bdbd5d

Jul 15, 2020

Dev eager (#2966) · 996ebd45

Li Xinqi authored 4 years ago


* rename Async2sync to Await

* oneflow.eager_fixed_placement

* fix a bug in InstructionBuilder._StatelessCall

* do not panic in VirtualMachine::ForEachMutMirroredObject

* GetMirroredObject

* add instruction ReplaceMirrored

* hob.is_current_machine_master

* rename ForeignWorkerWatcher to ForeignWorkerCallback (#2935)

* EagerJobBuildAndInferCtx

* refactor api_oneflow_function

* eager_oneflow_global_function

* fix a bug in Session.Init()

* interpreter.EagerRun

* JobBuildAndInferCtx::GetMirroredOpParallelConf

* JobBuildAndInferCtx::GetMirroredOpName

* change the JobBuildAndInferCtx::Complete function to pure virtual

* refactor EagerJobBuildAndInferCtx::Complete()

* LazyConsistentBlob/LazyMirroredBlob

* EagerRemoteBlob

* eager logical blob

* rename symbol_dict to symbol_cache; rename object_dict to object_cache

* refactor EagerLogicalBlob with instruction ReplaceMirror

* test_eager_logical_blob

* get_eager_variable

* free stashed variable blob before close session

* Quick fix eager (#2964)

* fix EagerPhysicalBlob

* fix lazy

Co-authored-by: tsai <caishenghang@oneflow.org>

* DeprecatedStatelessCallOpKernel

* python wrapper for DeprecatedStatelessCall

* Fix compile error (#2967)

* fix compile error

* modify by comment

* rename unused cuda copy instruction

* remove OneflowVM<kMaster>

* stream_tag for StatelessCallOpKernel/DeprecatedStatelessCallOpKernel

* quick fix is train in hob (#2971)

Co-authored-by: tsai <caishenghang@oneflow.org>

* CudaHostRegisterBlob (#2972)

* CudaHostRegisterBlob

* CudaHostUnRegisterBlob

* 1) gpu.copy_h2d.*; 2) gpu.copy_d2h.*

* amend CudaHostUnregisterBlob

Co-authored-by: ouyangyu <xuanjiuye@gmail.com>

* CudaHostAllocator for CudaCopyD2HStreamType

* eager_copy

* fix bugs in eager oneflow.copy

* oneflow.copy

* autograd for CopyOp

* move MakeCopyInstructionBuilderFunction from copy.py to eager/vm_util.py

* auto copy_hd

* oneflow.system.assign

* single device version model_init

* more assert in CurJobAddMirroredOp

* BroadcastReference

* rename BroadcastReference to BroadcastObjectReference

* InstructionBuilder.BroadcastBlobReference

* remove RAIIBlobObject; add class BlobObject

* oneflow.python.eager.boxing_util

* rename DeprecatedXXX to SystemXXX

* vm_util.InstructionBuilder cares nothing about logical_blob_name

* refactor eager oneflow.get_variable and eager oneflow.system.assign

* OneToManyBroadcastBlobReference

* no panic in GenerateBackwardOpConfIf

* ConsumedByGradientOp

* fix Global<ResourceDesc> bug and gradient_function_not_found panic

* implement gradient_util.*

* fix misuse bug of CHECK_NOTNULL

* add pass AutoTrainStep and AutoLearningRate for eager execution

* replace ibn with ibn_prefix in StatelessCallOpKernelInstruction

* delete unused bn_in_op index

* object_cache.BnInOp2BlobObjectScope

* refactor ModelInitOpConf

* refactor InstructionBuilder.StatelessCall

* refactor InstructionBuilder._SystemStatelessCall

* fuse UserStatelessCall and SystemStatelessCall

* Operator::GetOpAttributeWithoutOpNameAndLbn

* always pass ParallelConf to InstructionBuilder.StatelessCall

* refactor eager_get_variable

* refactor python functions about OpAttribute

* EagerCastToMirrored

* OpArgAttribute

* refactor interpreter_callback.Interpret

* refactor FetchDelegateBlob

* eager backward interpreter

* BlobRegister

* refactor interpret_callback

* refactor gradient_util

* put eager variable blob object into backward blob register

* refactor interpreter_callback.Interpret

* gradient_util.ReleaseUnusedBlobObject

* Fix opkernel_instruction_type_test and remove machine_id2dev_phy_ids_ (#3047)

* EagerRunBackwardOps

* eager train demo

* refactor interpreter_callback

* remove unused foreign callback apis

* interpret_callback.FindOrCreateVarBlobObject

* rename OpArgAttribute to OpArgParallelAttribute

* refactor EagerConsistentBlob/EagerMirroredBlob

* interpret completed variable op

* bugfix: mv oneflow.function to oneflow.global_function

* fix the bug about global_function being called twice

* refactor NormalModelUpdateKernel::Forward to being called by eager execution

* refactor Kernel::Forward to the public one

* refactor system kernel for being compatible to eager execution

* 1) boxing_util.TryBroadcastOneToMany; 2) vm_util.BoxingStatelessCall

* 1) refactor class Maybe and class Error; 2) fix boxing_util.TryBroadcastOneToMany

* more test case for eager executed model_init

* fix CastToMirrored::InferSbpSignature and CastFromMirrored::InferSbpSignature

* merge develop

* merge develop

* boxing_util.TrySingleDeviceBoxing

* op_executor

* refactor boxing_util with boxing_hob

* more boxing methods

* GetEnvDefaultParallelConf

* boxing_util.NcclAllReduce

* OneflowVm::TryReceiveAndRun

* Remove class ForeignWorkerCallback (#3097)

* Remove class ForeignWorkerCallback

* Add test_2d_gpu_variable

* framework/register_python_callback.py

* rename EagerModelForward to EagerForward

* boxing_hob.MasterMachineOnly

* boxing_util.ComposeBoxing

* HobContextAttr

* boxing_util.BroadcastManyToOne

* boxing_util.NoBoxing

* merge develop

* c_api_util.InferOpConf

* ReplaceBlobParallelDesc

* rename local variables

* refine boxing_util.BoxingTo

* boxing_util.NcclAllReduce

* support non broadcast paralleled variable

* FillLogicalBlobDescSignature in InferOpConf

* NaiveCpuConcatSplit

* fix SystemOpKernelObject::ResetKernel

* RwMutexedObject::Get returns Maybe<const T*> instead of const T&

* CHECK_OK

* 1) boxing_middle; 2) boxing_util.RefBlobObjectWithParallelDesc

* more boxing methods composed with NaiveCpuConcatSplit

* boxing_util.CpuManyOneToOne

* Scope

* 1) refactor vm::SymbolStorage; 2) Scope

* refactor GetOpAttribute4OpConf

* update session.Scope when new name_scope constructed

* refactor c_util.InferOpConf

* OperatorConf.scope_symbol_id

* refactor AddAndInferConsistentOp/AddAndInferMirroredOp

* fix get_variable

* global function input output (#3065)

* eager return

* update test

* update output

* global function input base test pass

* update test

* fix some issues

* EagerConsistentBlob return

* merge dev_eager

* refactor EagerConsistentBlob.numpy(...)

* minor update

* refactor.ModeScope

* refactor GetOpAttribute4OpConf

* fix unittest using numpy_mirrored_list

Co-authored-by: lixinqi <lixinqi0703106@163.com>

* eager oneflow.watch

* Rename symbol_cache.py to symbol_storage.py (#3138)

* eager watch_diff

* more eager tests

* Remove duplicate function GetParallelContext

* 1) add FeedContext; 2) remove LocalFixedTensor

* add instruction FeedBlob

* rename: WatchBlob => FetchBlob

* oneflow.env.enable_eager_environment

* IsCpuOnly

* fix eager push_util bugs

* code format

* reformat

* ArgBlobDef.SetBatchAxisAndSplitAxis

* CallOpkernel instruction family add argument SbpSignature

* refactor remote eager blob

* refactor InitGlobalCudaDeviceProp

* fix InterfaceOpUtil

* recursive call MakeEagerInputBlobs

* copy returned blob in train job

* 1) refactor UserStatelessCallOpKernel; 2) replace Global<ThreadMgr>::Get()->compute_thread_pool() with Global<ThreadPool>::Get()

* fix gpu argwhere

* refactor BlobObject::header_buffer_

* interpret_util.ConsistentInterpret

* return more debug messages when encountering mixed consistent/mirrored error

* TryMirroredCastTotalLossInstanceNum

* replace compile_context.CurJobAddOp with interpret_util.Forward

* check backward timeline

* add scope to return (#3156)

* add scope to return

* more elegant

* Dev eager merge develop (#3157)

* skip empty stream (#3141)

* skip empty stream

* skip empty stream

* add tbs for gdb in docker (#3139)

* add tbs for gdb in docker

* add more desc

* Fix CUDNN_STATUS_NOT_SUPPORTED error for bn (#3147)

* Fix CUDNN_STATUS_NOT_SUPPORTED error for bn

* always use nchw when training

* fix xla cmake arg (#3144)

Co-authored-by: guoran <guoran@oneflow.org>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

* Autograd use user op (#3151)

* Use scalar_mul user op

* IndexedSlicesOptimizerRewritePass use scalar_mul user op

* scalar_sub_by_tensor

* broadcast_div=>scalar_div

* fix name

* fix int_operand

* fix diff_lbi

* install python pkgs from dev-requirements.txt when running CI (#3121)

* Hotfix multi machine vm panic (#3153)

* fix multi machine vm panic

* fix compile bug

* fix vm unittest bug (#3155)

* Fix test cases

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: guoran <guoran@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>

* ParallelSignature

* fix normalization grad ops timeline

* Dev eager fix assign op (#3160)

* Fix assign op

* Add enable_if to assign api

* Dev eager merge develop branch (#3164)

* fix vm unittest bug (#3155)

* Support BN Ex Operation (#3154)

* Hotfix CUDNN_STATUS_NOT_SUPPORTED error (#3162)

* xrt support user op (#3152)

* xrt support user op

* xla add Sole func

* tensorrt Sole func

* fix

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>

* refactor BoxingHobContext

* fix FixedTensorDef

* refactor boxing_util.BroadcastManyToOne and boxing_util.BroadcastOneToMany

* refactor eager boxing

* replace compile_context.CurJobAddOp with interpret_util.Forward

* boxing verbose

* Call TryClearObject4BlobName in EagerConsistentBlob.__del__

* fix instruction CudaHostRegisterBlob

* new boxing method B -> S

* blob_register.RegisteredBlobAccess

* fix the use of enable_if (#3179)

* add boxing P->B and P->S

* fix eager oneflow.assign

* Tensor list input and output of eager global function (#3181)

* input tensor list

* test input

* output tensor list and update test

* fix op watch (#3182)

* fix op watch

* optimized code

* Math binary elementwise ops (#3169) (#3184)

* math binary elementwise ops

* implement of math binary elementwise gpu floating kernel

* implement math binary elementwise cpu kernel; add test scripts

* rm note

Co-authored-by: guo ran <360112263@qq.com>

Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: guo ran <360112263@qq.com>

* rename test_gather* to test_agather* (#3191)

* Dev eager merge develop (#3192)

* Math binary elementwise ops (#3169)

* math binary elementwise ops

* implement of math binary elementwise gpu floating kernel

* implement math binary elementwise cpu kernel; add test scripts

* rm note

Co-authored-by: guo ran <360112263@qq.com>

* Remove multiply/gelu/tanh system op (#3183)

* Remove multiply system op

* Remove gelu/tanh system op

* fix

* Remove layer_norm/slice system op (#3180)

* Remove layer_norm system op

* Remove slice system op

* Remove scalar_add/scalar_mul system op (#3189)

* Remove useless system op (#3190)

* Remove axpy system op

* Remove print system op

* Remove reduce_mean system op

* Remove local_response_normalization system op

* cleanup kernel.proto

* Remove dot system op

* Remove maximum system op

* cleanup TopKOpConf

* cleanup op_conf

* remove TryUpdtBnVal4SepcialOpConf

* fix xrt print

Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

* remove test_eager.py (#3193)

Co-authored-by: ouyangyu <xuanjiuye@gmail.com>

Co-authored-by: OuYang Yu <xuanjiuye@gmail.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: tsai <caishenghang@oneflow.org>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: guoran <guoran@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: cheng cheng <472491134@qq.com>

996ebd45

Jul 02, 2020

Add identity op to bn grad (#3118) · 61a2df94

Juncheng authored 4 years ago

* Add identity op to bn grad

* rename identity to diff_identity

* SetTensorDescInferFn use TensorDesc

* revert flow.identity api

61a2df94

Jun 02, 2020

add nn.batch_normalization, support fp16 in bn and add tests (#2941) · 2514adea

daquexian authored 4 years ago


* support fp16 in bn and add tests

* add bn in gray_list

* reformat

* fix dtype in test

* add missing import

* set dtype in FixedTensorDef in tests

* add nn.batch_normalization, update python code, fix backward in fp16 non-training mode

* cast back to fp32 in tests

* watch_diff on fp32 blob, relax the atol, add notes on nn.bn

* add more tests and handle negative axis in nn.bn

* add comments for fp32 params

* add fp16 trainable && not training test

* skip bn fp16 test in non-user-op mode

* register no cast registry for bn

* update amp batch_axis check

* address comments

* reformat

Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

2514adea

May 19, 2020

Unified attr (#2903) · 04d30334

OuYang Yu authored 4 years ago


* Unified attr

* code format

* The blank space

* replace GetAttr to Attr

* replace Attr with SetAttr *.py

* code format

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

04d30334

May 17, 2020

migrate normalization op (#2768) · 34084690

daquexian authored 4 years ago


* [WIP] add normalization user op

* Fix no input diff problem, thanks to @liujuncheng

* stash

* migrate scalar_add to user op

* (not tested) support is_training=False

* unify cpu and gpu kernels

* update test_util.py to support non-training test

* fix test_util.py

* Add trainable op attr and training python param, rename is_training to training, and other minor changes

* test bn inference

* format

* fix getsbpfn

* restore test_util.py

* update comments

* update comments

* remove trainable attr in c++

* use constant when not trainable

* Add high epsilon test case

* Fix wrong sbp

* add new method, update the template

* use the new test_global_storage and CHECK_OR_RETURN api

* rename REGISTER_KERNEL macro

* make some vars const

* use new sbp builder api

* format

* use new input modifier api, add inferandtryrun, refine the test

* add the missing JUST

* remove the unnecessary copy

* give grad ops a descriptive name

* format

* support trainable=True and training=False

* format

* rename some vars

* skip trainable_without_training test for non-user-op mode

* add CUDNN_BN_MIN_EPSILON check

* wrap CUDNN_BN_MIN_EPSILON check with WITH_CUDA

* remove unnecessary dx check and shape assignment

* remove c++ attr center and scale

* remove duplicated 'import os'

* rename in->x, out->y

* check tensor_desc is not nullptr in SetParamTensorDesc

* set 'trainable' of moving_mean and moving_variance to False, wrap the non-user-op implementation with if-else

* fix the inconsistent trainable parameter

* rename out_desc->y_desc

* skip adding grad_op when not necessary

* remove the comments about CHECK_NE_OR_RETURN

* reformat

* rename out_grad_* => dy_*

* move kernel into anonymous namespace

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

34084690