- Oct 11, 2019
-
Juncheng authored
* kCudaAlignSize = 256
* always compute_70
* __CUDA_API_VERSION >= 10000
* __CUDA_API_VERSION >= 10000
* disable_all_reduce_sequence
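`kCudaAlignSize = 256` suggests CUDA memory sizes are rounded up to a 256-byte boundary before allocation. A minimal sketch of that rounding, assuming a power-of-two alignment (the function name is illustrative, not OneFlow's API):

```python
def align_size(size: int, alignment: int = 256) -> int:
    """Round size up to the next multiple of alignment.

    Assumes alignment is a power of two (256 here, matching the
    kCudaAlignSize value in the commit message above).
    """
    return (size + alignment - 1) & ~(alignment - 1)
```

With this rule a 257-byte request occupies 512 bytes, while a 256-byte request stays at 256.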
-
- Oct 10, 2019
-
cheng cheng authored
* MemBlockProto and ChunkProto
* create mem block and chunk after improver
* interface merge mem block and chunk between sub plans
* merge chunk between jobs for memory reuse
* using memory zone unique id replace memory case hash
* merge interface op mem block between jobs for mem shared
* gen GlobalCriticalSection by mem block id and chunk id
* check mem block and chunk valid before runtime
* Refactor: RegstMgr; allocate memory by mem block and chunk instead of regst
* fix bug; and pass test
* fix bug: init chunk_id_count in id_manager
* reuse copyHd out mem between jobs
* PushPlan and PullPlan for memblock and chunk
* refine merge mem block / chunk in oneflow.cpp
* at(i);
* GetOpName2JobId2TaskProtos functional
* using output ptr; pass test AlexNet and ResNet
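The mem block / chunk refactor above replaces per-regst allocation with blocks placed at aligned offsets inside a shared chunk, so one large allocation can back many tensors and be reused across jobs. A toy model of the bookkeeping (class and method names are hypothetical, not OneFlow's actual RegstMgr or ChunkProto):

```python
class Chunk:
    """One large allocation; mem blocks reference it by (offset, size)."""

    def __init__(self, alignment: int = 256):
        self.alignment = alignment
        self.size = 0     # total bytes the chunk must hold
        self.blocks = {}  # block_id -> (offset, size)

    def add_block(self, block_id: str, size: int) -> int:
        """Place a mem block at the next aligned offset; return that offset."""
        a = self.alignment
        offset = (self.size + a - 1) & ~(a - 1)
        self.blocks[block_id] = (offset, size)
        self.size = offset + size
        return offset
```

The runtime would then perform a single device allocation of `chunk.size` bytes and hand each regst a pointer at `base + offset`, instead of allocating per regst.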
-
- Sep 30, 2019
-
QiaoJing authored
* add decode random
* fix decode random actor
* fix dev_python test scripts
* fix batch_size test scripts
* fix
-
Juncheng authored
* use bias add
* fix
* bias_add
* bias add half
* fix
* reinterpret_cast
* fix half
* HALF
* fix
* ADD_DEFAULT_KERNEL_CREATOR
* fix
* format
-
cheng cheng authored
-
- Sep 27, 2019
-
cheng cheng authored
-
- Sep 25, 2019
-
Juncheng authored
* NcclDeviceCtx
* include naive_actor
* refine
* use_boxing_v2
* config.use_boxing_v2
* SubTskGphBuilder
* fix
* hash<oneflow::MemoryCase>
* Maybe<void>
* ChainSubTskGphBuilder
* SliceBoxingOp
* return ok
* SliceBoxingKernel
* SliceBoxingActor
* kSliceBoxing
* nccl boxing op
* nccl actor
* REGISTER_OP
* GetMsgFromCustomizedConf
* NcclBoxingTaskNode
* BldSubTskGphByBoxingV2
* NcclBoxingSubTskGphBuilder
* fix
* fix
* NcclKernel
* ParallelContext
* REGISTER_ACTOR
* fix rank set
* IsNcclTaskType
* limit
* 1024
* multi thread reader
* thread_num
* IsKernelLaunchSynchronized
* refine
* NcclTupleReduce/BroadcastKernel use NcclDeviceCtx
* MakeHostMemCase
* NcclBldSubTskGph
* remove useless code
* use_boxing_v2
* refine
* refine
* refine
* refine
* refine
-
- Sep 24, 2019
-
QiaoJing authored
* add decode random
* fix decode random actor
-
- Sep 23, 2019
-
Juncheng authored
* multi thread
* ComputeThreadPoolSize
* python api
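`ComputeThreadPoolSize` is only named above, not described. A plausible shape for such a helper exposed through a Python API is to fall back to the host CPU count when no size is configured and cap the result (both the fallback policy and the 1024 cap, which appears elsewhere in this log, are assumptions here, not OneFlow's actual logic):

```python
import os

def compute_thread_pool_size(requested: int = 0, cap: int = 1024) -> int:
    """Return the requested size if positive, else the CPU count, capped."""
    n = requested if requested > 0 else (os.cpu_count() or 1)
    return min(n, cap)
```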
-
cheng cheng authored
* mem_shared_id -> mem_block_id; mem_shared_off_set -> mem_block_offset; enable_mem_sharing -> enable_reuse_mem
* memory version 2 step 1: replace original concept about mem sharing
-
Shiyuan Shang-Guan authored
* refine lazy adam
* update
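Lazy Adam differs from plain Adam by updating moments and weights only at indices that actually received a gradient in the current step, which suits sparse embedding updates. A minimal sketch under that reading (plain Python over dicts; illustrative, not the OneFlow kernel):

```python
def lazy_adam_step(params, grads, m, v, t,
                   lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One lazy Adam step: only indices present in `grads` are touched.

    params: {index: weight}, grads: {index: gradient},
    m/v: first/second-moment dicts, t: 1-based step count.
    """
    for i, g in grads.items():
        m[i] = b1 * m.get(i, 0.0) + (1 - b1) * g
        v[i] = b2 * v.get(i, 0.0) + (1 - b2) * g * g
        m_hat = m[i] / (1 - b1 ** t)  # bias correction
        v_hat = v[i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return params
```

Rows that received no gradient keep stale moments rather than being decayed every step; that is the "lazy" trade-off versus dense Adam.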
-
- Sep 20, 2019
-
Juncheng authored
* op&kernel&actor
* job
* job_completer
* graph
* format
* fix pd
* fix
* ignore DelPlacementByOpName
* fix auto tick
* JobBuilder
* fix
* config util
* fix
* fix opgrade
* broadcast tick
* fix allreduce
* balance by model size
* GetSoleOutBlobSize
* async_actor_msg_deque
* group
* AddOrMutOpsOnlyOnce
* fix NcclTupleBroadcastGrad
* order
* set nccl order hint
* op_conf
* grad hint
* NcclTupleBroadcastReduceSequencePass
* add missed mutops
* order fix
* try kMdUpdtArea
* fix nccl_order_hint
* fix
* add ti
* tuple_identity_op
* remove useless
* group
* fix dead lock
* force ctrl in
* sc broadcast
* sort obn
* group nccl
* config group_size_mbyte
* non_distributed_optimizer_group_size_mbyte
* format
* stop check
* rm message sending optimization
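The `group_size_mbyte` settings above hint at packing many small tensors into fixed-size groups before collective communication, so each NCCL launch moves a reasonable amount of data. A greedy sketch of that packing (illustrative only; not the actual OneFlow pass):

```python
def group_by_size(items, group_size_bytes):
    """Pack consecutive (name, size) items into groups whose total size
    stays within group_size_bytes; an oversized item gets its own group."""
    groups, cur, cur_size = [], [], 0
    for name, size in items:
        if cur and cur_size + size > group_size_bytes:
            groups.append(cur)
            cur, cur_size = [], 0
        cur.append(name)
        cur_size += size
    if cur:
        groups.append(cur)
    return groups
```

Grouping consecutively (rather than bin-packing) preserves a deterministic op order, which matters when every rank must issue the same collectives in the same sequence.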
-
Li Xinqi authored
* remove unused task related dot function
* do not output dot rank info
-
lixinqi authored
-
lixinqi authored
-
lixinqi authored
-
- Sep 19, 2019
-
Juncheng authored
* IsKernelLaunchSynchronized
* virtual
* refine
* refine
-
lixinqi authored
-
Juncheng authored
-
Juncheng authored
-
Li Xinqi authored
* not ready yet
* segment fix
* fix segment_sum bugs
* 1st wide_n_deep push
* Fix tick in multi node parallel (#2042)
* check in fixes
* fix by adding boxing method
* register tick op
* move code and add more check
* fix typo
* fix bug when filtering op nodes before adding tick
* fix wheel build not adding .so (#2052)
* color plan dot VERSION-2 (#2045)
* run successfully on single GPU
* fix 121 for tick (#2069)
* delete unnecessary multiply_grad class
* speed up generate time for dot2svg (#2083)
* Add axis conf to bias_add for any axis channel (#2087)
* bias_add completion
* follow comment
* make conf axis required
* Revert "Add axis conf to bias_add for any axis channel (#2087)" (#2091) This reverts commit 8679ce980ce8570bf927baeab8616ee7b93fac47.
* updated
* fix segment_sum_grad
* fix sbp
* fix segment_sum impl for data parallel
* fix
* remove useless code in segment_kernel_util.h
* add python interface
* fix sigmoid conf
* fix naming error
* fix typo
* temp mod loss sbp
* add LazyAdam
* Merge branch 'dev_python' of https://github.com/Oneflow-Inc/oneflow into dev_python_widedeep
* rm useless code
* unsorted_segment_sum
* refactor sigmoid_cross_entropy_loss_kernel to high performance
* Improve sigmoid cross entropy loss grad (#2207)
* remove for loop called cuda kernel
* minor fix
* ../oneflow/python/ops/data_ops.py (#2209)
* fix lazy_adam
* Merge wnd and python (#2214)
* rm ActivationType from op/kernel (#2205)
* refactor sigmoid_cross_entropy_loss
* fix SigmoidGrad::InferBatchAxis
* support part_name_prefix and part_name_suffix_length (#2208)
* rename: OutRemoteBlobsResultBox => OutRemoteBlobsStatus
* oneflow.watch for debug
* Dev decode batch size (#2206)
* rm batch_size and piece_size
* merge dev_python
* Update reshape_like_op.cpp (#2213)
* oneflow.parallel (#2211)
* oneflow.parallel
* refactor split_axis => parallel
* rename parallel => distribute
* fix typo: *Parallel => *Distribute
* add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
* merge dev_python
* fix boxing: P->S(0)
* check in docker build scripts (#2216)
* Dev python widedeep docker (#2218)
* check in docker build scripts
* check in .dockerignore
* rm oneflow.segment_sum
* remove segment_sum
* rm unused file
* rm debug code
* rm debug code
* rm double empty lines
* remove useless comments
-
Juncheng authored
* async msg queue
* EnqueueAsyncMsg
-
- Sep 18, 2019
- Sep 17, 2019
-
cheng cheng authored
* remove parallel policy
* rm FC/rnn/embedding_look_up op/kernel
* add check data parallel for conv/layer_norm op
* bugfix: bias add + use math_add when batch size = 1
-
- Sep 16, 2019
-
scxfjiang authored
-
Shenghang Tsai authored
* update cmake files
* check in files
* Fix tick in multi node parallel (#2042)
* check in fixes
* fix by adding boxing method
* register tick op
* move code and add more check
* fix typo
* fix bug when filtering op nodes before adding tick
* shrink ctx size
* fix script
* fix wheel build
* fix wheel build not adding .so (#2052)
* lower cmake version bar
* rm more files
* keep build dir
* check in test bash script
* fix
* Dev docker sx (#2124)
* add python2 docker env
* rm old docker files
* update repository
* add ARG CUDA and USE_PYTHON_3_OR_2
* reform files
* update
* rm log doesn't print when there is cache
* use default arg in dockerfile
* better py 2 or 3 condition
* add default
* use if
* update alexnet
* update for bert
* 15->16
-
cheng cheng authored
-
Li Xinqi authored
* oneflow.parallel
* refactor split_axis => parallel
* rename parallel => distribute
* fix typo: *Parallel => *Distribute
* add blob_desc.with_split_distribute(axis) and blob_desc.with_broadcast_distribute()
-
Shenghang Tsai authored
-
Li Xinqi authored
* rm batch_size and piece_size
* merge dev_python
-
- Sep 15, 2019
- Sep 14, 2019