Commits · 3654d1645efa15fb9ee351140fb2333f1a5a92ec · Summer2021 / 210130133

Aug 19, 2018
- fix wrong ActNum of ctrl regst produced by AccCompActor (#1136) · 3654d164
  Jinhui Yuan authored 6 years ago
- refine act_id order condition (#1088) · 5be84c50
  Jinhui Yuan authored 6 years ago
```
* refine act_id order condition

* strict act id check (excluding model regst)

* add TODO: figure out the ActNumForEachOutput of model regsts to MdSave area
```
- remove blob_inited check (#1130) · 40a9b9a5
  Jinhui Yuan authored 6 years ago
```
* remove blob_inited check

* fix inplace feature of reduce add actor and kernel

* rm useless code

* add EnableInplace, support CPU allreduce
```
- fix blob mem sharing issues (#1134) · 2e8f33fc
  Jinhui Yuan authored 6 years ago
Aug 18, 2018
- set blob not share mem by default (#1133) · 980b0dc3
  Jinhui Yuan authored 6 years ago
- refine chain_graph constructor (#1131) · 945139d9
  strickland12 authored 6 years ago
```
* refine task_graph constructor

* use const qualifier

* add ordered_chain_nodes_
```
- use std::vector<std::bitset<N>> (#1129) · e5b4d790
  strickland12 authored 6 years ago
- enable in_diff blobs of bw_add share mem (#1120) · 839d1b7d
  Jinhui Yuan authored 6 years ago
```
* enable in_diff blobs of bw_add share mem

* use default operator= for BlobDesc

* blob's mem_shared_id -> blob_mem_id

* minor refine

* move mem_case util to memory_allocator.h

* refine blob mem sharing

* refine compute packed blob

* add operator= for blob_desc
```
Aug 17, 2018
- remove boxing/121 out regst (#1126) · ce5aa7fb
  cheng cheng authored 6 years ago
```
* BldSubGrpBy 121/boxing use same buf task

* remove 121 boxing regst from loss/decode compute task node

* remove B121 regst in fw/bw/compute task node, remove lbi_121/boxing in logical node

* remove b121

* fix bug in bw task node
```
- refine ChainGraph (#1123) · 8616e0d4
  strickland12 authored 6 years ago
```
* rm TryMerge

* rm extra loop in CollectAncestorsForEachNode
```
Aug 16, 2018
- fix: rename UseCudnnOnGpu&UseCudnn to EnableCudnn&DevIsGpuAndEnableCudnn (#1121) · 10e08d66
  Niu Chong authored 6 years ago
- warmup (#1112) · 02629a55
  Shiyuan Shang-Guan authored 6 years ago
```
* add warmup

* add constant warmup

* rename cur_batch_num

* make warmup and lr decay not work together

* WarmupOnset -> TriggerWarmup
```
- fix bug of normalization load model from snapshot (#1119) · e74bb463
  Shiyuan Shang-Guan authored 6 years ago
Aug 15, 2018
- Remove redundant calculation and refine base_ii (#1104) · 953e1aec
  strickland12 authored 6 years ago
```
* refine and move CalcBaseII to ChainActGraph

* use MainByteSize4OneRegst()

* use ReverseDfs

* use shared and unshared

* use regst_act_group

* use SameFakeProducerOutsRegstActGroup

* avoid use HashMap<std::list<T*>>

* use std::set instead of HashSet

* change variant name

* minor refine of variable name
```
- start mem shared id (#1118) · ce707908
  Li Xinqi authored 6 years ago
- fix: fix the enable_mem_sharing field of reduce regsts (#1117) · 4d573e18
  Niu Chong authored 6 years ago
- add enable_write_snapshot option (#1111) · f8561470
  Jinhui Yuan authored 6 years ago
Aug 13, 2018
- feat: update reduce kernel to support inplace compute (#1105) · e15fea94
  Niu Chong authored 6 years ago
```
* feat(reduce_scatter_kernel): reduce scatter kernel is as before for cpu, and do nothing for gpu

* feat(reduce_local_add_kernel): update reduce_local_add kernel for in-place on GPU

* feat: update reduce_local_add actor to support inplace kernel

* feat: add support for inplace in reduce_global_add/gather

* fix: use ibn.substr(3) to specify inplace in_blob other than in_bn_id
```
Aug 12, 2018
- feat: enable mem sharing in ReduceStruct (#1097) · 25ddb833
  Niu Chong authored 6 years ago
```
* feat(register_desc): add mem_shared_offset

* feat(regst_desc): set the default val of mem_shared_offset as -1

* feat(register_manager): add offset in register_manager

* feat: add EnableMemSharingInReduceStruct(), set the mem_shared_id/offset of reduce regst

* feat(task_graph): AddCtrlEdge4MemSharingInOneReduce()

* feat: set mem_shared of regst produced by reduce/copy task nodes as true

* refactor(copy_task_node): regsts produced by CopyH2D and consumed by reduce_task are able to share mem, others stay unable

* fix(copy_task_node): not use SoleOutEdge() for copy_task_node

* fix(task_graph): fix the compile bug

* feat: set min/max regst_num of regst produced by reduce/copy task_node as 1

* feat: support mem_shared_offset in improver

* fix(copy_task_node): set max_regst_num of copy node succeed reduce nodes as 1

* refactor(task_graph): remove FindSuccReduceTaskNode() from member function, just as lambda function

* refactor(task_graph): refine EnableMemSharingInOneReduce()

* fix(reduce_scatter_task_node): fix the bug of wrong out_regst_name when parallel_num==machine_num

* refine: refine due to comment
```
Aug 08, 2018
- remove Useless Consume Relationship Between Fw Bw (#1093) · 9665137b
  cheng cheng authored 6 years ago
```
* RmUselessConsumeRelationshipBetweenFwBw impl

* add need in/out blob when backward in ops

* remove NormalBackwardActor rely to out regst desc id

* remove log and fix warning

* fix code
```
- fix packed blob shape check (#1095) · c475da78
  Jinhui Yuan authored 6 years ago
Aug 06, 2018
- Separate thread (#1045) · 18bda5ca
  Jinhui Yuan authored 6 years ago
```
* add tuple switch;

* separate the record_load and loss_print threads and assign thread ids dynamically;
decouple the persistence and comm_net threads from their creation order.

* separate persistence thread; dynamically setting the thread id of the persistence tasknode and creating persistence threads.

* improve separate thread

* improve algorithm; add unit test;

* simplify algorithm

* merge master

* solve conflict

* add braces for if statement

* improve code

* change persitence_work_num to max_mdsave_work_num; add CHECK; add a unique function (remove duplicated elements) into util.h

* use std::unique; improve code; update unit test.

* simplify code: use std::set to unique; replace "+=" with "=";
```
Aug 04, 2018
- fix: (1) 4d packed_blob, (2) collect_act_event (#1087) · 6cb2ef86
  Jinhui Yuan authored 6 years ago
- ensure the order of regst producing and consuming (#1085) · af808c6e
  Jinhui Yuan authored 6 years ago
- refine collect active event pre-condition (#1084) · e3f87ee3
  Jinhui Yuan authored 6 years ago
- fixed MdSave Op DeviceType (#1086) · 0410f099
  strickland12 authored 6 years ago
- feat: modify the allreduce to build base for supporting in-place reduce op (#1080) · f8c88f27
  Niu Chong authored 6 years ago
```
* feat(task_graph.cpp): reconnect the edge between ReduceLocalAdd and ReduceGlobalAdd

* feat: round up the packed_blob_desc of md_diff_regst to parallel_num

* feat: round up the packed_blob_desc of md_diff_acc regst, and remove MutPackedBlobDesc() interface from RegstDesc

* fix(task_graph.cpp): remove the CHECK of same machine when building edge between ReduceScatter and ReduceGlobalAdd

* feat(task_graph.cpp): connect same pair of ReduceScatter and ReduceLocalAdd with duplicate edges

* refactor(reduce_scatter_comp_task_node): update the bind of produced_regst and obn

* chore: add some check

* feat(reduce_scatter_op): update the InferBlobDescs()

* fix(reduce_scatter_compute_task_node): fix the bug of set produced out regst name

* refactor(reduce_scatter_comp_task_node): rename the produced out regst

* fix(reduce_scatter_comp_task_node): fix the CHECK macro

* refine(reduce_scatter_op): refine the CHECK macro

* feat: update reduce_local_add tasknode/op/op_conf to support new connecting

* feat: update the actor for reduce and reduce_local kernel

* fix: cannot find lbi in reduce_local_add, so identify regsts by name_in_producer

* fix(normal_backward_comp_task_node): consider normal backward with no model_diff

* fix(input_wise_compute_actor): fix the bug of update member status after send to consumer

* fix merge conflicts

* refactor: deduce the relation between ibn and in_regst in Build() rather than ConsumeRegst() for ReduceLocalAdd

* refine: refine the CHECK macro
```
Aug 03, 2018
- Dev face cudnn conv test (#1066) · deed0655
  Jinhui Yuan authored 6 years ago
```
* modify the cudnn algorithm selection API

* remove the redundant setting of bwd_data

* modify the conv algo selection API to the *Ex

* modify the cudnn max workspace to 1g

* pre-allocate work space

* fix empty fw_cudnn_buf issue
```
Aug 02, 2018
- rm IsBwClone (#1078) · 8c895ee9
  cheng cheng authored 6 years ago
- Dev blob header cpu (#1056) · 8d2daef3
  chengtbf authored 6 years ago
```
* MemBlobDesc

* refine blob runtime

* refine copyHd clone reshape kernel

* move blob header to cpu

* refine RoundUp of Blob size

* fix bug of runtime regst desc packed blob

* refine (#1059)

* add loss test data id note

* Dev blob header cpu3 (#1062)

* add blob body desc

* make blob body desc work

* make blob header desc work

* rm MemBlobDesc

* add CellDesc

* add is_packed

* add chunk desc

* rm CellDesc

* chunk desc -> field desc

* add RtBlobDesc

* refine

* refine

* body_desc -> body_field

* Infer header field desc in BlobDesc.ToProto()

* refine

* RuntimeBlobDesc constructor

* runtime uses RtBlobDesc workable, still has data_id bug

* make some accessors of BlobDesc private

* rm header_byte_size

* fix stupid bug

* refine is_packed -> header_is_packed

* header_is_packed -> header_is_opaque

* refactor BlobDesc.Proto

* add constructor BlobDesc -> RtBlobDesc

* refactor, remove useless code

* refine

* fix bug of regst manager allocate separated mem

* refine blob_desc constructor

* remove BlobDesc(const BLobDescProto&)

* refine code

* delete test code

* remove useless code

* add BlobDesc() from proto

* make opaque header mutually exclusive (#1077)

* make opaque header mutually exclusive

* refine kernel.proto

* has_data_id -> need_do_data_id
```
- refine activation and remove RemoveBackwardAdd (#1076) · ed46b7c2
  chengtbf authored 6 years ago
```
* half impl

* refine activation

* kernel if with activation

* remove virtual of infer if
```
- fix empty bw_cudnn_buf (#1075) · b23d18f0
  Jinhui Yuan authored 6 years ago
- fix fw_buf of reduce_sum op (#1074) · bdceef38
  Jinhui Yuan authored 6 years ago
- add fw/bw buf blob, remove thread/device buf (#1072) · c8853f81
  chengtbf authored 6 years ago
```
* half impl

* op inferbwbufblobdesc

* fw_buf bw_buf regst

* buf_blob instead of thread_buf for conv_op

* cudnn fw bw buf runnable

* fw bw buf for softmax

* remove buf of softmax loss

* remove buf_size of reduce_sum

* remove thread buf & device buf

* use fw_buf for reduce_sum_op fw_tmp blob
```
- add hack regst num only for CopyHd instead of kMdUpdtArea (#1073) · 51962720
  chengtbf authored 6 years ago
Aug 01, 2018
- add produced_ctrl_regst2reading_cnt_ (#1067) · 82f6d43e
  Jinhui Yuan authored 6 years ago
Jul 28, 2018
- use BfsTopo instead. DfsTopo causes issues in ReduceStruct (#1058) · 1bc6a8ce
  Jinhui Yuan authored 6 years ago
```
* use BfsTopo instead. DfsTopo causes issues in ReduceStruct

* add comment
```
Jul 27, 2018
- Add safe check (#1055) · 874dff3e
  Jinhui Yuan authored 6 years ago
```
* remove staleness

* ensure balanced placement for data and model parallel

* add device_num_of_each_machine
```
- fix bug of op infer data id (#1054) · f9473e22
  chengtbf authored 6 years ago
Jul 25, 2018
- add jpeg encoder (#1052) · 098db839
  chengtbf authored 6 years ago