- Aug 19, 2018
-
-
Jinhui Yuan authored
-
Jinhui Yuan authored
* refine act_id order condition
* strict act id check (excluding model regst)
* add TODO: figure out the ActNumForEachOutput of model regsts to MdSave area
-
Jinhui Yuan authored
* remove blob_inited check
* fix inplace feature of reduce add actor and kernel
* rm useless code
* add EnableInplace, support CPU allreduce
-
Jinhui Yuan authored
-
- Aug 18, 2018
-
-
Jinhui Yuan authored
-
strickland12 authored
* refine task_graph constructor
* use const qualifier
* add ordered_chain_nodes_
-
strickland12 authored
-
Jinhui Yuan authored
* enable in_diff blobs of bw_add share mem
* use default operator= for BlobDesc
* blob's mem_shared_id -> blob_mem_id
* minor refine
* move mem_case util to memory_allocator.h
* refine blob mem sharing
* refine compute packed blob
* add operator= for blob_desc
-
- Aug 17, 2018
-
-
cheng cheng authored
* BldSubGrpBy 121/boxing use same buf task
* remove 121 boxing regst from loss/decode compute task node
* remove B121 regst in fw/bw/compute task node, remove lbi_121/boxing in logical node
* remove b121
* fix bug in bw task node
-
strickland12 authored
* rm TryMerge
* rm extra loop in CollectAncestorsForEachNode
-
- Aug 16, 2018
-
-
Niu Chong authored
-
Shiyuan Shang-Guan authored
* add warmup
* add constant warmup
* rename cur_batch_num
* make warmup and lr decay not work together
* WarmupOnset -> TriggerWarmup
-
Shiyuan Shang-Guan authored
-
- Aug 15, 2018
-
-
strickland12 authored
* refine and move CalcBaseII to ChainActGraph
* use MainByteSize4OneRegst()
* use ReverseDfs
* use shared and unshared
* use regst_act_group
* use SameFakeProducerOutsRegstActGroup
* avoid using HashMap<std::list<T*>>
* use std::set instead of HashSet
* change variant name
* minor refine of variable name
-
Li Xinqi authored
-
Niu Chong authored
-
Jinhui Yuan authored
-
- Aug 13, 2018
-
-
Niu Chong authored
* feat(reduce_scatter_kernel): reduce scatter kernel is as before for cpu, and does nothing for gpu
* feat(reduce_local_add_kernel): update reduce_local_add kernel for in-place on GPU
* feat: update reduce_local_add actor to support inplace kernel
* feat: add support for inplace in reduce_global_add/gather
* fix: use ibn.substr(3) to specify inplace in_blob rather than in_bn_id
-
- Aug 12, 2018
-
-
Niu Chong authored
* feat(register_desc): add mem_shared_offset
* feat(regst_desc): set the default val of mem_shared_offset as -1
* feat(register_manager): add offset in register_manager
* feat: add EnableMemSharingInReduceStruct(), set the mem_shared_id/offset of reduce regst
* feat(task_graph): AddCtrlEdge4MemSharingInOneReduce()
* feat: set mem_shared of regst produced by reduce/copy task nodes as true
* refactor(copy_task_node): regsts produced by CopyH2D and consumed by reduce_task are able to share mem, others stay unable
* fix(copy_task_node): not use SoleOutEdge() for copy_task_node
* fix(task_graph): fix the compile bug
* feat: set min/max regst_num of regst produced by reduce/copy task_node as 1
* feat: support mem_shared_offset in improver
* fix(copy_task_node): set max_regst_num of the copy node that succeeds reduce nodes as 1
* refactor(task_graph): remove FindSuccReduceTaskNode() from member functions, keep it as a lambda function
* refactor(task_graph): refine EnableMemSharingInOneReduce()
* fix(reduce_scatter_task_node): fix the bug of wrong out_regst_name when parallel_num==machine_num
* refine: refine due to comment
-
- Aug 08, 2018
-
-
cheng cheng authored
* RmUselessConsumeRelationshipBetweenFwBw impl
* add need in/out blob when backward in ops
* remove NormalBackwardActor's reliance on out regst desc id
* remove log and fix warning
* fix code
-
Jinhui Yuan authored
-
- Aug 06, 2018
-
-
Jinhui Yuan authored
* add tuple switch;
* separate the record_load and loss_print threads and assign thread ids dynamically; decouple the persistence and comm_net threads from their creation order.
* separate persistence thread; dynamically setting the thread id of the persistence tasknode and creating persistence threads.
* improve separate thread
* improve algorithm; add unit test;
* simplify algorithm
* merge master
* solve conflict
* add braces for if statement
* improve code
* change persitence_work_num to max_mdsave_work_num; add CHECK; add a unique function (removes duplicated elements) to util.h
* use std::unique; improve code; update unit test.
* simplify code: use std::set to unique; replace "+=" with "=";
-
- Aug 04, 2018
-
-
Jinhui Yuan authored
-
Jinhui Yuan authored
-
Jinhui Yuan authored
-
strickland12 authored
-
Niu Chong authored
* feat(task_graph.cpp): reconnect the edge between ReduceLocalAdd and ReduceGlobalAdd
* feat: round up the packed_blob_desc of md_diff_regst to parallel_num
* feat: round up the packed_blob_desc of md_diff_acc regst, and remove MutPackedBlobDesc() interface from RegstDesc
* fix(task_graph.cpp): remove the CHECK of same machine when building edge between ReduceScatter and ReduceGlobalAdd
* feat(task_graph.cpp): connect same pair of ReduceScatter and ReduceLocalAdd with duplicate edges
* refactor(reduce_scatter_comp_task_node): update the bind of produced_regst and obn
* chore: add some checks
* feat(reduce_scatter_op): update the InferBlobDescs()
* fix(reduce_scatter_compute_task_node): fix the bug in setting the produced out regst name
* refactor(reduce_scatter_comp_task_node): rename the produced out regst
* fix(reduce_scatter_comp_task_node): fix the CHECK macro
* refine(reduce_scatter_op): refine the CHECK macro
* feat: update reduce_local_add tasknode/op/op_conf to support new connecting
* feat: update the actor for reduce and reduce_local kernel
* fix: cannot find lbi in reduce_local_add, so identify regsts by name_in_producer
* fix(normal_backward_comp_task_node): consider normal backward with no model_diff
* fix(input_wise_compute_actor): fix the bug of update member status after send to consumer
* fix merge conflicts
* refactor: deduce the relation between ibn and in_regst in Build() rather than ConsumeRegst() for ReduceLocalAdd
* refine: refine the CHECK macro
-
- Aug 03, 2018
-
-
Jinhui Yuan authored
* modify the cudnn algorithm selection API
* remove the redundant setting of bwd_data
* modify the conv algo selection API to the *Ex
* modify the cudnn max workspace to 1g
* pre-allocate workspace
* fix empty fw_cudnn_buf issue
-
- Aug 02, 2018
-
-
cheng cheng authored
-
chengtbf authored
* MemBlobDesc
* refine blob runtime
* refine copyHd clone reshape kernel
* move blob header to cpu
* refine RoundUp of Blob size
* fix bug of runtime regst desc packed blob
* refine (#1059)
* add loss test data id note
* Dev blob header cpu3 (#1062)
* add blob body desc
* make blob body desc work
* make blob header desc work
* rm MemBlobDesc
* add CellDesc
* add is_packed
* add chunk desc
* rm CellDesc
* chunk desc -> field desc
* add RtBlobDesc
* refine
* refine
* body_desc -> body_field
* Infer header field desc in BlobDesc.ToProto()
* refine
* RuntimeBlobDesc constructor
* runtime uses RtBlobDesc workable, still has data_id bug
* make some accessors of BlobDesc private
* rm header_byte_size
* fix stupid bug
* refine is_packed -> header_is_packed
* header_is_packed -> header_is_opaque
* refactor BlobDesc.Proto
* add constructor BlobDesc -> RtBlobDesc
* refactor, remove useless code
* refine
* fix bug of regst manager allocate separated mem
* refine blob_desc constructor
* remove BlobDesc(const BLobDescProto&)
* refine code
* delete test code
* remove useless code
* add BlobDesc() from proto
* make opaque header mutually exclusive (#1077)
* make opaque header mutually exclusive
* refine kernel.proto
* has_data_id -> need_do_data_id
-
chengtbf authored
* half impl
* refine activation
* kernel if with activation
* remove virtual of infer if
-
Jinhui Yuan authored
-
Jinhui Yuan authored
-
chengtbf authored
* half impl
* op inferbwbufblobdesc
* fw_buf bw_buf regst
* buf_blob instead of thread_buf for conv_op
* cudnn fw bw buf runnable
* fw bw buf for softmax
* remove buf of softmax loss
* remove buf_size of reduce_sum
* remove thread buf & device buf
* use fw_buf for reduce_sum_op fw_tmp blob
-
chengtbf authored
-
- Aug 01, 2018
-
-
Jinhui Yuan authored
-
- Jul 28, 2018
-
-
Jinhui Yuan authored
* use BfsTopo instead. DfsTopo causes issues in ReduceStruct
* add comment
-
- Jul 27, 2018
-
-
Jinhui Yuan authored
* remove staleness
* ensure balanced placement for data and model parallel
* add device_num_of_each_machine
-
chengtbf authored
-
- Jul 25, 2018
-
-
chengtbf authored
-