- Oct 01, 2018
-
Niu Chong authored
fix: add AsyncSednRegstMsgToConsumer() for sending a single produced regst, e.g. forward_model_regst (#1274)
* fix(normal_model_update_compute_actor): fix sending forward_model_regst_ to the consumer
* fix: add AsyncSednRegstMsgToConsumer() for sending a single produced regst, e.g. forward_model_regst
-
Shiyuan Shang-Guan authored
* refine cudnn_limit_buf
* rename default_cudnn_buf_limit_mbyte -> cudnn_buf_limit_mbyte
-
Niu Chong authored
* fix(normal_forward_compute_actor): fix SendMsgToForwardModelSaveActor()
* refine(normal_forward_compute_actor)
-
Jinhui Yuan authored
-
- Sep 30, 2018
-
Niu Chong authored
* feat(register_slot): add the RegstSlot
* feat(register_slot): update RegstSlot if
* feat(actor): update member of Actor to use RegstSlot
* fix(register_slot): fix the available_regst_desc_cnt init val
* refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
* feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
* feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
* fix(register_slot): fix the CHECK empty
* feat: remove actual_writeable_regst_desc_id_ from Actor, add Naive/CustomizedProducedRegst
* fix(normal_model_update_actor): bug: not send customized regst to consumer when SendIntialModel
* fix(normal_forward_compute_actor): bug: not add kLoss/kAccuracy produced regst to NaiveProducedRegst
* fix(actor): UNIMPLEMENTED() for AsyncSendCustomizedProducedRegstMsgToConsumer
* fix(normal_forward_compute_actor): set const_buf_regst to nullptr when recv from consumers
* fix(actor): total_reading_data_regst_cnt, not total_reading_ctrl_regst_cnt
* refactor: update GetNaiveConsumedRegstDescName to GetNaiveOrCustomizedConsumedRegstDescName (same for Produced)
* feat: combine data_regst and ctrl_regst in Actor
* fix: fix bugs
* fix: fix bugs
* fix: remove .swp files and unused LOG
* feat: split Act and SendMsg (#1255)
* feat: split Act and SendMsg
* refine: rename HandleProduced/ConsumedDataRegst.. to HandleProduced/ConsumedNaiveDatRegst..
* fix(input_wise_comp_actor): bug: not set piece id
* fix(actor): potential bug: produced msg with no allowed actor still pop from queue
* refactor: mv some protected member function to private
* fix(actor): fix the condition about sending EORD msg
* refactor(input_wise_actor): use RegstSlot in InputWiseActor
* fix(copy_comm_net_actor): rename piece_id2regst_ctx to piece_id2regst_ctx_
* refactor: rename Name2RegstDescId to Name2RegstDescIds
* refactor(naive_actor): "override final" instead of only "final"
* refine(actor): little refine
* feat: update the return type of GetNaiveOrCustomizedNamesRegstDescName to enum class RegstNameType
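Taken together, these commits replace the actor's ad-hoc regst bookkeeping with a RegstSlot: a map from regst_desc_id to a deque of queued regsts, with non-throwing Try* accessors and an available_regst_desc_cnt counter. Below is a minimal sketch of such a container; the method names come from the messages above, everything else is assumed and the real oneflow::RegstSlot differs in detail.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_map>

// Sketch of a RegstSlot-like container (not OneFlow's actual class).
class RegstSlotSketch {
 public:
  // Returns false instead of failing when the regst_desc_id is unknown.
  bool TryPushBack(int64_t regst_desc_id, void* regst) {
    auto it = queues_.find(regst_desc_id);
    if (it == queues_.end()) { return false; }
    if (it->second.empty()) { available_regst_desc_cnt_ += 1; }
    it->second.push_back(regst);
    return true;
  }
  bool TryPopFront(int64_t regst_desc_id) {
    auto it = queues_.find(regst_desc_id);
    if (it == queues_.end() || it->second.empty()) { return false; }
    it->second.pop_front();
    if (it->second.empty()) { available_regst_desc_cnt_ -= 1; }
    return true;
  }
  bool HasRegstDescId(int64_t regst_desc_id) const {
    return queues_.count(regst_desc_id) > 0;
  }
  // Visits the front regst of every non-empty queue.
  void ForEachFrontRegst(const std::function<void(void*)>& handler) const {
    for (const auto& pair : queues_) {
      if (!pair.second.empty()) { handler(pair.second.front()); }
    }
  }
  // Number of regst_desc_ids that currently have at least one regst queued.
  size_t available_regst_desc_cnt() const { return available_regst_desc_cnt_; }

 private:
  std::unordered_map<int64_t, std::deque<void*>> queues_;
  size_t available_regst_desc_cnt_ = 0;
};
```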
-
- Sep 26, 2018
-
Shiyuan Shang-Guan authored
* add lars set
* add lars
* override ibn&obn to lbi
* make model update consistent
* check cuda stream sync
* add LARSUpdateModelGpu
* checkout naive & momentum model update
* use cublas::dot compute SumOfSquare
* update lars for master
* refine lars for master
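For context, LARS (layer-wise adaptive rate scaling) derives a per-layer local learning rate from the ratio of the weight norm to the gradient norm; the two SumOfSquare terms are computed with cublas::dot on the GPU per the entry above. Here is a hedged host-side sketch of the update; the hyper-parameter names and exact formula are assumptions, not OneFlow's conf fields.

```cpp
#include <cmath>
#include <vector>

// Sketch of a LARS update for one model blob (illustrative only).
void LarsUpdateSketch(std::vector<float>& model, std::vector<float>& momentum,
                      const std::vector<float>& model_diff, float learning_rate,
                      float l2, float momentum_beta, float lars_coefficient,
                      float epsilon) {
  // SumOfSquare terms; the commits compute these with cublas::dot on the GPU.
  double w_norm2 = 0.0, g_norm2 = 0.0;
  for (size_t i = 0; i < model.size(); ++i) {
    w_norm2 += model[i] * model[i];
    g_norm2 += model_diff[i] * model_diff[i];
  }
  const double w_norm = std::sqrt(w_norm2);
  const double g_norm = std::sqrt(g_norm2);
  // Layer-wise ("local") learning rate: trust ratio of weight norm to gradient norm.
  const double local_lr =
      learning_rate * lars_coefficient * w_norm / (g_norm + l2 * w_norm + epsilon);
  for (size_t i = 0; i < model.size(); ++i) {
    const double reg_diff = model_diff[i] + l2 * model[i];
    momentum[i] = momentum_beta * momentum[i] + local_lr * reg_diff;
    model[i] -= momentum[i];
  }
}
```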
-
binbinHan authored
* hinge_loss_kernel_test
* fix opkernel_test
* fix test file
* optimize test file
* optimize opkernel test
* complete opkernel test interface
-
- Sep 25, 2018
-
Juncheng authored
-
Jinhui Yuan authored
* remove useless Copy in device_context
* fix cyclic and copy_to_local bug in binary_in_stream_with_local_copy
-
- Sep 24, 2018
-
Jinhui Yuan authored
* add nccl dependency
* add nccl comm handle
* nccl allreduce works
* NcclAllreduce -> NcclAllReduce
* fix header guard
* add NcclReduceScatter, NcclAllGather
* complete ReduceScatter and AllGather, (with cuda error)
* change variable name
* reduce-scatter, all-gather works
* add NcclScatter and NcclGather work type
* Dev use nccl add nccl comm manager (#1206)
* add parallel_set_id
* add nccl_comm_manager
* log nccl comm create
* use NcclCommMgr
* bugfix
* OF_DISALLOW_COPY_AND_MOVE
* remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
* remove nccl handles from cuda_stream_handle
* nccl_util and GetNcclDataType
* fix rank_num
* fix rank_id
* CudaCheck->NcclCheck
* only GPU
* PoorCompTaskNode SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
* PoorCompTaskNode
* reformat
* format change
* Dev use nccl merge reduce share mem (#1216)
* add parallel_set_id
* add nccl_comm_manager
* log nccl comm create
* use NcclCommMgr
* bugfix
* OF_DISALLOW_COPY_AND_MOVE
* remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
* remove nccl handles from cuda_stream_handle
* nccl_util and GetNcclDataType
* fix rank_num
* fix rank_id
* CudaCheck->NcclCheck
* only GPU
* PoorCompTaskNode SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
* PoorCompTaskNode
* reformat
* ReduceGather
* GlobalAdd
* ReduceScatter
* EnableIfNeed
* ConcatSplit
* EnableMemSharing for pred if need
* CtrlEdge for Gather
* CtrlEdge for GlobalAdd
* LocalAdd CtrlEdge
* CollectReduceTaskNode
* reverse nodes
* local_add_mem_sharing
* global add mem sharing
* reduce_mem_sharing
* bugfix
* refine
* format change (remove empty lines)
* format change
* fix local_add and gather issues
* Dev refactor reduce add (#1218)
* change ReduceGlobalAdd to ReduceAdd
* rm ReduceLocalAdd
* no mem sharing case works
* let ReduceAddCompActor decide whether it is local or global
* multi machine multi gpus Nccl and Oneflow allreduce works
* refine
* extract SortEdges
* make EdgeInfo protected
* Dev use nccl refine (#1220)
* const qualifier
* PoorCompTaskNode=>PipeCompTaskNode
* int=>int32_t
* refine ReduceMemSharingCtx
* NcclDeviceCtx and NcclActor
* empty line
* CudaDeviceCtx<-NcclDeviceCtx
* fix wrong rank_id in reduce_add_actor (#1229)
* fix wrong rank_id in reduce_add_actor
* rm device_num_of_each_machine from parallel_ctx
* fix reduce gather control edge (#1235)
* fix reduce gather control edge
* extract FindNearestReduceAddCompTaskNode
* extract method ReduceCompTaskNodeIf::FindPredRduceTaskNodeIf
* CHECK nearest_add_copy_d2h
* Dev use nccl cross machine nccl all reduce (#1246)
* support ncclAllReduce cross machine
* fix rank_id and rank_num for mix
* reformat
* reformat
* simplify nccl_kernel (#1256)
* simplify REGISTER_BLD_SUB_TSK_GPH_MTHD (#1260)
* simplify REGISTER_BLD_SUB_TSK_GPH_MTHD
* note
* Dev use nccl reduce ranking ctx (#1252)
* reformat
* compute rank_id and rank_num with FixCompTaskNode
* reformat
* fix rank_id for reduceadd
* ReduceRankingCtx
* New Ranking and MemSharing for Reduce
* DECLARE_REDUCE_LOGICAL_NODE
* Ranking4NcclAllReduce
* fix ranking
* remove AsTaskNode
* reformat
* runtime rank ctx
* rank_set
* bugfix
* bugfix
* unittest
* change use_nccl_all_reduce_cross_machine to use_nccl_inter_node_communication
* refine
* move BuildCtrlRegstBetweenReduceCopyNodes to ReduceAddCompTaskNode
* CHECK mem_size_
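These commits wrap NCCL collectives (ncclAllReduce, ncclReduceScatter, ncclAllGather) in dedicated kernels/actors and manage per-device communicators through an NcclCommMgr, with the cross-machine case added later. Below is a minimal standalone, single-process sketch of the ncclAllReduce call those kernels delegate to; it is not OneFlow code, just the bare NCCL usage under assumed buffer sizes.

```cpp
#include <cuda_runtime.h>
#include <nccl.h>
#include <vector>

int main() {
  int dev_cnt = 0;
  cudaGetDeviceCount(&dev_cnt);

  // One communicator per visible GPU (single process, in-node case).
  std::vector<ncclComm_t> comms(dev_cnt);
  ncclCommInitAll(comms.data(), dev_cnt, nullptr);

  const size_t count = 1 << 20;  // elements per device, assumed for the sketch
  std::vector<float*> send(dev_cnt), recv(dev_cnt);
  std::vector<cudaStream_t> streams(dev_cnt);
  for (int i = 0; i < dev_cnt; ++i) {
    cudaSetDevice(i);
    cudaMalloc(reinterpret_cast<void**>(&send[i]), count * sizeof(float));
    cudaMalloc(reinterpret_cast<void**>(&recv[i]), count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  // Group the per-device calls so NCCL launches them as one collective.
  ncclGroupStart();
  for (int i = 0; i < dev_cnt; ++i) {
    ncclAllReduce(send[i], recv[i], count, ncclFloat, ncclSum, comms[i], streams[i]);
  }
  ncclGroupEnd();

  for (int i = 0; i < dev_cnt; ++i) {
    cudaSetDevice(i);
    cudaStreamSynchronize(streams[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```

The multi-machine variant mentioned above ("support ncclAllReduce cross machine") instead creates one communicator per rank with ncclCommInitRank and a shared ncclUniqueId, which is what the rank_id/rank_num fixes refer to.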
-
- Sep 23, 2018
-
Niu Chong authored
-
- Sep 19, 2018
-
Li Xinqi authored
-
Shiyuan Shang-Guan authored
-
- Sep 18, 2018
-
Li Xinqi authored
* define_test_blob
* decode random compute task node
* rename define_test_blob_conf.name => define_test_blob_conf.out
* decode random task node color
-
- Sep 17, 2018
-
Li Xinqi authored
* moving model
* moving_model => forward_model
* add todo commit
* two model save node
* let md_updt actor handle forward_model
* remove useless code
* rename local variable
-
Shiyuan Shang-Guan authored
* refine model update conf
* make todo
* add primary_lr and secondary_lr
-
scxfjiang authored
-
Juncheng authored
* add enum ChannelStatus
* merge CloseSendEnd and CloseReceiveEnd
* update channel_test
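The ChannelStatus change collapses the separate send-end/receive-end close calls into a single status-reporting Close. A sketch of what such a channel can look like; the shape is assumed and the real oneflow::Channel may differ.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Sketch of a Channel whose operations report a ChannelStatus.
enum class ChannelStatus { kSuccess, kClosed };

template <typename T>
class ChannelSketch {
 public:
  ChannelStatus Send(const T& item) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (closed_) { return ChannelStatus::kClosed; }
    queue_.push(item);
    cond_.notify_one();
    return ChannelStatus::kSuccess;
  }
  ChannelStatus Receive(T* item) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return closed_ || !queue_.empty(); });
    if (queue_.empty()) { return ChannelStatus::kClosed; }  // closed and drained
    *item = queue_.front();
    queue_.pop();
    return ChannelStatus::kSuccess;
  }
  // A single Close replaces the former CloseSendEnd / CloseReceiveEnd pair.
  void Close() {
    std::unique_lock<std::mutex> lock(mutex_);
    closed_ = true;
    cond_.notify_all();
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable cond_;
  bool closed_ = false;
};
```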
-
Jinhui Yuan authored
* only master machine saves plan and has event logger
* separate Data, Persistence, Cache, Log FileSystem config
* refine
* only specify data and snapshot path conf
* forbid multiple machines from using localfs as snapshot fs
* networkfs as localfs
* refine
* Store log to snapshot (#1109)
* use machine id, drop machine name
* ensure setting machine id
* allow save snapshot to localfs for distributed training (#1113)
* Snapshot to master (#1116)
* allow save snapshot to localfs for distributed training
* fix mdSave to master for model parallel
* fix review comment issues
* add sanity check for machine id
* rm useless comments
* update example
* Dev refine runtime add log stream mgr (#1142)
* add LogStreamMgr
* refine and refactor OutStream=>LogStream
* bugfix
* use LogStreamMgr to write graph, dot, plan, profile and proto
* refine
* simplify, remove LogStreamMgr (#1243)
* simplify, remove LogStreamMgr
* TeePersistentLogStream add static factory (#1244)
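The later commits in this series funnel the plan/graph/profile dumps through a log-stream abstraction, finishing with a static factory on TeePersistentLogStream. Here is a sketch of that factory pattern under assumed names; the real class writes through the configured FileSystem abstractions rather than std::ofstream.

```cpp
#include <fstream>
#include <memory>
#include <string>

// Sketch of a TeePersistentLogStream-style static factory (illustrative only).
class TeePersistentLogStreamSketch {
 public:
  static std::unique_ptr<TeePersistentLogStreamSketch> Create(const std::string& path) {
    // The constructor stays private so every stream is created in one place
    // and flushed exactly once, in the destructor.
    return std::unique_ptr<TeePersistentLogStreamSketch>(
        new TeePersistentLogStreamSketch(path));
  }
  TeePersistentLogStreamSketch& operator<<(const std::string& text) {
    out_ << text;
    return *this;
  }
  ~TeePersistentLogStreamSketch() { out_.flush(); }

 private:
  explicit TeePersistentLogStreamSketch(const std::string& path) : out_(path) {}
  std::ofstream out_;
};

// Usage sketch: dump a serialized plan next to the other log artifacts.
// auto stream = TeePersistentLogStreamSketch::Create("plan.dump");
// (*stream) << serialized_plan;
```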
-
cheng cheng authored
* fix bug of forward model -> copyD2H conflict with out regst
* use 1 line
-
- Sep 16, 2018
- Sep 15, 2018
-
Li Xinqi authored
-
Shiyuan Shang-Guan authored
* make each blob of the packed blob be updated separately in the ModelUpdate
* make blob descs in regst be consistent in bw->md_diff_acc->shared_md_diff_add->md_update->fw
* copy lbi2blob_descs from model
* add shared_model_diff_add kernel
* refine model_update actor and kernel
* rm useless TODO
* add shared_model_diff_add kernel
* refine code
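In other words, the shared_model_diff_add kernel sums the diffs produced by several actors into one shared buffer, and the packed model blob is walked blob by blob so each BlobDesc is updated on its own. An illustrative host-side sketch; the segment layout and names are assumptions, not OneFlow's data structures.

```cpp
#include <cstddef>
#include <vector>

// One per-blob segment inside the packed buffer (assumed layout).
struct BlobSegment {
  size_t offset;  // element offset of this blob inside the packed buffer
  size_t count;   // number of elements in this blob
};

// Accumulate per-actor model diffs into a shared sum, blob by blob.
void SharedModelDiffAddSketch(std::vector<float>& packed_sum,
                              const std::vector<std::vector<float>>& per_actor_diffs,
                              const std::vector<BlobSegment>& segments) {
  for (const BlobSegment& seg : segments) {
    for (size_t i = seg.offset; i < seg.offset + seg.count; ++i) {
      float acc = 0.f;
      for (const auto& diff : per_actor_diffs) { acc += diff[i]; }
      packed_sum[i] = acc;  // consumed by the subsequent md_update step
    }
  }
}
```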
-
- Sep 14, 2018
-
Shiyuan Shang-Guan authored
-
Li Xinqi authored
* enable dptr<T>(...) if T is not void
* simplify dptr(...) by parameter packing
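The dptr<T>(...) change is a C++ template technique: SFINAE disables the typed accessor when T is void, and a parameter pack forwards an arbitrary number of dimension offsets. A small sketch of the idiom; the Blob stand-in and the offset arithmetic are simplified assumptions.

```cpp
#include <cstdint>
#include <type_traits>

// Sketch of "enable dptr<T>(...) if T is not void" plus parameter packing.
class BlobSketch {
 public:
  explicit BlobSketch(void* mem) : mem_(mem) {}

  // Untyped accessor, always available.
  void* dptr() { return mem_; }

  // Typed accessor, SFINAE-disabled for T = void; the parameter pack lets
  // callers write dptr<float>(n, c, h) with any number of leading offsets.
  template <typename T,
            typename std::enable_if<!std::is_same<T, void>::value>::type* = nullptr,
            typename... Int64s>
  T* dptr(Int64s... dim_offsets) {
    return static_cast<T*>(mem_) + ElemOffset(dim_offsets...);
  }

 private:
  static int64_t ElemOffset() { return 0; }
  template <typename... Rest>
  static int64_t ElemOffset(int64_t first, Rest... rest) {
    return first + ElemOffset(rest...);  // real code multiplies by dim strides
  }
  void* mem_;
};
```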
-
- Sep 13, 2018
-
Li Xinqi authored
-
- Sep 10, 2018
-
Niu Chong authored
-
Jinhui Yuan authored
-
- Sep 09, 2018
-
Jinhui Yuan authored
-
- Sep 07, 2018
-
Niu Chong authored
* feat(register_slot): add the RegstSlot
* feat(register_slot): update RegstSlot if
* feat(actor): update member of Actor to use RegstSlot
* fix(register_slot): fix the available_regst_desc_cnt init val
* refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
* feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
* feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
* fix(register_slot): fix the CHECK empty
-
Jinhui Yuan authored
* add ReduceScatter2, ReduceAdd2, ReduceGather2 op and kernel
* add ReduceScatter2, ReduceAdd2, ReduceGather2 task node and actor
* complete Reduce2 op
* TODO: complete ReduceAdd2 kernel
* add ReduceScatter2 task to accept model_diff
* sketch of connecting ReduceScatter2/Add2/Gather2
* build allreduce2 logical graph
* connect allreduce2 task graph
* ReduceScatter2 task node
* complete ReduceAdd2, ReduceGather2 task node
* simplify ReduceAdd2 actor
* refactor ReduceAdd2 task node
* let global add -> gather share path
* separate ReduceLocalAdd2 and ReduceGlobalAdd2
* connect AllReduce2 task graph
* complete ReduceGlobalAdd2 op
* refine ReduceLocalAdd2 task node
* complete ReduceGlobalAdd2 task node
* global AllReduce2 works
* add device_num_of_each_machine to parallel_context
* simplify ReduceGlobalAdd2 runtime
* multi machine multi gpus AllReduce2 works
* add mem sharing and ctrl edge for AllReduce2
* single machine multiple gpu mem sharing works
* refine
* remove the previous allreduce
* change AllReduce2 to AllReduce variable convention
* change filename
* complete transfer to allreduce2
* remove unnecessary format change
* remove unnecessary format change
* simplify
* simplify mem sharing rule for reduce add and gather
* check for local add
* fix reduce_global_add actor bug
* refine reduce task node
* refine variable name
* refine
* refine
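The AllReduce2 path splits all-reduce into reduce-scatter, add, and gather stages so the stages can share memory and be ordered with ctrl edges. The host-side sketch below only illustrates the data movement that decomposition implements; the task graph, actors, and local/global add split described above are not modeled.

```cpp
#include <cstddef>
#include <vector>

// ReduceScatter2 -> ReduceAdd2 -> ReduceGather2 over N participants' buffers.
// After the add stage, participant r owns the fully reduced 1/N chunk r; the
// gather stage then copies each reduced chunk to every participant.
void AllReduce2Sketch(std::vector<std::vector<float>>& bufs) {
  const size_t n = bufs.size();
  const size_t len = bufs[0].size();
  const size_t chunk = len / n;  // assume len is divisible by n for the sketch
  // Reduce-scatter + add: participant r reduces chunk r from everyone.
  for (size_t r = 0; r < n; ++r) {
    for (size_t i = r * chunk; i < (r + 1) * chunk; ++i) {
      float acc = 0.f;
      for (size_t p = 0; p < n; ++p) { acc += bufs[p][i]; }
      bufs[r][i] = acc;
    }
  }
  // All-gather: every participant copies the reduced chunk owned by r.
  for (size_t r = 0; r < n; ++r) {
    for (size_t p = 0; p < n; ++p) {
      if (p == r) { continue; }
      for (size_t i = r * chunk; i < (r + 1) * chunk; ++i) { bufs[p][i] = bufs[r][i]; }
    }
  }
}
```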
-
Jinhui Yuan authored
-
- Sep 06, 2018
-
guo ran authored
-
- Sep 04, 2018
-
binbinHan authored
* add hinge loss
* add hinge loss test
* hack hinge loss
* optimize hinge loss
* optimize hinge loss
* optimize hinge loss
* optimize hinge loss
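For reference, the binary hinge loss being added is loss_i = max(0, 1 - label_i * pred_i) for labels in {-1, +1}, with gradient -label_i wherever that margin is positive. A hedged scalar sketch follows; the actual HingeLossKernel is GPU code and may also cover an L2 variant and multi-class labels.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Forward loss and gradient w.r.t. the prediction for the L1 hinge loss.
void HingeLossSketch(const std::vector<float>& pred, const std::vector<float>& label,
                     std::vector<float>& loss, std::vector<float>& pred_diff) {
  for (size_t i = 0; i < pred.size(); ++i) {
    const float margin = 1.f - label[i] * pred[i];
    loss[i] = std::max(0.f, margin);                 // max(0, 1 - y * y_hat)
    pred_diff[i] = margin > 0.f ? -label[i] : 0.f;   // d loss / d pred
  }
}
```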
-
binbinHan authored
* add matmul & dot & multiply
* optimize dot kernel
* fix multiply kernel code style
* optimize matmul kernel
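The dot kernel here (and the SumOfSquare computation mentioned in the LARS entry above) can delegate to cuBLAS. Below is a minimal standalone cublasSdot sketch, not OneFlow's kernel wrapper; buffer sizes and values are arbitrary.

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
  const int n = 1024;
  std::vector<float> host_x(n, 1.f), host_y(n, 2.f);

  float *x = nullptr, *y = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&x), n * sizeof(float));
  cudaMalloc(reinterpret_cast<void**>(&y), n * sizeof(float));
  cudaMemcpy(x, host_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(y, host_y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  cublasHandle_t handle;
  cublasCreate(&handle);
  float result = 0.f;  // with the default pointer mode the result lands on the host
  cublasSdot(handle, n, x, 1, y, 1, &result);  // result = sum_i x[i] * y[i]

  cublasDestroy(handle);
  cudaFree(x);
  cudaFree(y);
  return 0;
}
```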
-
Li Xinqi authored
-
binbinHan authored
* add embedding look up infer blob desc
* optimize infer blob desc
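"Infer blob desc" for embedding lookup amounts to shape inference: the output keeps the shape of the ids blob and appends the embedding width as the trailing dimension. A simplified sketch under that assumption; OneFlow's BlobDesc carries more than the shape (data type, etc.).

```cpp
#include <cstdint>
#include <vector>

// Output shape = ids shape + [embedding_size], e.g. (batch,) -> (batch, embedding_size).
std::vector<int64_t> InferEmbeddingLookUpOutShape(const std::vector<int64_t>& ids_shape,
                                                  int64_t embedding_size) {
  std::vector<int64_t> out_shape(ids_shape);
  out_shape.push_back(embedding_size);
  return out_shape;
}
```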
-
- Sep 03, 2018
-
Li Xinqi authored
-