  1. Sep 30, 2018
    • Refactor Actor (#1259) · e042befc
      Niu Chong authored
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      
      * feat: remove actual_writeable_regst_desc_id_ from Actor, add Naive/CustomizedProducedRegst
      
      * fix(normal_model_update_actor): bug: not send customized regst to consumer when SendInitialModel
      
      * fix(normal_forward_compute_actor): bug: not add kLoss/kAccuracy produced regst to NaiveProducedRegst
      
      * fix(actor): UNIMPLEMENTED() for AsyncSendCustomizedProducedRegstMsgToConsumer
      
      * fix(normal_forward_compute_actor): set const_buf_regst to nullptr when recv from consumers
      
      * fix(actor): total_reading_data_regst_cnt, not total_reading_ctrl_regst_cnt
      
      * refactor: update GetNaiveConsumedRegstDescName to GetNaiveOrCustomizedConsumedRegstDescName(same for Produced)
      
      * feat: combine data_regst and ctrl_regst in Actor
      
      * fix: fix bugs
      
      * fix: fix bugs
      
      * fix: remove .swp files and unused LOG
      
      * feat: split Act and SendMsg (#1255)
      
      * feat: split Act and SendMsg
      
      * refine: rename HandleProduced/ConsumedDataRegst.. to HandleProduced/ConsumedNaiveDataRegst..
      
      * fix(input_wise_comp_actor): bug: not set piece id
      
      * fix(actor): potential bug: produced msg with no allowed actor still popped from queue
      
      * refactor: mv some protected member function to private
      
      * fix(actor): fix the condition about sending EORD msg
      
      * refactor(input_wise_actor): use RegstSlot in InputWiseActor
      
      * fix(copy_comm_net_actor): rename piece_id2regst_ctx to piece_id2regst_ctx_
      
      * refactor: rename Name2RegstDescId to Name2RegstDescIds
      
      * refactor(naive_actor): "override final" instead of only "final"
      
      * refine(actor): little refine
      
      * feat: update the return type of GetNaiveOrCustomizedNamesRegstDescName to enum class RegstNameType
      e042befc
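The RegstSlot type that this refactor revolves around is only named in the messages above. A minimal sketch of the interface they imply, assuming the semantics suggested by the renames (Try* operations that report failure instead of CHECK-failing, and a count of regst descs that currently have at least one available regst); `Regst` is reduced to a stub here, and the real class lives in OneFlow's register machinery:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in; the real Regst carries much more state.
struct Regst {
  int64_t regst_desc_id;
};

class RegstSlot {
 public:
  bool HasRegstDescId(int64_t id) const { return queues_.count(id) > 0; }

  // Returns false for an unknown regst desc id instead of failing,
  // matching the PushBack -> TryPushBack renaming described above.
  bool TryPushBack(Regst* regst) {
    auto it = queues_.find(regst->regst_desc_id);
    if (it == queues_.end()) { return false; }
    if (it->second.empty()) { available_regst_desc_cnt_ += 1; }
    it->second.push_back(regst);
    return true;
  }

  bool TryPopFront(int64_t id) {
    auto it = queues_.find(id);
    if (it == queues_.end() || it->second.empty()) { return false; }
    it->second.pop_front();
    if (it->second.empty()) { available_regst_desc_cnt_ -= 1; }
    return true;
  }

  // Visits the front regst of every non-empty queue. (One commit above adds
  // a CHECK that queues are non-empty; this sketch just skips empty ones.)
  void ForEachFrontRegst(const std::function<void(Regst*)>& Handler) const {
    for (const auto& pair : queues_) {
      if (!pair.second.empty()) { Handler(pair.second.front()); }
    }
  }

  size_t available_regst_desc_cnt() const { return available_regst_desc_cnt_; }

 private:
  std::unordered_map<int64_t, std::deque<Regst*>> queues_;
  size_t available_regst_desc_cnt_ = 0;  // init value fixed by one commit above
};
```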
  2. Sep 26, 2018
    • add impl of lars (#1163) · 9518970b
      Shiyuan Shang-Guan authored
      * add lars set
      
      * add lars
      
      * override ibn&obn to lbi
      
      * make model update consistent
      
      * check cuda stream sync
      
      * add LARSUpdateModelGpu
      
      * checkout naive & momentum model update
      
      * use cublas::dot compute SumOfSquare
      
      * update lars for master
      
      * refine lars for master
      9518970b
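The commits above name the pieces (a per-layer SumOfSquare computed with cublas::dot, feeding a model update) without restating the rule itself. For reference, the layer-wise LARS update from You et al. (2017), where the squared norms under the square roots are what SumOfSquare computes:

```latex
% Local (layer-wise) learning-rate coefficient for layer l,
% with trust coefficient \eta and weight decay \beta:
\[
  \lambda^{l} \;=\; \eta \,
    \frac{\lVert w^{l} \rVert}
         {\lVert \nabla L(w^{l}) \rVert + \beta \,\lVert w^{l} \rVert}
\]
% Applied to the (naive / momentum) update with global step size \gamma_t:
\[
  w^{l}_{t+1} \;=\; w^{l}_{t} \;-\; \gamma_t \, \lambda^{l}
    \left( \nabla L(w^{l}_{t}) + \beta\, w^{l}_{t} \right)
\]
```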
    • Hinge loss test (#1263) · 7faf75a6
      binbinHan authored
      * hinge_loss_kernel_test
      
      * fix opkernel_test
      
      * fix test file
      
      * optimize test file
      
      * optimize opkernel test
      
      * complete opkernel test interface
      7faf75a6
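The kernel under test here computes the standard hinge loss, max(0, 1 - y·pred) for labels y in {-1, +1}. A minimal CPU-side analogue of such a kernel test, comparing output against hand-computed expectations (the function name and values are illustrative, not OneFlow's opkernel test API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Standard (L1) hinge loss for labels in {-1, +1}.
float HingeLoss(float pred, float label) {
  return std::max(0.0f, 1.0f - label * pred);
}

int main() {
  std::vector<float> pred = {0.5f, -2.0f, 3.0f};
  std::vector<float> label = {1.0f, -1.0f, 1.0f};
  std::vector<float> expected = {0.5f, 0.0f, 0.0f};
  for (size_t i = 0; i < pred.size(); ++i) {
    assert(HingeLoss(pred[i], label[i]) == expected[i]);
  }
  return 0;
}
```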
  3. Sep 24, 2018
    • Dev use nccl (#1198) · 55496813
      Jinhui Yuan authored
      * add nccl dependency
      
      * add nccl comm handle
      
      * nccl allreduce works
      
      * NcclAllreduce -> NcclAllReduce
      
      * fix header guard
      
      * add NcclReduceScatter, NcclAllGather
      
      * complete ReduceScatter and AllGather (with cuda error)
      
      * change variable name
      
      * reduce-scatter, all-gather works
      
      * add NcclScatter and NcclGather work type
      
      * Dev use nccl add nccl comm manager (#1206)
      
      * add parallel_set_id
      
      * add nccl_comm_manager
      
      * log nccl comm create
      
      * use NcclCommMgr
      
      * bugfix
      
      * OF_DISALLOW_COPY_AND_MOVE
      
      * remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
      
      * remove nccl handles from cuda_stream_handle
      
      * nccl_util and GetNcclDataType
      
      * fix rank_num
      
      * fix rank_id
      
      * CudaCheck->NcclCheck
      
      * only GPU
      
      * PoorCompTaskNode
      
      SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
      
      * PoorCompTaskNode
      
      * reformat
      
      * format change
      
      * Dev use nccl merge reduce share mem (#1216)
      
      * add parallel_set_id
      
      * add nccl_comm_manager
      
      * log nccl comm create
      
      * use NcclCommMgr
      
      * bugfix
      
      * OF_DISALLOW_COPY_AND_MOVE
      
      * remove nccl_scatter_handle and nccl_gather_handle from DeviceCtx
      
      * remove nccl handles from cuda_stream_handle
      
      * nccl_util and GetNcclDataType
      
      * fix rank_num
      
      * fix rank_id
      
      * CudaCheck->NcclCheck
      
      * only GPU
      
      * PoorCompTaskNode
      
      SoleIn, SoleOut, SoleOp, SoleIbn, SoleObn
      
      * PoorCompTaskNode
      
      * reformat
      
      * ReduceGather
      
      * GlobalAdd
      
      * ReduceScatter
      
      * EnableIfNeed
      
      * ConcatSplit
      
      * EnableMemSharing for pred if needed
      
      * CtrlEdge for Gather
      
      * CtrlEdge for GlobalAdd
      
      * LocalAdd CtrlEdge
      
      * CollectReduceTaskNode
      
      * reverse nodes
      
      * local_add_mem_sharing
      
      * global add mem sharing
      
      * reduce_mem_sharing
      
      * bugfix
      
      * refine
      
      * format change (remove empty lines)
      
      * format change
      
      * fix local_add and gather issues
      
      * Dev refactor reduce add (#1218)
      
      * change ReduceGlobalAdd to ReduceAdd
      
      * rm ReduceLocalAdd
      
      * no mem sharing case works
      
      * let ReduceAddCompActor decide whether it is local or global
      
      * multi machine multi gpus Nccl and Oneflow allreduce works
      
      * refine
      
      * extract SortEdges
      
      * make EdgeInfo protected
      
      * Dev use nccl refine (#1220)
      
      * const qualifier
      
      * PoorCompTaskNode=>PipeCompTaskNode
      
      * int=>int32_t
      
      * refine ReduceMemSharingCtx
      
      * NcclDeviceCtx and NcclActor
      
      * empty line
      
      * CudaDeviceCtx<-NcclDeviceCtx
      
      * fix wrong rank_id in reduce_add_actor (#1229)
      
      * fix wrong rank_id in reduce_add_actor
      
      * rm device_num_of_each_machine from parallel_ctx
      
      * fix reduce gather control edge (#1235)
      
      * fix reduce gather control edge
      
      * extract FindNearestReduceAddCompTaskNode
      
      * extract method ReduceCompTaskNodeIf::FindPredRduceTaskNodeIf
      
      * CHECK nearest_add_copy_d2h
      
      * Dev use nccl cross machine nccl all reduce (#1246)
      
      * support ncclAllReduce cross machine
      
      * fix rank_id and rank_num for mix
      
      * reformat
      
      * reformat
      
      * simplify nccl_kernel (#1256)
      
      * simplify REGISTER_BLD_SUB_TSK_GPH_MTHD (#1260)
      
      * simplify REGISTER_BLD_SUB_TSK_GPH_MTHD
      
      * note
      
      * Dev use nccl reduce ranking ctx (#1252)
      
      * reformat
      
      * compute rank_id and rank_num with FixCompTaskNode
      
      * reformat
      
      * fix rank_id for reduceadd
      
      * ReduceRankingCtx
      
      * New Ranking and MemSharing for Reduce
      
      * DECLARE_REDUCE_LOGICAL_NODE
      
      * Ranking4NcclAllReduce
      
      * fix ranking
      
      * remove AsTaskNode
      
      * reformat
      
      * runtime rank ctx
      
      * rank_set
      
      * bugfix
      
      * bugfix
      
      * unittest
      
      * change use_nccl_all_reduce_cross_machine to use_nccl_inter_node_communication
      
      * refine
      
      * move BuildCtrlRegstBetweenReduceCopyNodes to ReduceAddCompTaskNode
      
      * CHECK mem_size_
      55496813
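For orientation, the kernels this PR adds ultimately issue NCCL collective calls through comms owned by the NcclCommMgr described above. A minimal single-process sketch of the all-reduce path (the NCCL API calls are real; the wrapper function, buffer layout, and error handling are simplifications):

```cpp
#include <cuda_runtime.h>
#include <nccl.h>

// Sum-all-reduce across dev_cnt GPUs in one process. The real code keys
// comms by parallel_set_id via NcclCommMgr and checks every ncclResult_t
// (the CudaCheck -> NcclCheck commit above).
void AllReduceSum(float** send_bufs, float** recv_bufs, size_t elem_cnt,
                  ncclComm_t* comms, cudaStream_t* streams, int dev_cnt) {
  ncclGroupStart();
  for (int i = 0; i < dev_cnt; ++i) {
    ncclAllReduce(send_bufs[i], recv_bufs[i], elem_cnt, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  }
  ncclGroupEnd();
}
```

The ncclGroupStart/ncclGroupEnd bracket is what lets one thread enqueue the collective for several devices without deadlocking, which is why the per-device loop is safe here.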
  4. Sep 18, 2018
    • Dev define test blob (#1247) · 0476d2c2
      Li Xinqi authored
      * define_test_blob
      
      * decode random compute task node
      
      * rename define_test_blob_conf.name => define_test_blob_conf.out
      
      * decode random task node color
      0476d2c2
  5. Sep 17, 2018
    • moving model (#1234) · baa146bd
      Li Xinqi authored
      * moving model
      
      * moving_model => forward_model
      
      * add todo commit
      
      * two model save node
      
      * let md_updt actor handle forward_model
      
      * remove useless code
      
      * rename local variable
      baa146bd
    • refine model update conf (#1240) · 5ccd29d7
      Shiyuan Shang-Guan authored
      * refine model update conf
      
      * make todo
      
      * add primary_lr and secondary_lr
      5ccd29d7
    • Dev refactor channel (#1181) · fda25987
      Juncheng authored
      * add enum ChannelStatus
      
      * merge CloseSendEnd and CloseReceiveEnd
      
      * update channel_test
      fda25987
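Going only by the three messages above, a plausible shape for the refactored channel: a ChannelStatus enum for results and a single Close() replacing the separate send/receive-end closes. A hedged sketch, not the actual OneFlow source:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

enum ChannelStatus { kChannelStatusSuccess = 0, kChannelStatusErrorClosed };

template<typename T>
class Channel {
 public:
  ChannelStatus Send(const T& item) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (is_closed_) { return kChannelStatusErrorClosed; }
    queue_.push(item);
    cond_.notify_one();
    return kChannelStatusSuccess;
  }

  // Blocks until an item arrives, or the channel is closed and drained.
  ChannelStatus Receive(T* item) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return !queue_.empty() || is_closed_; });
    if (queue_.empty()) { return kChannelStatusErrorClosed; }
    *item = queue_.front();
    queue_.pop();
    return kChannelStatusSuccess;
  }

  // One Close() for both ends, per the merged CloseSendEnd/CloseReceiveEnd.
  void Close() {
    std::unique_lock<std::mutex> lock(mutex_);
    is_closed_ = true;
    cond_.notify_all();
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable cond_;
  bool is_closed_ = false;
};
```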
    • Refine runtime (#1108) · d76513b3
      Jinhui Yuan authored
      * only master machine saves plan and has event logger
      
      * separate Data, Persistence, Cache, Log FileSystem config
      
      * refine
      
      * only specify data and snapshot path conf
      
      * forbid multiple machines to use localfs as snapshot fs
      
      * networkfs as localfs
      
      * refine
      
      * Store log to snapshot (#1109)
      
      * use machine id, drop machine name
      
      * ensure setting machine id
      
      * allow save snapshot to localfs for distributed training (#1113)
      
      * Snapshot to master (#1116)
      
      * allow save snapshot to localfs for distributed training
      
      * fix mdSave to master for model parallel
      
      * fix review comment issues
      
      * add sanity check for machine id
      
      * rm useless comments
      
      * update example
      
      * Dev refine runtime add log stream mgr (#1142)
      
      * add LogStreamMgr
      
      * refine and refactor OutStream=>LogStream
      
      * bugfix
      
      * use LogStreamMgr to write graph, dot, plan, profile and proto
      
      * refine
      
      * simplify, remove LogStreamMgr (#1243)
      
      * simplify, remove LogStreamMgr
      
      * TeePersistentLogStream add static factory (#1244)
      d76513b3
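The logging refactor in this PR ends at TeePersistentLogStream with a static factory. A minimal sketch of that factory pattern (file handling is simplified to a single local ofstream; the real class writes through the persistence file system configured earlier in this PR):

```cpp
#include <fstream>
#include <memory>
#include <string>

class TeePersistentLogStream {
 public:
  // Static factory per the last commit above; the constructor stays private
  // so every instance is created through Create().
  static std::unique_ptr<TeePersistentLogStream> Create(const std::string& path) {
    return std::unique_ptr<TeePersistentLogStream>(new TeePersistentLogStream(path));
  }

  void Write(const std::string& content) {
    file_ << content;
    file_.flush();
  }

 private:
  explicit TeePersistentLogStream(const std::string& path) : file_(path) {}
  std::ofstream file_;
};

// Usage in the spirit of writing graph/dot/plan/profile dumps:
//   auto stream = TeePersistentLogStream::Create("plan.dot");
//   stream->Write(dot_text);
```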
    • fix bug of forward model -> copyD2H conflict with out regst (#1242) · 0da0646c
      cheng cheng authored
      * fix bug of forward model -> copyD2H conflict with out regst
      
      * use 1 line
      0da0646c
  6. Sep 15, 2018
    • pb list data type (#1237) · 58f43ff5
      Li Xinqi authored
      58f43ff5
    • separate model for update (#1232) · 11408363
      Shiyuan Shang-Guan authored
      * make each blob of the packed blob be updated separately in the ModelUpdate
      
      * make blob descs in regst be consistent in bw->md_diff_acc->shared_md_diff_add->md_update->fw
      
      * copy lbi2blob_descs from model
      
      * add shared_model_diff_add kernel
      
      * refine model_update actor and kernel
      
      * rm useless TODO
      
      * add shared_model_diff_add kernel
      
      * refine code
      11408363
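A sketch of the idea in this entry: update each blob of the packed model regst through its own pointer and extent instead of one update over the flat buffer. BlobView and the naive SGD rule below are illustrative stand-ins for OneFlow's blob descs and model-update kernels:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-blob view into a packed model regst.
struct BlobView {
  float* model;            // this blob's slice of the packed model
  const float* model_diff; // matching slice of the accumulated diff
  size_t elem_cnt;
};

// One update per blob, so blob descs can stay consistent across
// bw -> md_diff_acc -> shared_md_diff_add -> md_update -> fw.
void NaiveModelUpdate(const std::vector<BlobView>& blobs, float learning_rate) {
  for (const BlobView& blob : blobs) {
    for (size_t i = 0; i < blob.elem_cnt; ++i) {
      blob.model[i] -= learning_rate * blob.model_diff[i];
    }
  }
}
```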
  7. Sep 07, 2018
    • feat: update the data members to use RegstSlot in Actor (#1208) · 38a50de4
      Niu Chong authored
      * feat(register_slot): add the RegstSlot
      
      * feat(register_slot): update RegstSlot if
      
      * feat(actor): update member of Actor to use RegstSlot
      
      * fix(register_slot): fix the available_regst_desc_cnt init val
      
      * refine(register_slot): rename PushBack/PopFront, FindTheRegstDescId to TryPushBack/TryPopFront, HasRegstDescId
      
      * feat(regst_slot): rename ForEachCurRegstDeq/ForEachCurFrontRegst to ForEachRegstDeq/ForEachFrontRegst
      
      * feat(regst_slot): add ForChosenRegstDeq/ForChosenFrontRegst, add CHECK empty in ForEachFrontRegst
      
      * fix(register_slot): fix the CHECK empty
      38a50de4
    • Dev allreduce2 (#1211) · 5909cc43
      Jinhui Yuan authored
      * add ReduceScatter2, ReduceAdd2, ReduceGather2 op and kernel
      
      * add ReduceScatter2, ReduceAdd2, ReduceGather2 task node and actor
      
      * complete Reduce2 op
      
      * TODO: complete ReduceAdd2 kernel
      
      * add ReduceScatter2 task to accept model_diff
      
      * sketch of connecting ReduceScatter2/Add2/Gather2
      
      * build allreduce2 logical graph
      
      * connect allreduce2 task graph
      
      * ReduceScatter2 task node
      
      * complete ReduceAdd2, ReduceGather2 task node
      
      * simplify ReduceAdd2 actor
      
      * refactor ReduceAdd2 task node
      
      * let global add -> gather share path
      
      * separate ReduceLocalAdd2 and ReduceGlobalAdd2
      
      * connect AllReduce2 task graph
      
      * complete ReduceGlobalAdd2 op
      
      * refine ReduceLocalAdd2 task node
      
      * complete ReduceGlobalAdd2 task node
      
      * global AllReduce2 works
      
      * add device_num_of_each_machine to parallel_context
      
      * simplify ReduceGlobalAdd2 runtime
      
      * multi machine multi gpus AllReduce2 works
      
      * add mem sharing and ctrl edge for AllReduce2
      
      * single machine multiple gpu mem sharing works
      
      * refine
      
      * remove the previous allreduce
      
      * change AllReduce2 to AllReduce variable convention
      
      * change filename
      
      * complete transfer to allreduce2
      
      * remove unnecessary format change
      
      * remove unnecessary format change
      
      * simplify
      
      * simplify mem sharing rule for reduce add and gather
      
      * check for local add
      
      * fix reduce_global_add actor bug
      
      * refine reduce task node
      
      * refine variable name
      
      * refine
      
      * refine
      5909cc43
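The ReduceScatter2 -> ReduceAdd2 -> ReduceGather2 pipeline this PR builds can be summarized as plain dataflow. A single-node sketch, ignoring the mem sharing and ctrl edges the PR adds on top (buffer layout and even divisibility are assumed for brevity):

```cpp
#include <cstddef>
#include <vector>

// All-reduce over n device buffers by scatter/add/gather: device i owns
// slice i, sums that slice from every peer, then every peer gathers the
// summed slices back. Names and layout are illustrative.
void AllReduceByScatterAddGather(std::vector<std::vector<float>>& dev_bufs) {
  const size_t n = dev_bufs.size();
  const size_t slice = dev_bufs[0].size() / n;  // assume elem_cnt % n == 0
  // ReduceAdd: device i accumulates slice i from every other device.
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
      if (j == i) { continue; }
      for (size_t k = 0; k < slice; ++k) {
        dev_bufs[i][i * slice + k] += dev_bufs[j][i * slice + k];
      }
    }
  }
  // ReduceGather: every device copies the summed slice i back from device i.
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
      if (j == i) { continue; }
      for (size_t k = 0; k < slice; ++k) {
        dev_bufs[j][i * slice + k] = dev_bufs[i][i * slice + k];
      }
    }
  }
}
```

Splitting local add from global add, as the commits do with ReduceLocalAdd2/ReduceGlobalAdd2, is the same dataflow with the add phase staged per machine before crossing the network.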
    • fix bug in add kernel of allreduce (#1214) · 34ce4862
      Jinhui Yuan authored
      34ce4862