NCCL use compute stream to reduce memory cost & speed up (#4221)
* Enable insert nccl logical op pass
* FindMaxConnectedSubgraphForGpuExecOrder~
* through order and interface
* implement of insert nccl logical op in pass
* add nccl logical op using UserOp Implement and EagerNcclCommMgr
* add NCCL ReduceScatter op/kernel; refine pass impl of topo order
* add NCCL logical op/kernel AllGather
* fix bug of reduce scatter / all gather infer shape
* refine log and note
* fix compiler error when building with CPU ONLY
* support NCCL ALL2ALL and test pass of alexnet model parallel
* rollback of diff in checkpointing_pass.cpp
* rename to nccl_use_compute_stream; ResourceDesc::nccl_use_compute_stream; refine name for review; create nccl_comm_ in KernelCompute
* refine code for review
* add unittest for nccl use compute stream
* format test scripts
* refine align
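The ReduceScatter and AllGather ops named above have to infer output shapes that differ from their input shapes, which is where the shape-inference fix in the message applies. As a rough illustration of the standard collective semantics only (this is not OneFlow code; the function names `reduce_scatter` and `all_gather` are hypothetical, and ranks are simulated as plain Python lists), a minimal sketch might look like:

```python
from typing import List


def reduce_scatter(inputs: List[List[int]]) -> List[List[int]]:
    """Sum per-rank buffers elementwise; rank r keeps the r-th equal chunk.

    Each output buffer has len(input) // num_ranks elements.
    """
    num_ranks = len(inputs)
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % num_ranks != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")
    # Elementwise sum across ranks (the "reduce" part).
    reduced = [sum(vals) for vals in zip(*inputs)]
    # Split the reduced buffer into equal chunks (the "scatter" part).
    chunk = length // num_ranks
    return [reduced[r * chunk:(r + 1) * chunk] for r in range(num_ranks)]


def all_gather(inputs: List[List[int]]) -> List[List[int]]:
    """Concatenate per-rank buffers in rank order; every rank gets the result.

    Each output buffer has len(input) * num_ranks elements.
    """
    num_ranks = len(inputs)
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all ranks must contribute buffers of equal length")
    gathered = [x for buf in inputs for x in buf]
    # Give each simulated rank its own copy of the gathered buffer.
    return [list(gathered) for _ in range(num_ranks)]


# Two simulated ranks, four elements each.
per_rank = [[1, 2, 3, 4], [10, 20, 30, 40]]
scattered = reduce_scatter(per_rank)   # each rank holds 4 // 2 = 2 elements
regathered = all_gather(scattered)     # each rank holds 2 * 2 = 4 elements
```

Chaining the two, as in the last lines, gives every rank the full elementwise sum, so together they behave like an all-reduce; the intermediate buffers are the ones whose shapes shrink and grow by a factor of the rank count.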
Showing 11 changed files
- oneflow/core/graph/op_graph.h (2 additions, 1 deletion)
- oneflow/core/graph/task_graph.cpp (3 additions, 1 deletion)
- oneflow/core/job/resource.proto (3 additions, 0 deletions)
- oneflow/core/job/resource_desc.cpp (11 additions, 0 deletions)
- oneflow/core/job/resource_desc.h (1 addition, 0 deletions)
- oneflow/core/job_rewriter/insert_nccl_logical_op_pass.cpp (264 additions, 0 deletions)
- oneflow/core/job_rewriter/job_completer.cpp (8 additions, 0 deletions)
- oneflow/python/framework/config_util.py (17 additions, 0 deletions)
- oneflow/python/test/ops/test_nccl_use_compute_stream.py (142 additions, 0 deletions)
- oneflow/user/kernels/nccl_logical_kernels.cpp (295 additions, 0 deletions)
- oneflow/user/ops/nccl_logical_ops.cpp (140 additions, 0 deletions)