NCCL logical support Pipeline Parallel By independent NcclComputeStream. (#4806)
* Fw/Bw support double compute stream
* NCCL comm create by stream id
* 2D NCCL logical kernel support BW independent stream
* StreamIndex: NcclComputeStream for each subgraph insert nccl logical.
* refactor code
* refine code for review
* Add WITH_CUDA in DoJobPass(InsertNcclLogicalOpPass)
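The "NCCL comm create by stream id" item suggests that communicators are looked up by stream as well as by device set, so that forward and backward compute streams do not share one communicator. A minimal sketch of that idea follows. All names here (`CommCache`, `FakeComm`, `GetOrCreate`) are hypothetical and are not taken from `eager_nccl_comm_manager.cpp`; `FakeComm` stands in for a real NCCL handle so the example runs without NCCL.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <utility>

// Hypothetical stand-in for an NCCL communicator handle.
struct FakeComm {
  int id;
};

// Hypothetical cache: one communicator per (device set, stream id) pair.
class CommCache {
 public:
  // Each device is identified by (machine id, device id).
  using DeviceSet = std::set<std::pair<int64_t, int64_t>>;

  // Returns the cached communicator for this key, creating it on first use.
  std::shared_ptr<FakeComm> GetOrCreate(const DeviceSet& devices, uint32_t stream_id) {
    auto key = std::make_pair(devices, stream_id);
    auto it = cache_.find(key);
    if (it != cache_.end()) { return it->second; }
    auto comm = std::make_shared<FakeComm>(FakeComm{next_id_++});
    cache_.emplace(key, comm);
    return comm;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::map<std::pair<DeviceSet, uint32_t>, std::shared_ptr<FakeComm>> cache_;
  int next_id_ = 0;
};
```

With this keying, two lookups for the same device set and stream id return the same communicator, while the same device set on a different stream id gets its own.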
Showing 11 changed files:
- oneflow/core/device/cuda_stream_index.h (8 additions, 0 deletions)
- oneflow/core/framework/op_kernel.h (1 addition, 0 deletions)
- oneflow/core/graph/task_graph.cpp (9 additions, 3 deletions)
- oneflow/core/job/eager_nccl_comm_manager.cpp (69 additions, 29 deletions)
- oneflow/core/job/eager_nccl_comm_manager.h (3 additions, 0 deletions)
- oneflow/core/job_rewriter/insert_nccl_logical_op_pass.cpp (34 additions, 4 deletions)
- oneflow/core/job_rewriter/job_completer.cpp (2 additions, 0 deletions)
- oneflow/core/job_rewriter/pipeline_buffer_pass.cpp (1 addition, 0 deletions)
- oneflow/core/operator/op_conf.proto (1 addition, 0 deletions)
- oneflow/user/kernels/nccl_logical_2d_sbp_kernels.cpp (10 additions, 1 deletion)
- oneflow/user/kernels/nccl_logical_kernels.cpp (12 additions, 2 deletions)