Commit 45697b0c authored by cheng cheng, committed by GitHub

NCCL use compute stream to reduce memory cost & speed up (#4221)

* Enable insert nccl logical op pass

* FindMaxConnectedSubgraphForGpuExecOrder~

* through order and interface

* implement insert nccl logical op in pass

* add nccl logical op using UserOp Implement and EagerNcclCommMgr

* add NCCL ReduceScatter op/kernel; refine pass impl of topo order

* add NCCL logical op/kernel AllGather

* fix bug in reduce scatter / all gather shape inference

* refine log and note

* fix compiler error when building with CPU ONLY

* support NCCL ALL2ALL and test pass of alexnet model parallel

* rollback of diff in checkpointing_pass.cpp

* rename to nccl_use_compute_stream; ResourceDesc::nccl_use_compute_stream; refine name for review; create nccl_comm_ in KernelCompute;

* refine code for review

* add unittest for nccl use compute stream

* format test scripts

* refine align
parent 5d259566