NCCL uses the compute stream to reduce memory cost and speed up (#4221) (45697b0c)

Unverified commit 45697b0c, authored 4 years ago by cheng cheng and committed by GitHub 4 years ago.

NCCL uses the compute stream to reduce memory cost and speed up (#4221)

* enable the pass that inserts NCCL logical ops

* FindMaxConnectedSubgraphForGpuExecOrder

* through order and interface

* implement NCCL logical op insertion in the pass

* add NCCL logical ops implemented as UserOps, using EagerNcclCommMgr

* add NCCL ReduceScatter op/kernel; refine the pass implementation of topological ordering

* add NCCL AllGather logical op/kernel

* fix shape-inference bug in ReduceScatter/AllGather

* refine logs and comments

* fix compiler error when building with CPU ONLY

* support NCCL All2All; AlexNet model-parallel test passes

* roll back changes in checkpointing_pass.cpp

* rename to nccl_use_compute_stream (ResourceDesc::nccl_use_compute_stream); refine names per review; create nccl_comm_ in KernelCompute

* refine code per review

* add unit test for NCCL using the compute stream

* format test scripts

* refine align
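The shape relationship behind the ReduceScatter/AllGather infer-shape item can be sketched as follows. This is a minimal illustration, not OneFlow's code: the helper names are hypothetical, and it assumes both logical ops split or concatenate a contiguous tensor along dimension 0, which matches how ncclReduceScatter and ncclAllGather partition a flat buffer into equal per-rank chunks.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def reduce_scatter_out_shape(in_shape: Shape, world_size: int) -> Shape:
    """Hypothetical helper: each rank keeps 1/world_size of the reduced
    tensor, taken as a contiguous slice along dim 0 (assumed layout)."""
    if world_size <= 0 or in_shape[0] % world_size != 0:
        raise ValueError("dim 0 must be divisible by world_size")
    return (in_shape[0] // world_size,) + in_shape[1:]


def all_gather_out_shape(in_shape: Shape, world_size: int) -> Shape:
    """Hypothetical helper: every rank receives all ranks' slices,
    concatenated along dim 0 (assumed layout)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    return (in_shape[0] * world_size,) + in_shape[1:]


if __name__ == "__main__":
    x = (8, 1024)
    rs = reduce_scatter_out_shape(x, 4)
    print(rs)                           # (2, 1024)
    print(all_gather_out_shape(rs, 4))  # (8, 1024)
```

ReduceScatter followed by AllGather restores the original shape, which is the usual decomposition of an all-reduce into two collectives.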

parent 5d259566

Showing with 886 additions and 2 deletions