Commit 38dc8377, authored by cheng cheng, committed by GitHub

Feat: InsertNcclLogicalOps support multi-subgraph by placement group (#4753)


* Pipeline Parallelism: checkpointing insert identity buffer op

* fix compiler error

* identity buffer op custom out regst num

* fix bug and runnable

* Chain merge divide fw/bw; MemChain ignore merge; copyhd regst num hack

* Pipeline buffer pass

* Pipeline runnable

* rollback NOT merge mem chain hack

* pipeline_stage_id_hint and rollback checkpointing buffer

* Pipeline buffer only. test pass.

* rollback repeat hack

* Remove CopyHd Hack; Add buffer cross label loader and loss

* InsertNcclLogicalOps support multi-subgraph by placement group

* Implement InsertNcclLogicalOpPass support for multi-subgraph, batch acc, and pipeline parallel

* Pipeline + 2D-SBP runnable

* remove note

* WARNING to INFO

* refine code for review & fix for new dtype infer

* add note

* collection reserve for saving rehash cost

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
parent abab556a