Pipeline Parallelism by stage buffer (#4666)
* Pipeline Parallelism: checkpointing insert identity buffer op
* fix compiler error
* identity buffer op custom out regst num
* fix bug and runnable
* Chain merge divide fw/bw; MemChain ignore merge; copyhd regst num hack
* Pipeline buffer pass
* Pipeline runnable
* rollback NOT merge mem chain hack
* pipeline_stage_id_hint and rollback checkpointing buffer
* Pipeline buffer only. test pass.
* rollback repeat hack
* Remove CopyHd Hack; Add buffer cross label loader and loss
* refine code for review & fix for new dtype infer
* add note
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Showing
- oneflow/core/graph/copy_task_node.cpp: 2 additions, 18 deletions
- oneflow/core/graph/task_graph.cpp: 14 additions, 9 deletions
- oneflow/core/graph_impl/normal_forward_compute_task_node.cpp: 3 additions, 0 deletions
- oneflow/core/job/job_build_and_infer_ctx.cpp: 1 addition, 0 deletions
- oneflow/core/job/pipeline_config_def.cpp: 30 additions, 0 deletions
- oneflow/core/job_rewriter/insert_nccl_logical_op_pass.cpp: 1 addition, 1 deletion
- oneflow/core/job_rewriter/pipeline_buffer_pass.cpp: 217 additions, 0 deletions
- oneflow/user/kernels/identity_kernel.cpp: 2 additions, 0 deletions
- oneflow/user/ops/buffer_op.cpp: 52 additions, 0 deletions