python

daquexian authored and GitHub committed (commit d2ae7e00)
* add changes for multi dev demo
* add part of backward hook
* update
* add naive init_with_env
* update
* update
* support_multi_client
* update
* Remove unused code
* Fix multi client launch
* fix __main__ bug
* update abcd op
* fix multi client sync, make nccl instr ordered
* temp changes
* Use functional api instead of op_expr_helper::XXXOp.
* align with latest master, remove unused code
* local rank returns 0 when no env var, save is_multi_client in EnvDesc
* move is_multi_client to ProcessCtx, rename cuda_d2d device to nccl, remove unused code
* abcd -> return_first_input op
* remove launch.py for now
* refine
* update IsMultiClient in env_util.py
* rm multi_dev_demo.py
* remove exported functions in env_util.py
* remove unused op expr helper func
* fix bug
* add DevVmDepObjectConsumeMode and set it as NONE in backward
* move return_first_input op from math_ops.py to tensor_ops.py
* fix compile error
* refine
* add comments
* fix exit bug in init.py
* align with master
* update device ctor
* default dev id = local rank % gpu num (see the device-id sketch after this message)
* assert single machine
* reformat
* fix consume mode, implement eager_nccl_allreduce by process ranks
* fill sorted_ranks field in old code, reformat
* set default val for op conf, align with master
* impl return_first_input as functional api, impl allreduce as module
* add more tests
* reformat
* align with master
* rename ddp to flow.nn.parallel.DistributedDataParallel (see the DDP usage sketch after this message)
* refine eager nccl comm
* refine eager nccl comm, divide grad by group size (see the gradient-hook sketch after this message)
* rename reversed_param_list -> ddp_state_for_reversed_params
* make return_first_input inplace
* restore eager allreduce
* add static all zero tensor and select first
* refine
* add functional allreduce op and use current rank group
* materialize StaticAllZeroTensor in allreduce, support it in scalar mul
* materialize static zeros tensor in set_acc_grad
* rename
* auto format by CI

Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: clackhan <han_binbin@163.com>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
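
Device-id sketch. The "default dev id = local rank % gpu num" bullet describes how each local process picks its GPU when no device is given explicitly. Below is a minimal Python sketch of that rule; the helper names are illustrative, not OneFlow's actual API, and the local rank is assumed to arrive via a LOCAL_RANK environment variable.

```python
import os

def local_rank():
    # Per "local rank returns 0 when no env var": fall back to rank 0 when the
    # launcher did not export a local rank.
    return int(os.getenv("LOCAL_RANK", "0"))

def default_device_id(gpu_num):
    # Each local process takes its local rank modulo the number of visible GPUs
    # (gpu_num), so ranks 0..gpu_num-1 map one-to-one onto devices.
    return local_rank() % gpu_num
```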
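DDP usage sketch. The wrapper is exposed as flow.nn.parallel.DistributedDataParallel per the rename above. A minimal usage sketch, assuming a torch-style constructor and one process per GPU started by an external launcher; the toy model and shapes are made up for illustration.

```python
import oneflow as flow

# Wrap an ordinary module; backward hooks then allreduce its gradients.
model = flow.nn.Linear(8, 2).to("cuda")
ddp_model = flow.nn.parallel.DistributedDataParallel(model)

x = flow.randn(4, 8).to("cuda")
loss = ddp_model(x).sum()
loss.backward()  # gradients are synchronized across ranks during backward
```

Because the allreduced gradients are divided by the group size ("divide grad by group size"), every rank ends up with the averaged gradient rather than the raw sum.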
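Gradient-hook sketch. The bullets about the eager NCCL allreduce, "divide grad by group size", and "ddp_state_for_reversed_params" outline how gradients are synchronized. The following is a schematic of that hook logic only, not the actual implementation: allreduce stands in for the eager NCCL allreduce op added in this commit, and register_hook is assumed to be a torch-style tensor hook API.

```python
def make_grad_hook(allreduce, group_size):
    # allreduce sums the gradient over all ranks in the current group; dividing
    # by the group size turns that sum into an average.
    def hook(grad):
        return allreduce(grad) / group_size
    return hook

def install_ddp_hooks(parameters, allreduce, group_size):
    # Parameters are tracked in reverse order (cf. ddp_state_for_reversed_params),
    # since gradients become ready roughly from the last layer first in backward.
    for p in reversed(list(parameters)):
        if p.requires_grad:
            p.register_hook(make_grad_hook(allreduce, group_size))
```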