Unverified commit e74f6349 authored by i-robot, committed by Gitee

!3110 Modify TinyBERT network HCCL connect timeout

Merge pull request !3110 from anzhengqi/modify-networks
parents ca7106e1 e65307e7
......@@ -684,6 +684,8 @@ In run_general_distill.py, we set the random seed to make sure distribute traini
If the accuracy falls below the standard, the SciPy version may be older than 1.7.
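One way to rule out the SciPy cause before training is to compare the installed version against 1.7. This is a minimal sketch, assuming `python3` on `PATH` is the interpreter used for training and that GNU `sort -V` is available; `version_ge` is a hypothetical helper name, not part of the repository.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if version A >= version B.
# Hypothetical helper; relies on GNU sort's version ordering (-V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: python3 is the interpreter that runs the training scripts.
scipy_version=$(python3 -c 'import scipy; print(scipy.__version__)' 2>/dev/null)

if [ -z "$scipy_version" ]; then
    echo "SciPy is not installed for python3"
elif version_ge "$scipy_version" "1.7"; then
    echo "SciPy $scipy_version satisfies >= 1.7"
else
    echo "SciPy $scipy_version is older than 1.7; accuracy may fall below the standard"
fi
```

Using `sort -V` rather than a plain string comparison keeps versions such as `1.10.1` ordered after `1.7`.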
If the error `connect p2p timeout, timeout: 120s.` occurs, run `export HCCL_CONNECT_TIMEOUT=600` in the shell before launching training to extend the HCCL connection timeout.
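The fix comes down to raising `HCCL_CONNECT_TIMEOUT` before any training process starts; the error message reports a 120 s limit, and 600 raises it to 600 seconds. This is a minimal sketch, assuming it runs at the top of the launch script. The `${VAR:-default}` form is a design choice not taken from the original: it keeps a value the user has already exported and falls back to 600 otherwise.

```shell
#!/usr/bin/env bash
# Keep a user-provided HCCL_CONNECT_TIMEOUT; otherwise default to 600 seconds.
export HCCL_CONNECT_TIMEOUT=${HCCL_CONNECT_TIMEOUT:-600}

# Sanity check: the value must be a whole number of seconds.
case "$HCCL_CONNECT_TIMEOUT" in
    ''|*[!0-9]*)
        echo "HCCL_CONNECT_TIMEOUT must be an integer, got '$HCCL_CONNECT_TIMEOUT'" >&2
        exit 1
        ;;
esac

echo "HCCL_CONNECT_TIMEOUT=$HCCL_CONNECT_TIMEOUT"
```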
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/models).
......@@ -680,6 +680,8 @@ run_general_distill.py sets a random seed to ensure that distributed training init
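If the accuracy does not meet the standard, a possible cause is a SciPy version lower than 1.7.
If the error `connect p2p timeout, timeout: 120s.` appears, add the environment variable with `export HCCL_CONNECT_TIMEOUT=600` to extend the HCCL connection-establishment time and resolve the problem.
# ModelZoo Homepage
Please browse the official [homepage](https://gitee.com/mindspore/models).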
......@@ -27,6 +27,7 @@ EPOCH_SIZE=$2
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export RANK_TABLE_FILE=$3
export RANK_SIZE=$1
export HCCL_CONNECT_TIMEOUT=600
cores=`cat /proc/cpuinfo|grep "processor" |wc -l`
echo "the number of logical core" $cores
avg_core_per_rank=`expr $cores \/ $RANK_SIZE`
......
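The lines after the new `export` split the host's logical cores evenly across ranks with integer division. The sketch below reproduces that arithmetic using `$(( ))` instead of backticks and `expr`, and adds a guard for hosts with fewer cores than ranks. The per-rank core range at the end is a hypothetical illustration only: the rest of the script is elided in this diff, so how it actually uses `avg_core_per_rank` is not shown. `RANK_SIZE` is normally passed as the first script argument; the default of 1 here exists only so the snippet runs standalone.

```shell
#!/usr/bin/env bash
# Assumption: RANK_SIZE is normally set from $1 by the launch script.
RANK_SIZE=${RANK_SIZE:-1}

# Count logical cores the same way as the script (one "processor" line each).
cores=$(grep -c "processor" /proc/cpuinfo)
echo "the number of logical core" "$cores"

# Integer division, equivalent to: expr $cores \/ $RANK_SIZE
avg_core_per_rank=$(( cores / RANK_SIZE ))
if (( avg_core_per_rank < 1 )); then
    echo "fewer logical cores ($cores) than ranks ($RANK_SIZE)" >&2
    exit 1
fi

# Hypothetical: give rank i the contiguous block [i*avg, (i+1)*avg - 1].
for (( i = 0; i < RANK_SIZE; i++ )); do
    start=$(( i * avg_core_per_rank ))
    end=$(( start + avg_core_per_rank - 1 ))
    echo "rank $i -> cores ${start}-${end}"
done
```

Integer division drops any remainder, so when the core count is not a multiple of `RANK_SIZE`, the highest-numbered cores are simply left unassigned.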