Commit 70bb1e9d authored by maijianqiang

add network run demo

parent b4c6581d
Showing with 55 additions and 48 deletions
......@@ -225,7 +225,7 @@ ckpt_file: /home/FCN8s/ckpt/FCN8s_1-133_300.ckpt
# Ascend 8-device parallel training
bash scripts/run_train.sh [DEVICE_NUM] rank_table.json
# example: bash scripts/run_train.sh 8 /home/hccl_8p_01234567_10.155.170.71.json
# example: bash scripts/run_train.sh 8 ~/hccl_8p.json
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance. Please follow the instructions at [this link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
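For reference, a single-server, 8-device rank table has roughly the shape sketched below. This is only a sketch: every `server_id` and `device_ip` is a placeholder, and the exact fields can vary across driver versions, so prefer generating the file with the hccl_tools script linked above. The heredoc form lets the sketch be pasted straight into a shell:

```bash
# Sketch of a hypothetical single-server, 8-device rank table saved as ~/hccl_8p.json.
# All addresses below are placeholders; generate the real file with hccl_tools.
cat > ~/hccl_8p.json <<'EOF'
{
  "version": "1.0",
  "server_count": "1",
  "server_list": [
    {
      "server_id": "10.155.170.71",
      "device": [
        {"device_id": "0", "device_ip": "192.1.27.6", "rank_id": "0"},
        {"device_id": "1", "device_ip": "192.2.27.6", "rank_id": "1"},
        {"device_id": "2", "device_ip": "192.3.27.6", "rank_id": "2"},
        {"device_id": "3", "device_ip": "192.4.27.6", "rank_id": "3"},
        {"device_id": "4", "device_ip": "192.1.27.7", "rank_id": "4"},
        {"device_id": "5", "device_ip": "192.2.27.7", "rank_id": "5"},
        {"device_id": "6", "device_ip": "192.3.27.7", "rank_id": "6"},
        {"device_id": "7", "device_ip": "192.4.27.7", "rank_id": "7"}
      ],
      "host_nic_ip": "reserve"
    }
  ],
  "status": "completed"
}
EOF
```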
......
......@@ -71,9 +71,9 @@ After installing MindSpore via the official website, you can start training and
```bash
# enter script dir, train MCNN example
sh run_standalone_train_ascend.sh 0 ./formatted_trainval/shanghaitech_part_A_patches_9/train ./formatted_trainval/shanghaitech_part_A_patches_9/train_den ./formatted_trainval/shanghaitech_part_A_patches_9/val ./formatted_trainval/shanghaitech_part_A_patches_9/val_den ./ckpt
bash run_standalone_train_ascend.sh 0 ./formatted_trainval/shanghaitech_part_A_patches_9/train ./formatted_trainval/shanghaitech_part_A_patches_9/train_den ./formatted_trainval/shanghaitech_part_A_patches_9/val ./formatted_trainval/shanghaitech_part_A_patches_9/val_den ./ckpt
# enter script dir, evaluate MCNN example
sh run_standalone_eval_ascend.sh 0 ./original/shanghaitech/part_A_final/test_data/images ./original/shanghaitech/part_A_final/test_data/ground_truth_csv ./train/ckpt/best.ckpt
bash run_standalone_eval_ascend.sh 0 ./original/shanghaitech/part_A_final/test_data/images ./original/shanghaitech/part_A_final/test_data/ground_truth_csv ./train/ckpt/best.ckpt
```
# [Script Description](#contents)
......@@ -126,14 +126,14 @@ Major parameters in train.py and config.py are as follows:
```bash
# enter script dir, and run the distributed training script
sh run_distribute_train.sh ./hccl_table.json ./formatted_trainval/shanghaitech_part_A_patches_9/train ./formatted_trainval/shanghaitech_part_A_patches_9/train_den ./formatted_trainval/shanghaitech_part_A_patches_9/val ./formatted_trainval/shanghaitech_part_A_patches_9/val_den ./ckpt
bash run_distribute_train.sh ~/hccl_8p.json ./formatted_trainval/shanghaitech_part_A_patches_9/train ./formatted_trainval/shanghaitech_part_A_patches_9/train_den ./formatted_trainval/shanghaitech_part_A_patches_9/val ./formatted_trainval/shanghaitech_part_A_patches_9/val_den ./ckpt
# enter script dir, and run the standalone script
sh run_standalone_train_ascend.sh 0 ./formatted_trainval/shanghaitech_part_A_patches_9/train ./formatted_trainval/shanghaitech_part_A_patches_9/train_den ./formatted_trainval/shanghaitech_part_A_patches_9/val ./formatted_trainval/shanghaitech_part_A_patches_9/val_den ./ckpt
bash run_standalone_train_ascend.sh 0 ./formatted_trainval/shanghaitech_part_A_patches_9/train ./formatted_trainval/shanghaitech_part_A_patches_9/train_den ./formatted_trainval/shanghaitech_part_A_patches_9/val ./formatted_trainval/shanghaitech_part_A_patches_9/val_den ./ckpt
```
After training, the loss values will be as follows:
```text
```log
# grep "loss is " log
epoch: 1 step: 305, loss is 0.00041025918
epoch: 2 step: 305, loss is 3.7117527e-05
......@@ -161,7 +161,7 @@ Before running the command below, please check the checkpoint path used for eval
You can view the results in the file "eval_log". The accuracy on the test dataset will be as follows:
```text
```log
# grep "MAE: " eval_log
MAE: 105.87984801910736 MSE: 161.6687899899305
```
......
numpy
pandas
opencv
\ No newline at end of file
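Assuming this list is the model's requirements.txt, the dependencies can be installed in one step; note that the bare `opencv` entry may need to be `opencv-python` for pip to resolve it:

```bash
# Install the Python dependencies listed above.
pip install -r requirements.txt
```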
......@@ -46,7 +46,7 @@ Dataset used: [CIFAR-10](<http://www.cs.toronto.edu/~kriz/cifar.html>)
- Note: Data will be processed in dataset.py
- Download the dataset; the directory structure is as follows:
```bash
```cifar10
├─cifar-10-batches-bin
└─cifar-10-verify-bin
......
......@@ -48,7 +48,7 @@ AlexNet consists of 5 convolutional layers and 3 fully connected layers. Multiple convolution kernels are used to extract
- Note: Data is processed in dataset.py.
- Download the dataset. The directory structure is as follows:
```bash
```cifar10
├─cifar-10-batches-bin
└─cifar-10-verify-bin
......
......@@ -179,7 +179,7 @@ bash scripts/run_distribute_train_ascend.sh [rank_table] [train_dataset_path] [P
For example, you can run the shell command below to launch the training procedure.
```shell
bash run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json /home/DataSet/FSNS/train/
bash run_distribute_train_ascend.sh ~/hccl_8p.json /home/DataSet/FSNS/train/
```
- running on ModelArts
......
......@@ -147,7 +147,7 @@ bash scripts/run_standalone_train_gpu.sh $PRETRAINED_CKPT(options)
```bash
bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT(options)
# example: bash scripts/run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json
# example: bash scripts/run_distribute_train_ascend.sh ~/hccl_8p.json
```
- Distributed GPU Training:
......@@ -254,7 +254,7 @@ Results and checkpoints are written to the `./train` folder. Logs can be found in `./
```bash
bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT(options)]
# example: bash scripts/run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json
# example: bash scripts/run_distribute_train_ascend.sh ~/hccl_8p.json
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
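Rather than writing the rank table by hand, it can usually be generated with the hccl_tools utility shipped in the MindSpore model_zoo. A sketch, assuming an 8-device local server (check that script's README for the exact arguments of your version):

```bash
# Generate a rank table for local devices 0-7.
# hccl_tools.py lives under model_zoo/utils/hccl_tools in the MindSpore repository.
python hccl_tools.py --device_num "[0,8)"
# It writes a file named like hccl_8p_01234567_<host_ip>.json into the current
# directory; rename it to match the ~/hccl_8p.json used in the examples.
mv hccl_8p_*.json ~/hccl_8p.json
```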
......
......@@ -144,7 +144,7 @@ bash scripts/run_standalone_train_ascend.sh [DEVICE_ID] [PRETRAINED_CKPT(options
```shell
bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT(options)]
# example: bash scripts/run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json
# example: bash scripts/run_distribute_train_ascend.sh ~/hccl_8p.json
```
- Evaluation:
......@@ -239,7 +239,7 @@ bash scripts/run_standalone_train_ascend.sh [DEVICE_ID] [PRETRAINED_CKPT(options
```shell
bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT(options)]
# example: bash scripts/run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json
# example: bash scripts/run_distribute_train_ascend.sh ~/hccl_8p.json
```
Results and checkpoints for each device `i` are written to the `./train_parallel_{i}` folder.
......
......@@ -171,8 +171,7 @@ Modify the parameters according to the actual path
```bash
# distributed training
bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [TASK_TYPE] [PRETRAINED_PATH]
# example: bash scripts/run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json Pretraining(or Finetune) \
# /home/DataSet/ctpn_dataset/backbone/0-150_5004.ckpt
# example: bash scripts/run_distribute_train_ascend.sh ~/hccl_8p.json Pretraining(or Finetune) /home/DataSet/ctpn_dataset/backbone/0-150_5004.ckpt
# standalone training
bash scripts/run_standalone_train_ascend.sh [TASK_TYPE] [PRETRAINED_PATH] [DEVICE_ID]
......@@ -226,7 +225,7 @@ ICDAR2013, SCUT-FORU to improve precision and recall, and when doing Finetune, w
Ascend:
# distributed training example (8p)
bash run_distribute_train_ascend.sh [RANK_TABLE_FILE] [TASK_TYPE] [PRETRAINED_PATH]
# example: bash scripts/run_distribute_train_ascend.sh /home/hccl_8p_01234567_10.155.170.71.json Pretraining(or Finetune) /home/DataSet/ctpn_dataset/backbone/0-150_5004.ckpt
# example: bash scripts/run_distribute_train_ascend.sh ~/hccl_8p.json Pretraining(or Finetune) /home/DataSet/ctpn_dataset/backbone/0-150_5004.ckpt
# standalone training
bash run_standalone_train_ascend.sh [TASK_TYPE] [PRETRAINED_PATH]
......
......@@ -104,7 +104,7 @@ function cal_acc()
fi
mkdir output
mkdir output_img
python ../postprocess.py --dataset_path=$data_path --result_path=result_Files --label_path=$label_path &> acc.log
python ../postprocess.py --export_dataset_path=$data_path --result_path=result_Files --label_path=$label_path &> acc.log
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
......
......@@ -113,7 +113,7 @@ After installing MindSpore via the official website, you can start training and
- Prepare backbone
Download resnet101 for here(https://download.mindspore.cn/model_zoo/r1.2/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt).
Download resnet101 from [here](https://download.mindspore.cn/model_zoo/r1.2/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt).
- Running on Ascend
......@@ -140,7 +140,7 @@ Enter the shell script to modify the data_file and ckpt_pre_trained parameters
data_file=/home/DataSet/VOC2012/vocaug_mindrecords/vocaug.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/predtrained/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
bash run_distribute_train_s16_r1.sh
bash run_distribute_train_s16_r1.sh ~/hccl_8p.json
```
2. Train s8 with the vocaug dataset, fine-tuning the model from the previous step. The training script is:
......@@ -151,7 +151,7 @@ Enter the shell script to modify the data_file and ckpt_pre_trained parameters
data_file=/home/DataSet/VOC2012/vocaug_mindrecords/vocaug.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/predtrained/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
bash run_distribute_train_s8_r1.sh
bash run_distribute_train_s8_r1.sh ~/hccl_8p.json
```
3. Train s8 with the voctrain dataset, fine-tuning the model from the previous step. The training script is:
......@@ -164,17 +164,19 @@ data_file=/home/DataSet/VOC2012/voctrain_mindrecords/votrain.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/ckpt/deeplabv3-800_330.ckpt
bash run_distribute_train_s8_r2.sh
bash run_distribute_train_s8_r2.sh ~/hccl_8p.json
```
- For evaluation, the steps are as follows:
1. Enter the shell script and modify the data_root, data_lst and ckpt_path parameters
```default_config.yaml
```shell
# modify the parameters according to the local path
# example:
data_root=/home/DataSet/VOC2012
data_lst=/home/DataSet/VOC2012/voc_val_lst.txt
ckpt_path=/home/model/deeplabv3/ckpt/deeplabv3-800_330.ckpt
```
2. Evaluate s16 with the voc val dataset. The eval script is:
......
......@@ -144,7 +144,7 @@ bash run_standalone_train.sh
data_file=/home/DataSet/VOC2012/vocaug_mindrecords/vocaug.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/predtrained/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
bash run_distribute_train_s16_r1.sh
bash run_distribute_train_s16_r1.sh ~/hccl_8p.json
```
2. Train s8 with the VOCaug dataset, fine-tuning the model from the previous step. The script is as follows:
......@@ -155,7 +155,7 @@ bash run_distribute_train_s16_r1.sh
data_file=/home/DataSet/VOC2012/vocaug_mindrecords/vocaug.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/predtrained/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
bash run_distribute_train_s8_r1.sh
bash run_distribute_train_s8_r1.sh ~/hccl_8p.json
```
3. Train s8 with the VOCtrain dataset, fine-tuning the model from the previous step. The script is as follows:
......@@ -167,17 +167,18 @@ bash run_distribute_train_s8_r1.sh
data_file=/home/DataSet/VOC2012/voctrain_mindrecords/votrain.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/ckpt/deeplabv3-800_330.ckpt
bash run_distribute_train_s8_r2.sh
bash run_distribute_train_s8_r2.sh ~/hccl_8p.json
```
- The evaluation steps are as follows:
1. Enter the corresponding shell script and modify the parameters
```default_config.yaml
```shell
# example:
data_root=/home/DataSet/VOC2012
data_lst=/home/DataSet/VOC2012/voc_val_lst.txt
ckpt_path=/home/model/deeplabv3/ckpt/deeplabv3-800_330.ckpt
```
2. Evaluate s16 with the voc val dataset. The evaluation script is as follows:
......
......@@ -134,7 +134,7 @@ bash run_alone_train.sh
data_file=/home/DataSet/VOC2012/vocaug_mindrecords/vocaug.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/predtrained/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
bash run_distribute_train_s16_r1.sh
bash run_distribute_train_s16_r1.sh ~/hccl_8p.json
```
2. Train s8 with the VOCaug dataset, fine-tuning the model from the previous step. The script is as follows:
......@@ -145,7 +145,7 @@ bash run_distribute_train_s16_r1.sh
data_file=/home/DataSet/VOC2012/vocaug_mindrecords/vocaug.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/predtrained/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt
bash run_distribute_train_s8_r1.sh
bash run_distribute_train_s8_r1.sh ~/hccl_8p.json
```
3. Train s8 with the VOCtrain dataset, fine-tuning the model from the previous step. The script is as follows:
......@@ -157,17 +157,18 @@ bash run_distribute_train_s8_r1.sh
data_file=/home/DataSet/VOC2012/voctrain_mindrecords/votrain.mindrecord0
ckpt_pre_trained=/home/model/deeplabv3/ckpt/deeplabv3-800_330.ckpt
run_distribute_train_s8_r2.sh
bash run_distribute_train_s8_r2.sh ~/hccl_8p.json
```
The evaluation steps are as follows:
1. Enter the corresponding shell script and modify the parameters
```default_cofig.yaml
```shell
# example:
data_root=/home/DataSet/VOC2012
data_lst=/home/DataSet/VOC2012/voc_val_lst.txt
ckpt_path=/home/model/deeplabv3/ckpt/deeplabv3-800_330.ckpt
```
2. Evaluate s16 with the voc val dataset. The evaluation script is as follows:
......
......@@ -102,13 +102,13 @@ After installing MindSpore via the official website, you can start training and
# run distributed training example
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# example bash scripts/run_distribute_train.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# run evaluation example
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash script/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
# example: bash script/run_distribute_eval.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
......@@ -293,7 +293,7 @@ You can modify the training behaviour through the various flags in the `densenet
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
- running on GPU
......@@ -326,7 +326,7 @@ You can modify the training behaviour through the various flags in the `densenet
```bash
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# example bash scripts/run_distribute_train.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
```
......@@ -340,7 +340,8 @@ You can modify the training behaviour through the various flags in the `densenet
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
...
```
- running on GPU
......@@ -367,7 +368,7 @@ You can modify the training behaviour through the various flags in the `densenet
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash script/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
# example: bash script/run_distribute_eval.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
......
......@@ -107,13 +107,13 @@ Dataset used for DenseNet-100: CIFAR-10
# distributed training example
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# example bash scripts/run_distribute_train.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# single-device evaluation example
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash script/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
# example: bash script/run_distribute_eval.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
......@@ -310,7 +310,7 @@ python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATA
```shell
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# example bash scripts/run_distribute_train.sh 8 ~/hccl_8p.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
```
The above shell script will run distributed training in the background. Result logs and model checkpoints can be viewed under `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values for training DenseNet-121 on the ImageNet dataset are as follows:
......
......@@ -228,7 +228,7 @@ Run `scripts/train_distributed.sh` to train the model in distributed mode. The usage of
```text
bash scripts/train_distributed.sh [rank_table] [train_data_dir] [ckpt_path_to_save] [rank_size] [eval_each_epoch] [pretrained_ckpt(optional)]
# example: bash scripts/train_distributed.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/train/ ./ckpt/ 8 0
# example: bash scripts/train_distributed.sh ~/hccl_8p.json /home/DataSet/ImageNet_Original/train/ ./ckpt/ 8 0
```
The above shell script will run distributed training in the background. You can view the results through the file `train_parallel[X]/log.txt` as follows:
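Because the script detaches into the background, one simple way to watch training progress is to follow rank 0's log with tail. A sketch, assuming the `train_parallel0/log.txt` path implied above:

```bash
# Follow the rank-0 training log as it is written; Ctrl-C stops watching, not training.
tail -f train_parallel0/log.txt
```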
......
......@@ -81,7 +81,7 @@ Dataset used [ICDAR 2015](https://rrc.cvc.uab.es/?ch=4&com=downloads)
```bash
# distributed training example (8p)
bash run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [RANK_TABLE_FILE]
# example: bash run_distribute_train.sh /home/DataSet/ICDAR2015/ic15/ home/model/east/pretrained/0-150_5004.ckpt /root/hccl_8p_01234567_10.155.170.71.json
# example: bash run_distribute_train.sh /home/DataSet/ICDAR2015/ic15/ /home/model/east/pretrained/0-150_5004.ckpt ~/hccl_8p.json
# standalone training
bash run_standalone_train_ascend.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [DEVICE_ID]
......@@ -106,7 +106,7 @@ bash run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
Ascend:
# distributed training example (8p)
bash run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [RANK_TABLE_FILE]
# example: bash run_distribute_train.sh /home/DataSet/ICDAR2015/ic15/ home/model/east/pretrained/0-150_5004.ckpt /root/hccl_8p_01234567_10.155.170.71.json
# example: bash run_distribute_train.sh /home/DataSet/ICDAR2015/ic15/ /home/model/east/pretrained/0-150_5004.ckpt ~/hccl_8p.json
# standalone training
bash run_standalone_train_ascend.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [DEVICE_ID]
......
......@@ -105,7 +105,7 @@ After installing MindSpore via the official website, you can start training and
# run distributed training example
bash scripts/run_train.sh [RANK_TABLE_FILE] [DATASET_NAME]
# example: bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
# example: bash scripts/run_train.sh ~/hccl_8p.json cifar10
# run evaluation example
python eval.py > eval.log 2>&1 &
......@@ -401,7 +401,7 @@ For more configuration details, please refer to the script `config.py`.
- running on Ascend
```bash
bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
bash scripts/run_train.sh ~/hccl_8p.json cifar10
```
The above shell script will run distributed training in the background. You can view the results through the file `train_parallel[X]/log`. The loss values will be as follows:
......
......@@ -107,7 +107,7 @@ GoogleNet is composed of multiple inception modules connected in series and can thus go deeper. The dimensionality-reducing
# run distributed training example
bash scripts/run_train.sh [RANK_TABLE_FILE] [DATASET_NAME]
# example: bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
# example: bash scripts/run_train.sh ~/hccl_8p.json cifar10
# run evaluation example
python eval.py > eval.log 2>&1 &
......@@ -371,7 +371,7 @@ GoogleNet is composed of multiple inception modules connected in series and can thus go deeper. The dimensionality-reducing
- Running on Ascend
```bash
bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
bash scripts/run_train.sh ~/hccl_8p.json cifar10
```
The above shell script will run distributed training in the background. You can view the results through the train_parallel[X]/log file. The loss values will be as follows:
......
......@@ -284,7 +284,7 @@ Taking cifar10 training as an example, the ds_type parameter is set to cifar10
```shell
# distributed training (8p)
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [CKPT_PATH]
# example: bash run_distribute_train.sh /root/hccl_8p_012345467_10.155.170.71.json /home/DataSet/cifar10/ ./ckpt/
# example: bash run_distribute_train.sh ~/hccl_8p.json /home/DataSet/cifar10/ ./ckpt/
# standalone training
bash scripts/run_standalone_train.sh [DEVICE_ID] [DATA_PATH] [CKPT_PATH]
......