Commit 12ecbcc7 authored by gaozeyang
add gpu scripts to glore_res50 and merge res50 with res200

# Contents
<!-- TOC -->
- [Contents](#contents)
- [Glore_resnet Description](#glore_resnet-description)
    - [Overview](#overview)
    - [Paper](#paper)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Usage](#usage)
            - [Running on Ascend](#running-on-ascend)
            - [Running on GPU](#running-on-gpu)
    - [Training Results](#training-results)
    - [Inference Process](#inference-process)
        - [Usage](#usage-1)
            - [Running on Ascend](#running-on-ascend-1)
            - [Running on GPU](#running-on-gpu-1)
    - [Inference Results](#inference-results)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
            - [Glore_resnet50 on ImageNet2012](#glore_resnet50-on-imagenet2012)
            - [Glore_resnet200 on ImageNet2012](#glore_resnet200-on-imagenet2012)
        - [Inference Performance](#inference-performance)
            - [Glore_resnet50 on ImageNet2012](#glore_resnet50-on-imagenet2012-1)
            - [Glore_resnet200 on ImageNet2012](#glore_resnet200-on-imagenet2012-1)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
<!-- /TOC -->
# Glore_resnet Description
## Overview
Convolutional neural networks excel at extracting local relations, but they are inefficient at modeling relations between distant regions of an image and usually need many stacked layers to do so, even though global modeling and reasoning over regions benefits many computer vision tasks. To enable such global reasoning, Facebook AI Research, the National University of Singapore and the 360 AI Institute proposed a graph-based global reasoning module, the Global Reasoning Unit, which can be plugged into the network models of many tasks. glore_res200 is an image-classification network built by evenly inserting 2 and 3 Global Reasoning units into Stage 2 and Stage 3 of ResNet200, respectively.
The following is an example of training glore_res50 on the ImageNet2012 dataset with MindSpore. For glore_res50, refer to [the paper](https://arxiv.org/pdf/1811.12814v1.pdf).
## Paper
1. [Paper](https://arxiv.org/abs/1811.12814): Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis. "Graph-Based Global Reasoning Networks"
# Model Architecture
The overall network architecture of glore_res is described in the paper:
[Link](https://arxiv.org/pdf/1811.12814v1.pdf)
The backbone of glore_res200 is ResNet200, with 2 and 3 Global Reasoning units evenly inserted into Stage 2 and Stage 3, respectively. The units are inserted in the same way in both stages.
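For intuition, the following is a minimal NumPy sketch of the computation a Global Reasoning unit performs: channel reduction, projection of spatial locations onto a small set of graph nodes, reasoning over those nodes, reverse projection, and a residual add. It is an illustration based on the paper, not the MindSpore implementation in `src/glore_resnet50.py` / `src/glore_resnet200.py`; all names and shapes here are assumptions.
```python
# Minimal NumPy sketch of a Global Reasoning (GloRe) unit, for illustration only.
# The actual MindSpore code lives in src/glore_resnet50.py / src/glore_resnet200.py.
import numpy as np

def glore_unit(x, w_reduce, w_proj, w_graph, w_node, w_expand):
    """x: feature map of shape (C, H, W); returns a tensor of the same shape.
    Steps (per the paper):
      1. reduce channels C -> C'            (1x1 conv, here a matrix multiply)
      2. project H*W locations onto N nodes (coordinate space -> interaction space)
      3. graph reasoning over the N nodes   (node-wise and channel-wise mixing)
      4. project nodes back to H*W, expand channels C' -> C, add residual
    """
    C, H, W = x.shape
    L = H * W
    x_flat = x.reshape(C, L)               # (C, L)
    reduced = w_reduce @ x_flat            # (C', L)  channel reduction
    b = w_proj @ x_flat                    # (N, L)   projection weights
    nodes = reduced @ b.T                  # (C', N)  features per graph node
    nodes = nodes @ w_graph                # (C', N)  mix information across nodes
    nodes = w_node @ nodes                 # (C', N)  mix information across channels
    back = nodes @ b                       # (C', L)  reverse projection
    out = w_expand @ back                  # (C, L)   channel expansion
    return x + out.reshape(C, H, W)        # residual connection

# toy usage with random weights
C, Cr, N, H, W = 16, 8, 4, 7, 7
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
y = glore_unit(x,
               rng.standard_normal((Cr, C)),
               rng.standard_normal((N, C)),
               rng.standard_normal((N, N)),
               rng.standard_normal((Cr, Cr)),
               rng.standard_normal((C, Cr)))
print(y.shape)  # (16, 7, 7)
```
In the paper these weight matrices are learned 1x1 convolutions, and the unit is attached after selected residual blocks of Stage 2 and Stage 3.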
# Dataset
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: 1000 classes of 224*224 color images
    - Training set: 1,281,167 images
    - Test set: 50,000 images
- Data format: JPEG
    - Note: data is processed in dataset.py.
- Download the dataset. The directory structure is as follows:
```text
└─dataset
    ├─train                 # training dataset
    └─val                   # evaluation dataset
```
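With the directory laid out as above, the evaluation split can be loaded through the helper in `src/dataset.py`. A hedged usage sketch follows: the path is a placeholder, and it assumes MindSpore is installed and the repository's default config file is present, since `src/config.py` is parsed on import.
```python
# Sketch: load the evaluation split with this repository's dataset helper.
# "/path/to/dataset/val" is a placeholder; batch_size mirrors the resnet50 configs.
from src.dataset import create_eval_dataset

eval_ds = create_eval_dataset(dataset_path="/path/to/dataset/val",
                              repeat_num=1,
                              batch_size=128)
print("evaluation batches:", eval_ds.get_dataset_size())
```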
# Features
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_mixed_precision.html) training method uses both single-precision and half-precision data to speed up the training of deep neural networks while preserving the accuracy achievable with single-precision training. Mixed precision increases computing speed and reduces memory usage, and it also makes it possible to train larger models or use larger batch sizes on specific hardware.
Taking the FP16 operator as an example, if the input data type is FP32, the MindSpore backend automatically lowers the precision to process the data. Users can open the INFO log and search for "reduce precision" to view operators whose precision was reduced.
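The `loss_scale` and `cast_fp16` entries in the configuration files relate to this mechanism. Below is a minimal, self-contained sketch of how automatic mixed precision and a fixed loss scale are typically enabled with the MindSpore 1.x `Model` API. It uses a toy network rather than glore_res, and the flag values mirror the configs in this README rather than this repository's `train.py`.
```python
# Minimal mixed-precision sketch (MindSpore 1.x); the network is a toy stand-in.
import mindspore.nn as nn
from mindspore import context
from mindspore.train.model import Model
from mindspore.train.loss_scale_manager import FixedLossScaleManager

# choose the device you actually have: "Ascend", "GPU" or "CPU"
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

net = nn.Dense(32, 10)                       # stand-in for glore_resnet50/200
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
opt = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9,
                  weight_decay=1e-4, loss_scale=1024)

# amp_level="O2" runs most operators in FP16 while keeping BatchNorm in FP32;
# FixedLossScaleManager(1024) matches the loss_scale value used in the configs.
model = Model(net, loss_fn=loss, optimizer=opt,
              amp_level="O2",
              loss_scale_manager=FixedLossScaleManager(1024, drop_overflow_update=False),
              metrics={"acc"})
```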
# Environment Requirements
- Hardware (Ascend/GPU)
    - Prepare an Ascend or GPU processor to set up the hardware environment.
- Framework
    - [MindSpore](https://www.mindspore.cn/install)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start
After installing MindSpore via the official website, you can start training and evaluation as follows:
- Running on Ascend
```bash
# distributed training
Usage: bash run_distribute_train.sh [DATASET_PATH] [RANK_TABLE] [CONFIG_PATH]
# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH] [DEVICE_ID] [CONFIG_PATH]
# run the evaluation example
Usage: bash run_eval.sh [DATASET_PATH] [DEVICE_ID] [CHECKPOINT_PATH] [CONFIG_PATH]
```
- Running on GPU
```bash
# distributed training
Usage: bash run_distribute_train_gpu.sh [DATASET_PATH] [RANK_SIZE] [CONFIG_PATH]
# standalone training
Usage: bash run_standalone_train_gpu.sh [DATASET_PATH] [CONFIG_PATH]
# run the evaluation example
Usage: bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
<https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools>
# Script Description
## Script and Sample Code
```shell
.
└──Glore_resnet
  ├── README.md
  ├── script
    ├── run_distribute_train.sh        # launch Ascend distributed training (8 devices)
    ├── run_distribute_train_gpu.sh    # launch GPU distributed training (8 devices)
    ├── run_eval.sh                    # launch Ascend/GPU evaluation (single device)
    └── run_standalone_train_gpu.sh    # launch Ascend/GPU standalone training (single device)
  ├── src
    ├── __init__.py
    ├── config.py                      # parameter configuration
    ├── dataset.py                     # dataset loading
    ├── autoaugment.py                 # AutoAugment operators and policy
    ├── lr_generator.py                # learning-rate schedule
    ├── loss.py                        # loss definition for the ImageNet2012 dataset
    ├── save_callback.py               # evaluate during training and save the checkpoint with the best accuracy
    ├── glore_resnet200.py             # glore_resnet200 network
    ├── glore_resnet50.py              # glore_resnet50 network
    ├── transform.py                   # data augmentation
    └── transform_utils.py             # data augmentation utilities
  ├── eval.py                          # evaluation script
  ├── export.py                        # checkpoint export script
  └── train.py                         # training script
```
## Script Parameters
- Parameters of Glore_resnet50 on the ImageNet2012 dataset (Ascend).
```text
"class_num":1000,                # number of dataset classes
"batch_size":128,                # batch size of the input tensor
"loss_scale":1024,               # loss scale
"momentum":0.9,                  # momentum of the optimizer
"weight_decay":1e-4,             # weight decay
"epoch_size":120,                # applies to training only; fixed to 1 for inference
"pretrained": False,             # whether to load pretrained weights
"pretrain_epoch_size": 0,        # number of epochs already trained before loading the pretrained checkpoint; the actual number of training epochs equals epoch_size minus pretrain_epoch_size
"save_checkpoint":True,          # whether to save checkpoints
"save_checkpoint_epochs":5,      # epoch interval between two checkpoints; by default the last checkpoint is saved after the final epoch
"keep_checkpoint_max":10,        # keep only the last keep_checkpoint_max checkpoints
"save_checkpoint_path":"./",     # checkpoint save path, relative to the execution path
"warmup_epochs":0,               # number of warm-up epochs
"lr_decay_mode":"Linear",        # decay mode used to generate the learning rate
"use_label_smooth":True,         # label smoothing
"label_smooth_factor":0.05,      # label smoothing factor
"weight_init": "xavier_uniform", # weight initialization, one of "he_normal", "he_uniform", "xavier_uniform"
"use_autoaugment": True,         # whether to apply AutoAugment
"lr_init":0,                     # initial learning rate
"lr_max":0.8,                    # maximum learning rate
"lr_end":0.0,                    # final (minimum) learning rate
```
- Parameters of Glore_resnet50 on the ImageNet2012 dataset (GPU).
```text
"class_num":1000,                # number of dataset classes
"batch_size":128,                # batch size of the input tensor
"loss_scale":1024,               # loss scale
"momentum":0.9,                  # momentum of the optimizer
"weight_decay":1e-4,             # weight decay
"epoch_size":130,                # applies to training only; fixed to 1 for inference
"pretrained": False,             # whether to load pretrained weights
"pretrain_epoch_size": 0,        # number of epochs already trained before loading the pretrained checkpoint; the actual number of training epochs equals epoch_size minus pretrain_epoch_size
"save_checkpoint":True,          # whether to save checkpoints
"save_checkpoint_epochs":5,      # epoch interval between two checkpoints; by default the last checkpoint is saved after the final epoch
"keep_checkpoint_max":10,        # keep only the last keep_checkpoint_max checkpoints
"save_checkpoint_path":"./",     # checkpoint save path, relative to the execution path
"warmup_epochs":0,               # number of warm-up epochs
"lr_decay_mode":"Linear",        # decay mode used to generate the learning rate
"use_label_smooth":True,         # label smoothing
"label_smooth_factor":0.05,      # label smoothing factor
"weight_init": "xavier_uniform", # weight initialization, one of "he_normal", "he_uniform", "xavier_uniform"
"use_autoaugment": True,         # whether to apply AutoAugment
"lr_init":0,                     # initial learning rate
"lr_max":0.8,                    # maximum learning rate
"lr_end":0.0,                    # final (minimum) learning rate
```
- Parameters of Glore_resnet200 on the ImageNet2012 dataset (Ascend).
```text
"class_num":1000,                # number of dataset classes
"batch_size":80,                 # batch size of the input tensor
"loss_scale":1024,               # loss scale
"momentum":0.08,                 # momentum of the optimizer
"weight_decay":0.0002,           # weight decay
"epoch_size":150,                # applies to training only; fixed to 1 for inference
"pretrain_epoch_size":0,         # number of epochs already trained before loading the pretrained checkpoint; the actual number of training epochs equals epoch_size minus pretrain_epoch_size
"save_checkpoint":True,          # whether to save checkpoints
"save_checkpoint_epochs":5,      # epoch interval between two checkpoints; by default the last checkpoint is saved after the final epoch
"keep_checkpoint_max":10,        # keep only the last keep_checkpoint_max checkpoints
"save_checkpoint_path":"./",     # checkpoint save path, relative to the execution path
"warmup_epochs":0,               # number of warm-up epochs
"lr_decay_mode":"poly",          # decay mode used to generate the learning rate
"lr_init":0.1,                   # initial learning rate
"lr_max":0.4,                    # maximum learning rate
"lr_end":0.0,                    # final (minimum) learning rate
```
- Parameters of Glore_resnet200 on the ImageNet2012 dataset (GPU).
```text
"class_num":1000,                # number of dataset classes
"batch_size":64,                 # batch size of the input tensor
"loss_scale":1024,               # loss scale
"momentum":0.08,                 # momentum of the optimizer
"weight_decay":0.0002,           # weight decay
"epoch_size":150,                # applies to training only; fixed to 1 for inference
"pretrain_epoch_size":0,         # number of epochs already trained before loading the pretrained checkpoint; the actual number of training epochs equals epoch_size minus pretrain_epoch_size
"save_checkpoint":True,          # whether to save checkpoints
"save_checkpoint_epochs":5,      # epoch interval between two checkpoints; by default the last checkpoint is saved after the final epoch
"keep_checkpoint_max":10,        # keep only the last keep_checkpoint_max checkpoints
"save_checkpoint_path":"./",     # checkpoint save path, relative to the execution path
"warmup_epochs":0,               # number of warm-up epochs
"lr_decay_mode":"poly",          # decay mode used to generate the learning rate
"lr_init":0.1,                   # initial learning rate
"lr_max":0.4,                    # maximum learning rate
"lr_end":0.0,                    # final (minimum) learning rate
```
For more configuration details, see the script `config.py`.
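The `warmup_epochs`, `lr_init`, `lr_max`, `lr_end` and `lr_decay_mode` fields above together describe a per-step learning-rate schedule. The following framework-free sketch shows what a linear-warmup plus poly/linear-decay generator of this kind computes; it is an approximation for illustration, not the code in `src/lr_generator.py` (for example, the `power` exponent is an assumed value).
```python
# Approximate per-step LR schedule implied by the config parameters above.
# Illustrative re-derivation, not the code in src/lr_generator.py.
import numpy as np

def generate_lr(lr_init, lr_max, lr_end, warmup_epochs, total_epochs,
                steps_per_epoch, decay_mode="poly", power=2.0):
    total_steps = total_epochs * steps_per_epoch
    warmup_steps = warmup_epochs * steps_per_epoch
    lr = np.zeros(total_steps, dtype=np.float32)
    for step in range(total_steps):
        if warmup_steps and step < warmup_steps:
            # linear warmup from lr_init up to lr_max
            lr[step] = lr_init + (lr_max - lr_init) * step / warmup_steps
        else:
            progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
            if decay_mode == "poly":
                lr[step] = (lr_max - lr_end) * (1.0 - progress) ** power + lr_end
            else:  # "Linear"
                lr[step] = lr_max - (lr_max - lr_end) * progress
    return lr

# e.g. the glore_resnet50 Ascend settings: 120 epochs, 1251 steps per epoch
schedule = generate_lr(lr_init=0, lr_max=0.8, lr_end=0.0,
                       warmup_epochs=0, total_epochs=120,
                       steps_per_epoch=1251, decay_mode="Linear")
print(schedule[0], schedule[-1])
```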
## Training Process
### Usage
#### Running on Ascend
```text
# distributed training
Usage: bash run_distribute_train.sh [DATASET_PATH] [RANK_TABLE] [CONFIG_PATH]
# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH] [DEVICE_ID] [CONFIG_PATH]
# run the evaluation example
Usage: bash run_eval.sh [DATASET_PATH] [DEVICE_ID] [CHECKPOINT_PATH] [CONFIG_PATH]
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
Training results are saved in the example path, in folders whose names start with "train" or "train_parallel". You can find the checkpoint files and results in the logs under this path, as shown below.
#### Running on GPU
```text
# distributed training
Usage: bash run_distribute_train_gpu.sh [DATASET_PATH] [RANK_SIZE] [CONFIG_PATH]
# standalone training
Usage: bash run_standalone_train_gpu.sh [DATASET_PATH] [CONFIG_PATH]
# run the evaluation example
Usage: bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
```
## Training Results
- Training Glore_resnet50 with the ImageNet2012 dataset (8 devices)
```text
# distributed training results (8P)
epoch:1 step:1251, loss is 5.074506
epoch:2 step:1251, loss is 4.339285
epoch:3 step:1251, loss is 3.9819345
epoch:4 step:1251, loss is 3.5608528
epoch:5 step:1251, loss is 3.3024906
...
```
- Training Glore_resnet200 with the ImageNet2012 dataset (8 devices)
```text
# distributed training results (8P)
epoch:1 step:1251, loss is 6.0563216
epoch:2 step:1251, loss is 5.3812423
epoch:3 step:1251, loss is 4.782114
epoch:4 step:1251, loss is 4.4079633
epoch:5 step:1251, loss is 4.080069
...
```
## Inference Process
### Usage
#### Running on Ascend
```bash
# inference
Usage: bash run_eval.sh [DATASET_PATH] [DEVICE_ID] [CHECKPOINT_PATH] [CONFIG_PATH]
```
```bash
# inference example
bash run_eval.sh ~/Imagenet_Original/ 0 ~/glore_resnet200-150_1251.ckpt ../config/config_resnet50_gpu.yaml
```
#### Running on GPU
```bash
# inference
Usage: bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
```
```bash
# inference example
bash run_eval_gpu.sh ~/Imagenet ~/glore_resnet200-150_2502.ckpt ../config/config_resnet50_gpu.yaml
```
## Inference Results
```text
result:{'top_1 acc':0.802303685897436}
```
# Model Description
## Performance
### Training Performance
#### Glore_resnet50 on ImageNet2012
| Parameters | Ascend 910 | GPU |
| -------------------------- | -------------------------------------- |------------------------------------|
| Model Version | Glore_resnet50 | Glore_resnet50 |
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 2048 GB | GPU V100-PCIE 32 GB |
| Uploaded Date | 2021-03-21 | 2021-09-22 |
| MindSpore Version | r1.1 | 1.3.0 |
| Dataset | ImageNet2012 | ImageNet2012 |
| Training Parameters | epoch=120, steps per epoch=1251, batch_size=128 | epoch=130, steps per epoch=1251, batch_size=128 |
| Optimizer | Momentum | Momentum |
| Loss Function | SoftmaxCrossEntropyExpand | SoftmaxCrossEntropyExpand |
| Outputs | probability | probability |
| Loss | 1.8464266 | 1.7463021 |
| Speed | 263.483 ms/step (8 devices) | 655 ms/step (8 devices) |
| Total Time | 10.98 hours | 58.5 hours |
| Parameters (M) | 30.5 | 30.5 |
| Checkpoint for fine-tuning | 233.46M (.ckpt file) | 233.46M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/models/tree/master/research/cv/glore_res) |
#### Glore_resnet200 on ImageNet2012
| Parameters | Ascend 910 | GPU |
| -------------------------- | -------------------------------------- |------------------------------------|
| Model Version | Glore_resnet200 | Glore_resnet200 |
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 2048 GB | GPU V100-SXM2 |
| Uploaded Date | 2021-03-24 | 2021-05-25 |
| MindSpore Version | 1.3.0 | 1.2.0 |
| Dataset | ImageNet2012 | ImageNet2012 |
| Training Parameters | epoch=150, steps per epoch=2001, batch_size=80 | epoch=150, steps per epoch=2502, batch_size=64 |
| Optimizer | NAG | NAG |
| Loss Function | SoftmaxCrossEntropyExpand | SoftmaxCrossEntropyExpand |
| Outputs | probability | probability |
| Loss | 0.8068262 | 0.55614954 |
| Speed | 400.343 ms/step (8 devices) | 912.211 ms/step (8 devices) |
| Total Time | 33 h 35 min | 94 h 08 min |
| Parameters (M) | 70.6 | 70.6 |
| Checkpoint for fine-tuning | 807.57M (.ckpt file) | 808.28M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/models/tree/master/research/cv/glore_res) |
### Inference Performance
#### Glore_resnet50 on ImageNet2012
| Parameters | Ascend | GPU |
| ------------------- | ----------------------|------------------------------|
| Model Version | Glore_resnet50 | Glore_resnet50 |
| Resource | Ascend 910 | GPU V100-PCIE 32 GB |
| Uploaded Date | 2021-03-21 | 2021-09-22 |
| MindSpore Version | r1.1 | 1.3.0 |
| Dataset | ImageNet2012 test set (6.4 GB) | ImageNet2012 test set (6.4 GB) |
| batch_size | 128 | 128 |
| Outputs | probability | probability |
| Accuracy | 8 devices: 78.44% | 8 devices: 78.50% |
#### Glore_resnet200 on ImageNet2012
| Parameters | Ascend | GPU |
| ------------------- | ----------------------|------------------------------|
| Model Version | Glore_resnet200 | Glore_resnet200 |
| Resource | Ascend 910 | GPU V100-SXM2 |
| Uploaded Date | 2021-03-24 | 2021-05-25 |
| MindSpore Version | 1.3.0 | 1.2.0 |
| Dataset | ImageNet2012 test set (6.4 GB) | ImageNet2012 test set (6.4 GB) |
| batch_size | 80 | 64 |
| Outputs | probability | probability |
| Accuracy | 8 devices: 80.23% | 8 devices: 80.603% |
# Description of Random Situation
transform_utils.py uses a random selection policy for data augmentation, and train.py sets a random seed.
# ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
isModelArts: false
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: true
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "Ascend"
checkpoint_path: "./checkpoint/"
# ==============================================================================
# Training options
batch_size: 80
class_num: 1000
epoch_size: 150
keep_checkpoint_max: 10
loss_scale: 1024
lr_decay_mode: poly
lr_end: 0
lr_init: 0.1
lr_max: 0.4
momentum: 0.08
pretrain_epoch_size: 0
use_glore: true
save_checkpoint: true
save_checkpoint_epochs: 5
save_checkpoint_path: ./
use_label_smooth: false
warmup_epochs: 0
weight_decay: 0.0002
net: "resnet101"
cast_fp16: true
device_target: "Ascend"
device_id: 0
device_num: 8
data_url: ""
pretrained_ckpt: ""
parameter_server: ""
# Export options
device_id: 0
file_name: "resnet200"
file_format: "MINDIR"
ckpt_url: ""
# Image options
image_size: 224
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
isModelArts: false
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: true
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
# ==============================================================================
# Training options
batch_size: 32
class_num: 1000
epoch_size: 150
keep_checkpoint_max: 10
loss_scale: 1024
lr_decay_mode: poly
lr_end: 0
lr_init: 0.1
lr_max: 0.4
momentum: 0.08
pretrain_epoch_size: 0
use_glore: true
use_label_smooth: false
save_checkpoint: true
save_checkpoint_epochs: 5
save_checkpoint_path: ./
warmup_epochs: 0
weight_decay: 0.0002
net: "resnet101"
cast_fp16: true
device_target: "GPU"
device_id: 0
device_num: 8
data_url: ""
pretrained_ckpt: ""
parameter_server: ""
# Export options
device_id: 0
file_name: "resnet200"
file_format: "MINDIR"
ckpt_url: ""
# Image options
image_size: 224
---
# Help description for each configuration
isModelArts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
isModelArts: false
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: true
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "Ascend"
checkpoint_path: "./checkpoint/"
# ==============================================================================
# Training options
batch_size: 80
class_num: 1000
epoch_size: 150
keep_checkpoint_max: 10
loss_scale: 1024
lr_decay_mode: poly
lr_end: 0
lr_init: 0.1
lr_max: 0.4
momentum: 0.08
pretrain_epoch_size: 0
use_glore: true
save_checkpoint: true
save_checkpoint_epochs: 5
save_checkpoint_path: ./
use_label_smooth: false
warmup_epochs: 0
weight_decay: 0.0002
net: "resnet200"
cast_fp16: true
device_target: "Ascend"
device_id: 0
device_num: 8
data_url: ""
pretrained_ckpt: ""
parameter_server: ""
# Export options
device_id: 0
file_name: "resnet200"
file_format: "MINDIR"
ckpt_url: ""
# Image options
image_size: 224
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
isModelArts: false
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: true
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
# ==============================================================================
# Training options
batch_size: 32
class_num: 1000
epoch_size: 150
keep_checkpoint_max: 10
loss_scale: 1024
lr_decay_mode: poly
lr_end: 0
lr_init: 0.1
lr_max: 0.4
momentum: 0.08
pretrain_epoch_size: 0
use_glore: true
use_label_smooth: false
save_checkpoint: true
save_checkpoint_epochs: 5
save_checkpoint_path: ./
warmup_epochs: 0
weight_decay: 0.0002
net: "resnet200"
cast_fp16: true
device_target: "GPU"
device_id: 0
device_num: 8
data_url: ""
pretrained_ckpt: ""
parameter_server: ""
# Export options
device_id: 0
file_name: "resnet200"
file_format: "MINDIR"
ckpt_url: ""
# Image options
image_size: 224
---
# Help description for each configuration
isModelArts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
isModelArts: false
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: true
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "Ascend"
checkpoint_path: "./checkpoint/"
# ==============================================================================
# Training options
batch_size: 128
class_num: 1000
epoch_size: 120
keep_checkpoint_max: 5
label_smooth_factor: 0.1
loss_scale: 1024
lr_decay_mode: poly
lr_end: 0.0
lr_init: 0
lr_max: 0.6
momentum: 0.9
pretrain_epoch_size: 0
pretrained: false
save_checkpoint: true
save_checkpoint_epochs: 5
save_checkpoint_path: ./
use_glore: true
use_autoaugment: true
use_label_smooth: true
warmup_epochs: 5
weight_decay: 0.0001
weight_init: xavier_uniform
net: "resnet50"
cast_fp16: false
device_target: "Ascend"
device_id: 0
device_num: 8
data_url: ""
pretrained_ckpt: ""
parameter_server: ""
# Export options
device_id: 0
file_name: "resnet50"
file_format: "MINDIR"
ckpt_url: ""
# Image options
image_size: 224
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
isModelArts: false
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: true
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
# ==============================================================================
# Training options
batch_size: 128
class_num: 1000
epoch_size: 130
keep_checkpoint_max: 5
label_smooth_factor: 0.1
loss_scale: 1024
lr_decay_mode: poly
lr_end: 0.0
lr_init: 0
lr_max: 0.6
momentum: 0.9
pretrain_epoch_size: 0
pretrained: false
save_checkpoint: true
save_checkpoint_epochs: 5
save_checkpoint_path: ./
use_glore: true
use_autoaugment: true
use_label_smooth: true
warmup_epochs: 5
weight_decay: 0.0001
weight_init: xavier_uniform
net: "resnet50"
cast_fp16: false
device_target: "GPU"
device_id: 0
device_num: 8
data_url: ""
pretrained_ckpt: ""
parameter_server: ""
# Export options
device_id: 0
file_name: "resnet50"
file_format: "MINDIR"
ckpt_url: ""
# Image options
image_size: 224
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
################################eval glore_resnet series################################
python eval.py
"""
import os
import random
import numpy as np
from mindspore import context
from mindspore import dataset as de
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.glore_resnet import glore_resnet200, glore_resnet50
from src.dataset import create_eval_dataset
from src.dataset import create_dataset_ImageNet as ImageNet
from src.loss import CrossEntropySmooth, SoftmaxCrossEntropyExpand
from src.config import config
if config.isModelArts:
import moxing as mox
if config.net == 'resnet200':
if config.device_target == "GPU":
config.cast_fp16 = False
random.seed(1)
np.random.seed(1)
de.config.set_seed(1)
if __name__ == '__main__':
target = config.device_target
# init context
device_id = config.device_id
context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False,
device_id=device_id)
# dataset
eval_dataset_path = os.path.join(config.data_url, 'val')
if config.isModelArts:
mox.file.copy_parallel(src_url=config.data_url, dst_url='/cache/dataset')
eval_dataset_path = '/cache/dataset/'
if config.net == 'resnet50':
predict_data = create_eval_dataset(dataset_path=eval_dataset_path, repeat_num=1, batch_size=config.batch_size)
elif config.net == 'resnet200':
predict_data = ImageNet(dataset_path=eval_dataset_path,
do_train=False,
repeat_num=1,
batch_size=config.batch_size,
target=target)
step_size = predict_data.get_dataset_size()
if step_size == 0:
raise ValueError("Please check dataset size > 0 and batch_size <= dataset size")
# define net
if config.net == 'resnet50':
net = glore_resnet50(class_num=config.class_num, use_glore=config.use_glore)
elif config.net == 'resnet200':
net = glore_resnet200(class_num=config.class_num, use_glore=config.use_glore)
# load checkpoint
param_dict = load_checkpoint(config.ckpt_url)
load_param_into_net(net, param_dict)
# define loss, model
if config.net == 'resnet50':
if config.use_label_smooth:
loss = CrossEntropySmooth(sparse=True, reduction="mean", smooth_factor=config.label_smooth_factor,
num_classes=config.class_num)
else:
loss = SoftmaxCrossEntropyExpand(sparse=True)
model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
print("============== Starting Testing ==============")
print("ckpt path : {}".format(config.ckpt_url))
print("data path : {}".format(eval_dataset_path))
acc = model.eval(predict_data)
print("==============Acc: {} ==============".format(acc))
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############export checkpoint file into air, onnx, mindir models#################
python export.py
"""
import numpy as np
import mindspore.common.dtype as mstype
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
from src.config import config
context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
if config.device_target == "Ascend":
context.set_context(device_id=config.device_id)
if __name__ == '__main__':
if config.net == 'resnet50':
from src.glore_resnet import glore_resnet50
net = glore_resnet50(class_num=config.class_num)
elif config.net == 'resnet200':
from src.glore_resnet import glore_resnet200
net = glore_resnet200(class_num=config.class_num)
assert config.ckpt_url is not None, "config.ckpt_url is None."
param_dict = load_checkpoint(config.ckpt_url)
load_param_into_net(net, param_dict)
input_arr = Tensor(np.ones([config.batch_size, 3, 224, 224]), mstype.float32)
export(net, input_arr, file_name=config.file_name, file_format=config.file_format)
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train.sh DATA_PATH RANK_TABLE CONFIG_PATH"
echo "For example: bash run_distribute_train.sh /path/dataset /path/rank_table ../config/config_resnet50_gpu.yaml"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
set -e
if [ $# != 3 ]
then
echo "Usage: bash run_distribute_train.sh [DATASET_PATH] [RANK_TABLE] [CONFIG_PATH]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
DATA_PATH=$(get_real_path $1)
export DATA_PATH=${DATA_PATH}
RANK_TABLE=$(get_real_path $2)
CONFIG_PATH=$(get_real_path $3)
export RANK_TABLE_FILE=${RANK_TABLE}
export RANK_SIZE=8
echo "$EXEC_PATH"
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
for((i=0;i<8;i++))
do
rm -rf device$i
mkdir device$i
cd ./device$i
mkdir src
cd ../
cp ../*.py ./device$i
cp ../src/*.py ./device$i/src
cd ./device$i
export DEVICE_ID=$i
export RANK_ID=$i
echo "start training for device $i"
env > env$i.log
python3 train.py --data_url $DATA_PATH --isModelArts False --run_distribute True --config_path=$CONFIG_PATH > train$i.log 2>&1 &
if [ $? -eq 0 ];then
echo "start training for device$i"
else
echo "training device$i failed"
exit 2
fi
echo "$i finish"
cd ../
done
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train_gpu.sh DATA_PATH RANK_SIZE CONFIG_PATH"
echo "For example: bash run_distribute_train.sh /path/dataset 8 ../config/config_resnet50_gpu.yaml"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
if [ $# != 3 ]
then
echo "Usage: bash run_distribute_train_gpu.sh [DATASET_PATH] [RANK_SIZE] [CONFIG_PATH]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
set -e
DEVICE_NUM=$2
DATA_PATH=$(get_real_path $1)
CONFIG_PATH=$(get_real_path $3)
export DATA_PATH=${DATA_PATH}
export DEVICE_NUM=$2
export RANK_SIZE=$2
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
cd ../
rm -rf ./train_parallel
mkdir ./train_parallel
cd ./train_parallel
mkdir src
cd ../
cp *.py ./train_parallel
cp src/*.py ./train_parallel/src
cd ./train_parallel
env > env.log
echo "start training"
mpirun -n $2 --allow-run-as-root \
python3 train.py --data_url=$DATA_PATH --isModelArts=False --run_distribute=True \
--device_target="GPU" --config_path=$CONFIG_PATH --device_num $2 > train.log 2>&1 &
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval.sh DATA_PATH DEVICE_ID CKPT_PATH CONFIG_PATH"
echo "For example: bash run_eval.sh /path/dataset 0 /path/ckpt ../config/config_resnet50_ascend.yaml"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
if [ $# != 4 ]
then
echo "Usage: bash run_eval.sh [DATASET_PATH] [DEVICE_ID] [CKPT_PATH] [CONFIG_PATH]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
set -e
DATA_PATH=$(get_real_path $1)
DEVICE_ID=$2
export DATA_PATH=${DATA_PATH}
CKPT_PATH=$(get_real_path $3)
CONFIG_PATH=$(get_real_path $4)
EXEC_PATH=$(pwd)
echo "$EXEC_PATH"
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
cd ../
export DEVICE_ID=$2
export RANK_ID=0
env > env0.log
python3 eval.py --data_url $DATA_PATH --isModelArts False --device_id $2 --ckpt_url $CKPT_PATH --config_path=$CONFIG_PATH > eval.log 2>&1
if [ $? -eq 0 ];then
echo "testing success"
else
echo "testing failed"
exit 2
fi
echo "finish"
cd ../
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_train.sh DATA_PATH DEVICE_ID CONFIG_PATH"
echo "For example: bash run_standalone_train.sh /path/dataset 0 ../config/config_resnet50_ascend.yaml"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
if [ $# != 3 ]
then
echo "Usage: bash run_standalone_train.sh [DATASET_PATH] [DEVICE_ID] [CONFIG_PATH]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
set -e
DATA_PATH=$(get_real_path $1)
DEVICE_ID=$2
export DATA_PATH=${DATA_PATH}
CONFIG_PATH=$(get_real_path $3)
EXEC_PATH=$(pwd)
echo "$EXEC_PATH"
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
cd ../
export DEVICE_ID=$2
export RANK_ID=$2
env > env0.log
python3 train.py --data_url $DATA_PATH --isModelArts False --run_distribute False --device_id $2 --config_path $CONFIG_PATH > train.log 2>&1
if [ $? -eq 0 ];then
echo "training success"
else
echo "training failed"
exit 2
fi
echo "finish"
cd ../
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define autoaugment"""
import os
import mindspore.dataset.engine as de
import mindspore.dataset.transforms.c_transforms as c_transforms
import mindspore.dataset.vision.c_transforms as c_vision
from mindspore import dtype as mstype
from mindspore.communication.management import init, get_rank, get_group_size
# define Auto Augmentation operators
PARAMETER_MAX = 10
def float_parameter(level, maxval):
return float(level) * maxval / PARAMETER_MAX
def int_parameter(level, maxval):
return int(level * maxval / PARAMETER_MAX)
def shear_x(level):
v = float_parameter(level, 0.3)
return c_transforms.RandomChoice(
[c_vision.RandomAffine(degrees=0, shear=(-v, -v)), c_vision.RandomAffine(degrees=0, shear=(v, v))])
def shear_y(level):
v = float_parameter(level, 0.3)
return c_transforms.RandomChoice(
[c_vision.RandomAffine(degrees=0, shear=(0, 0, -v, -v)), c_vision.RandomAffine(degrees=0, shear=(0, 0, v, v))])
def translate_x(level):
v = float_parameter(level, 150 / 331)
return c_transforms.RandomChoice(
[c_vision.RandomAffine(degrees=0, translate=(-v, -v)), c_vision.RandomAffine(degrees=0, translate=(v, v))])
def translate_y(level):
v = float_parameter(level, 150 / 331)
return c_transforms.RandomChoice([c_vision.RandomAffine(degrees=0, translate=(0, 0, -v, -v)),
c_vision.RandomAffine(degrees=0, translate=(0, 0, v, v))])
def color_impl(level):
v = float_parameter(level, 1.8) + 0.1
return c_vision.RandomColor(degrees=(v, v))
def rotate_impl(level):
v = int_parameter(level, 30)
return c_transforms.RandomChoice(
[c_vision.RandomRotation(degrees=(-v, -v)), c_vision.RandomRotation(degrees=(v, v))])
def solarize_impl(level):
level = int_parameter(level, 256)
v = 256 - level
return c_vision.RandomSolarize(threshold=(0, v))
def posterize_impl(level):
level = int_parameter(level, 4)
v = 4 - level
return c_vision.RandomPosterize(bits=(v, v))
def contrast_impl(level):
v = float_parameter(level, 1.8) + 0.1
return c_vision.RandomColorAdjust(contrast=(v, v))
def autocontrast_impl(level):
return c_vision.AutoContrast()
def sharpness_impl(level):
v = float_parameter(level, 1.8) + 0.1
return c_vision.RandomSharpness(degrees=(v, v))
def brightness_impl(level):
v = float_parameter(level, 1.8) + 0.1
return c_vision.RandomColorAdjust(brightness=(v, v))
# define the Auto Augmentation policy
imagenet_policy = [
[(posterize_impl(8), 0.4), (rotate_impl(9), 0.6)],
[(solarize_impl(5), 0.6), (autocontrast_impl(5), 0.6)],
[(c_vision.Equalize(), 0.8), (c_vision.Equalize(), 0.6)],
[(posterize_impl(7), 0.6), (posterize_impl(6), 0.6)],
[(c_vision.Equalize(), 0.4), (solarize_impl(4), 0.2)],
[(c_vision.Equalize(), 0.4), (rotate_impl(8), 0.8)],
[(solarize_impl(3), 0.6), (c_vision.Equalize(), 0.6)],
[(posterize_impl(5), 0.8), (c_vision.Equalize(), 1.0)],
[(rotate_impl(3), 0.2), (solarize_impl(8), 0.6)],
[(c_vision.Equalize(), 0.6), (posterize_impl(6), 0.4)],
[(rotate_impl(8), 0.8), (color_impl(0), 0.4)],
[(rotate_impl(9), 0.4), (c_vision.Equalize(), 0.6)],
[(c_vision.Equalize(), 0.0), (c_vision.Equalize(), 0.8)],
[(c_vision.Invert(), 0.6), (c_vision.Equalize(), 1.0)],
[(color_impl(4), 0.6), (contrast_impl(8), 1.0)],
[(rotate_impl(8), 0.8), (color_impl(2), 1.0)],
[(color_impl(8), 0.8), (solarize_impl(7), 0.8)],
[(sharpness_impl(7), 0.4), (c_vision.Invert(), 0.6)],
[(shear_x(5), 0.6), (c_vision.Equalize(), 1.0)],
[(color_impl(0), 0.4), (c_vision.Equalize(), 0.6)],
[(c_vision.Equalize(), 0.4), (solarize_impl(4), 0.2)],
[(solarize_impl(5), 0.6), (autocontrast_impl(5), 0.6)],
[(c_vision.Invert(), 0.6), (c_vision.Equalize(), 1.0)],
[(color_impl(4), 0.6), (contrast_impl(8), 1.0)],
[(c_vision.Equalize(), 0.8), (c_vision.Equalize(), 0.6)],
]
def autoaugment(dataset_path, repeat_num=1, batch_size=32, target="Ascend"):
"""
define dataset with autoaugment
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
else:
init("nccl")
rank_id = get_rank()
device_num = get_group_size()
if device_num == 1:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
image_size = 224
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
trans = [
c_vision.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
]
post_trans = [
c_vision.RandomHorizontalFlip(prob=0.5),
c_vision.Normalize(mean=mean, std=std),
c_vision.HWC2CHW()
]
dataset = ds.map(operations=trans, input_columns="image")
dataset = dataset.map(operations=c_vision.RandomSelectSubpolicy(imagenet_policy), input_columns=["image"])
dataset = dataset.map(operations=post_trans, input_columns="image")
type_cast_op = c_transforms.TypeCast(mstype.int32)
dataset = dataset.map(operations=type_cast_op, input_columns="label")
# apply the batch operation
dataset = dataset.batch(batch_size, drop_remainder=True)
# apply the repeat operation
dataset = dataset.repeat(repeat_num)
return dataset
def _get_rank_info():
"""
get rank size and rank id
"""
rank_size = int(os.environ.get("RANK_SIZE", "1"))
rank_id = int(os.environ.get("RANK_ID", "0"))
return rank_size, rank_id
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Parse arguments"""
import os
import ast
import argparse
from pprint import pprint, pformat
import yaml
_config_path = "../config/config_resnet50_gpu.yaml"
class Config:
"""
Configuration namespace. Convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="resnet50_cifar10_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser: Parent parser.
cfg: Base configuration.
helper: Helper description.
cfg_path: Path to the default yaml config.
"""
parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
parents=[parser])
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
Args:
yaml_path: Path to the yaml config.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
cfgs = [x for x in cfgs]
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config description for help, choices) are supported in config yaml")
print(cfg_helper)
except:
raise ValueError("Failed to parse yaml")
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args: Command line arguments.
cfg: Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def get_config():
"""
Get Config according to the yaml file and cli arguments.
"""
parser = argparse.ArgumentParser(description="default name", add_help=False)
current_dir = os.path.dirname(os.path.abspath(__file__))
parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, \
"../config/config_resnet50_gpu.yaml"), help="Config file path")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
pprint(default)
args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
final_config = merge(args, default)
return Config(final_config)
config = get_config()
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
create train or eval dataset.
"""
import os
import mindspore.common.dtype as mstype
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as C
from mindspore.dataset.vision import Inter
import mindspore.dataset.transforms.c_transforms as C2
from mindspore.communication.management import init, get_rank, get_group_size
from src.transform import RandAugment
from src.config import config
def cifar10(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
"""
create a train or evaluate cifar10 dataset for resnet50
Args:
dataset_path(string): the path of dataset.
do_train(bool): whether dataset is used for train or eval.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
distribute(bool): data for distribute or not. Default: False
Returns:
dataset
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
else:
if distribute:
init()
rank_id = get_rank()
device_num = get_group_size()
else:
device_num = 1
if device_num == 1:
data_set = ds.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
data_set = ds.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
# define map operations
trans = []
if do_train:
trans += [
C.RandomCrop((32, 32), (4, 4, 4, 4)),
C.RandomHorizontalFlip(prob=0.5)
]
trans += [
C.Resize((config.image_size, config.image_size)),
C.Rescale(1.0 / 255.0, 0.0),
C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
data_set = data_set.repeat(repeat_num)
return data_set
def create_train_dataset(dataset_path, repeat_num=1, batch_size=32, target="Ascend"):
"""
create a train or eval imagenet2012 dataset for resnet50
Args:
dataset_path(string): the path of dataset.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
distribute(bool): data for distribute or not. Default: False
Returns:
dataset
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
if device_num == 1:
data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
image_size = config.image_size
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
# define map operations
trans = [
C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
C.RandomHorizontalFlip(prob=0.5),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
data_set = data_set.repeat(repeat_num)
return data_set
def create_eval_dataset(dataset_path, repeat_num=1, batch_size=32, target="Ascend"):
"""
create a train or eval imagenet2012 dataset for resnet50
Args:
dataset_path(string): the path of dataset.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
Returns:
dataset
"""
data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
image_size = config.image_size
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
# define map operations
trans = [
C.Decode(),
C.Resize(256),
C.CenterCrop(image_size),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
# apply batch operations
data_set = data_set.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
data_set = data_set.repeat(repeat_num)
return data_set
def create_dataset_ImageNet(dataset_path, do_train, use_randaugment=False, repeat_num=1, batch_size=32,
target="Ascend"):
"""
create a train or eval imagenet2012 dataset for resnet50
Args:
dataset_path(string): the path of dataset.
do_train(bool): whether dataset is used for train or eval.
use_randaugment(bool): enable randAugment.
repeat_num(int): the repeat times of dataset. Default: 1
batch_size(int): the batch size of dataset. Default: 32
target(str): the device target. Default: Ascend
Returns:
dataset
"""
if target == "Ascend":
device_num, rank_id = _get_rank_info()
elif target == "GPU":
init("nccl")
rank_id = get_rank()
device_num = get_group_size()
if device_num == 1:
da = ds.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
else:
da = ds.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
num_shards=device_num, shard_id=rank_id)
image_size = 224
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
# define map operations
if do_train:
if use_randaugment:
trans = [
C.Decode(),
C.RandomResizedCrop(size=(image_size, image_size),
scale=(0.08, 1.0),
ratio=(3. / 4., 4. / 3.),
interpolation=Inter.BICUBIC),
C.RandomHorizontalFlip(prob=0.5),
]
else:
trans = [
C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
C.RandomHorizontalFlip(prob=0.5),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
else:
use_randaugment = False
trans = [
C.Decode(),
C.Resize(256),
C.CenterCrop(image_size),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
da = da.map(input_columns="image", num_parallel_workers=8, operations=trans)
da = da.map(input_columns="label", num_parallel_workers=8, operations=type_cast_op)
# apply batch operations
if use_randaugment:
efficient_rand_augment = RandAugment()
da = da.batch(batch_size,
per_batch_map=efficient_rand_augment,
input_columns=['image', 'label'],
num_parallel_workers=2,
drop_remainder=True)
else:
da = da.batch(batch_size, drop_remainder=True)
# apply dataset repeat operation
da = da.repeat(repeat_num)
return da
def _get_rank_info():
"""
get rank size and rank id
"""
rank_size = int(os.environ.get("RANK_SIZE", 1))
if rank_size > 1:
rank_size = get_group_size()
rank_id = get_rank()
else:
rank_size = 1
rank_id = 0
return rank_size, rank_id
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define loss for glore_resnet50"""
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import dtype as mstype
from mindspore.nn.loss.loss import LossBase
from mindspore.ops import functional as F
from mindspore.ops import operations as P
import mindspore.ops as ops
class SoftmaxCrossEntropyExpand(nn.Cell):
'''SoftmaxCrossEntropy'''
def __init__(self, sparse=False):
super(SoftmaxCrossEntropyExpand, self).__init__()
self.exp = ops.Exp()
self.sum = ops.ReduceSum(keep_dims=True)
self.onehot = ops.OneHot()
self.on_value = Tensor(1.0, mstype.float32)
self.off_value = Tensor(0.0, mstype.float32)
self.div = ops.RealDiv()
self.log = ops.Log()
self.sum_cross_entropy = ops.ReduceSum(keep_dims=False)
self.mul = ops.Mul()
self.mul2 = ops.Mul()
self.mean = ops.ReduceMean(keep_dims=False)
self.sparse = sparse
self.max = ops.ReduceMax(keep_dims=True)
self.sub = ops.Sub()
self.eps = Tensor(1e-24, mstype.float32)
def construct(self, logit, label):
'''construct SoftmaxCrossEntropy'''
logit_max = self.max(logit, -1)
exp = self.exp(self.sub(logit, logit_max))
exp_sum = self.sum(exp, -1)
softmax_result = self.div(exp, exp_sum)
if self.sparse:
label = self.onehot(label, ops.shape(logit)[1], self.on_value, self.off_value)
softmax_result_log = self.log(softmax_result + self.eps)
loss = self.sum_cross_entropy((self.mul(softmax_result_log, label)), -1)
loss = self.mul2(ops.scalar_to_array(-1.0), loss)
loss = self.mean(loss, -1)
return loss
class CrossEntropySmooth(LossBase):
"""CrossEntropy"""
def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
super(CrossEntropySmooth, self).__init__()
self.onehot = P.OneHot()
self.sparse = sparse
self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
def construct(self, logit, label):
if self.sparse:
label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
loss = self.ce(logit, label)
return loss