Unverified commit 99450158 authored by i-robot, committed by Gitee

!2869 Implemented training and evaluation of simple baselines on a GPU

Merge pull request !2869 from Marina Molchanova/simple_baseline_gpu
parents 248d58d7 2d984191
Showing with 701 additions and 233 deletions
# Contents
<!-- TOC -->
[View Chinese](./README_CN.md)
- [Simple Baselines Description](#simple_baselines-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Usage](#usage1)
- [Result](#result1)
- [Evaluation Process](#evaluation-process)
- [Usage](#usage2)
- [Result](#result2)
- [Inference Process](#inference-process)
- [Model Export](#model-export)
- [Infer on Ascend310](#infer-ascend310)
- [Result](#result)
- [Model Description](#model-description)
- [Performance](#performance)
- [Description of Random State](#description-of-random-state)
- [ModelZoo Homepage](#ModelZoo-homepage)
<!-- /TOC -->
# Simple Baselines Description
## Overview
Simple Baselines was proposed by Bin Xiao, Haiping Wu, and Yichen Wei from Microsoft Research Asia. The authors argue that
the popular human pose estimation and tracking methods of the time were unnecessarily complicated: the existing models
look quite different in structure but perform rather similarly. The paper proposes a simple and effective baseline that
adds a few deconvolution layers on top of a ResNet backbone, which is arguably the simplest way to estimate heatmaps
from deep, low-resolution feature maps, and thereby helps to inspire and evaluate new ideas in the field.
For more details refer to [paper](https://arxiv.org/pdf/1804.06208.pdf).
The MindSpore implementation is based on the [original PyTorch version](https://github.com/microsoft/human-pose-estimation.pytorch) released by Microsoft Research Asia.
## Paper
[Paper](https://arxiv.org/pdf/1804.06208.pdf): Bin Xiao, Haiping Wu, Yichen Wei. "Simple Baselines for Human Pose Estimation and Tracking"
# Model Architecture
The overall network architecture of simple baselines is described in the [paper](https://arxiv.org/pdf/1804.06208.pdf).
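As a rough illustration (not the repository's exact code, which lives in src/pose_resnet.py), the head described in the paper can be sketched with MindSpore's `nn` API using the network parameters listed below under [Script Parameters](#script-parameters):

```python
import mindspore.nn as nn

# Minimal sketch of the simple-baselines head, assuming MindSpore's nn API:
# NUM_DECONV_LAYERS = 3, NUM_DECONV_FILTERS = [256, 256, 256],
# NUM_DECONV_KERNELS = [4, 4, 4], FINAL_CONV_KERNEL = 1, NUM_JOINTS = 17.
def make_head(in_channels=2048, num_joints=17):
    layers = []
    for _ in range(3):
        # Each 4x4 deconvolution doubles the spatial resolution.
        layers.append(nn.Conv2dTranspose(in_channels, 256, kernel_size=4,
                                         stride=2, pad_mode='same'))
        layers.append(nn.BatchNorm2d(256))
        layers.append(nn.ReLU())
        in_channels = 256
    # A 1x1 convolution maps the features to one heatmap per joint.
    layers.append(nn.Conv2d(256, num_joints, kernel_size=1))
    return nn.SequentialCell(layers)
```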
# Dataset
Dataset used: [COCO2017](https://cocodataset.org/#download)
- Dataset size:
    - Train: 19.56 GB, 57k images, 149813 person instances
    - Test: 825 MB, 5k images, 6352 person instances
- Data format: JPG
- Note: data is processed in src/dataset.py
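Based on the paths configured in src/config.py (`DATASET.ROOT`, `TRAIN_SET`/`TRAIN_JSON`, `TEST_SET`/`TEST_JSON`, and `TEST.COCO_BBOX_FILE`), the dataset directory is expected to look roughly as follows:

```text
coco/
├── annotations/
│   ├── person_keypoints_train2017.json
│   ├── person_keypoints_val2017.json
│   └── COCO_val2017_detections_AP_H_56_person.json  # detector boxes, used when USE_GT_BBOX = False
├── train2017/
└── val2017/
```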
# Features
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/tutorials/experts/en/master/others/mixed_precision.html) training
method accelerates deep neural network training by using both single-precision and half-precision data types while
preserving the accuracy achieved by single-precision training. Mixed precision training speeds up computation, reduces
memory usage, and makes it possible to train larger models or use larger batch sizes on specific hardware. For FP16
operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision.
Users can check the reduced-precision operators by enabling INFO logging and searching for "reduce precision".
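In this repository, mixed precision is enabled through the `Model` wrapper in train.py (`amp_level="O2"`). A minimal, self-contained sketch of the same mechanism (the tiny network here is only a stand-in):

```python
import mindspore.nn as nn
from mindspore.train import Model

# Sketch only: amp_level="O2" runs most operators in float16 while keeping
# BatchNorm and the loss in float32, as train.py does for the pose ResNet.
net = nn.Dense(4, 2)
loss = nn.MSELoss()
opt = nn.Adam(net.trainable_params(), learning_rate=1e-3)
model = Model(net, loss_fn=loss, optimizer=opt, amp_level="O2")
```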
# Environment Requirements
- Hardware (Ascend/GPU)
- Prepare hardware environment with Ascend or GPU.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information about MindSpore, please check the resources below:
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start
After installing MindSpore through the official website, you can follow the steps below for training and evaluation.
- Dataset preparation
Simple baselines uses the COCO2017 dataset for training and evaluation. Download the dataset from the [official website](https://cocodataset.org/).
- Running on Ascend
```shell
# Distributed training
bash scripts/run_distribute_train.sh RANK_TABLE
# Standalone training
bash scripts/run_standalone_train.sh DEVICE_ID
# Evaluation
bash scripts/run_eval.sh
```
- Running on GPU
```shell
# Distributed training
bash scripts/run_distribute_train_gpu.sh DEVICE_NUM
# Standalone training
bash scripts/run_standalone_train_gpu.sh DEVICE_ID
# Evaluation
bash scripts/run_eval_gpu.sh DEVICE_ID
```
# Script Description
## Script and Sample Code
```text
.
└── simple_baselines
    ├── README.md
    ├── scripts
    │   ├── run_distribute_train.sh       # train on Ascend
    │   ├── run_distribute_train_gpu.sh   # train on GPU
    │   ├── run_eval.sh                   # eval on Ascend
    │   ├── run_eval_gpu.sh               # eval on GPU
    │   ├── run_standalone_train.sh       # train on Ascend
    │   ├── run_standalone_train_gpu.sh   # train on GPU
    │   └── run_infer_310.sh              # 310 inference
    ├── src
    │   ├── utils
    │   │   ├── coco.py                   # COCO dataset evaluation results
    │   │   ├── nms.py
    │   │   └── transforms.py             # Image processing transforms
    │   ├── config.py                     # Parameter configuration
    │   ├── dataset.py                    # Data preprocessing
    │   ├── network_with_loss.py          # Loss function
    │   ├── pose_resnet.py                # Backbone network
    │   └── predict.py                    # Heatmap keypoint prediction
    ├── export.py
    ├── postprocess.py
    ├── preprocess.py
    ├── eval.py
    └── train.py
```
## Script Parameters
Before training, configure parameters and paths in src/config.py.
- Model parameters:
```text
config.MODEL.INIT_WEIGHTS = True # Initialize model weights
config.MODEL.PRETRAINED = 'resnet50.ckpt' # Pre-trained model
config.MODEL.NUM_JOINTS = 17 # Number of key points
config.MODEL.IMAGE_SIZE = [192, 256] # Image size
```
- Network parameters:
```text
config.NETWORK.NUM_LAYERS = 50 # ResNet backbone depth
config.NETWORK.DECONV_WITH_BIAS = False # Deconvolution bias
config.NETWORK.NUM_DECONV_LAYERS = 3 # Number of deconvolution layers
config.NETWORK.NUM_DECONV_FILTERS = [256, 256, 256] # Number of filters per deconvolution layer
config.NETWORK.NUM_DECONV_KERNELS = [4, 4, 4] # Kernel size per deconvolution layer
config.NETWORK.FINAL_CONV_KERNEL = 1 # Final convolution kernel size
config.NETWORK.HEATMAP_SIZE = [48, 64] # Heatmap size
```
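The heatmap training targets matching `HEATMAP_SIZE` are built in src/dataset.py (`generate_heatmap`, with `config.NETWORK.SIGMA = 2`). A simplified numpy sketch of a single target channel, for illustration only:

```python
import numpy as np

# Simplified version of KeypointDatasetGenerator.generate_heatmap: each
# visible joint becomes a 2D Gaussian peak on a 48x64 heatmap (SIGMA = 2).
def gaussian_heatmap(joint_xy, width=48, height=64, sigma=2):
    xs = np.arange(width, dtype=np.float32)
    ys = np.arange(height, dtype=np.float32)[:, None]
    x0, y0 = joint_xy
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))

target = gaussian_heatmap((24, 32))  # shape (64, 48), peak value 1.0 at the joint
```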
- Training parameters:
```text
config.TRAIN.SHUFFLE = True
config.TRAIN.BATCH_SIZE = 64
config.TRAIN.BEGIN_EPOCH = 0
config.TRAIN.END_EPOCH = 140
config.TRAIN.LR = 0.001 # Initial learning rate
config.TRAIN.LR_FACTOR = 0.1 # Learning rate decay factor
config.TRAIN.LR_STEP = [90, 120] # Epochs at which the learning rate decays
config.TRAIN.NUM_PARALLEL_WORKERS = 8
config.TRAIN.SAVE_CKPT = True
config.TRAIN.CKPT_PATH = "./model" # directory of pretrained resnet50 and to save ckpt
config.TRAIN.SAVE_CKPT_EPOCH = 3
config.TRAIN.KEEP_CKPT_MAX = 10
```
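For reference, the step schedule implied by `LR`, `LR_FACTOR`, and `LR_STEP` mirrors `get_lr()` in train.py, which precomputes one learning rate per step (2340 steps per epoch in single-device COCO2017 training); a minimal sketch:

```python
import numpy as np

# The learning rate starts at 0.001 and is multiplied by 0.1 after
# epochs 90 and 120, as in get_lr() in train.py.
def step_lr(total_epochs=140, steps_per_epoch=2340, lr=0.001,
            factor=0.1, drop_epochs=(90, 120)):
    drop_steps = {steps_per_epoch * e for e in drop_epochs}
    lr_each_step = []
    for step in range(total_epochs * steps_per_epoch):
        if step in drop_steps:
            lr *= factor
        lr_each_step.append(lr)
    return np.array(lr_each_step, dtype=np.float32)
```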
- Evaluation parameters:
```text
config.TEST.BATCH_SIZE = 32
config.TEST.FLIP_TEST = True
config.TEST.USE_GT_BBOX = False
```
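When `FLIP_TEST` is enabled, eval.py averages predictions from the original and the horizontally flipped image using `flip_back` from src/utils/transforms.py. A schematic numpy version, assuming heatmaps of shape (batch, joints, height, width):

```python
import numpy as np

# Schematic flip test: flip the mirrored heatmaps back and swap the
# left/right joint channels listed in flip_pairs, then average.
def flip_average(heatmaps, heatmaps_flipped, flip_pairs):
    back = heatmaps_flipped[..., ::-1].copy()   # undo the horizontal flip
    for left, right in flip_pairs:
        back[:, [left, right]] = back[:, [right, left]]
    return (heatmaps + back) / 2.0
```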
- nms parameters:
```text
config.TEST.OKS_THRE = 0.9 # OKS threshold
config.TEST.IN_VIS_THRE = 0.2 # Keypoint visibility threshold
config.TEST.BBOX_THRE = 1.0 # Candidate box threshold
config.TEST.IMAGE_THRE = 0.0 # Box score threshold
config.TEST.NMS_THRE = 1.0 # NMS threshold
```
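These thresholds drive the OKS-based NMS in src/utils/nms.py, which measures pose overlap by object-keypoint similarity instead of box IoU. A condensed sketch of the similarity computed by `oks_iou` (the repository version additionally filters joints by `IN_VIS_THRE`):

```python
import numpy as np

# Per-joint falloff constants from src/utils/nms.py.
SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                   .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0

def oks(g_xy, d_xy, area):
    """Object-keypoint similarity between two (17, 2) keypoint arrays."""
    variances = (SIGMAS * 2) ** 2
    squared_dist = np.sum((g_xy - d_xy) ** 2, axis=1)
    e = squared_dist / variances / (area + np.spacing(1)) / 2
    return float(np.mean(np.exp(-e)))
```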
## Training Process
### Usage
- Ascend
```shell
# Distributed training 8p
bash scripts/run_distribute_train.sh RANK_TABLE
# Standalone training
bash scripts/run_standalone_train.sh DEVICE_ID
# Evaluation
bash scripts/run_eval.sh
```
- GPU
```shell
# Distributed training
bash scripts/run_distribute_train_gpu.sh DEVICE_NUM
# Standalone training
bash scripts/run_standalone_train_gpu.sh DEVICE_ID
# Evaluation
bash scripts/run_eval_gpu.sh DEVICE_ID
```
### Result
- Use the COCO2017 dataset to train simple_baselines
```text
# Standalone training results (1P)
epoch:1 step:2340, loss is 0.0008106
epoch:2 step:2340, loss is 0.0006160
epoch:3 step:2340, loss is 0.0006480
epoch:4 step:2340, loss is 0.0005620
epoch:5 step:2340, loss is 0.0005207
...
epoch:138 step:2340, loss is 0.0003183
epoch:139 step:2340, loss is 0.0002866
epoch:140 step:2340, loss is 0.0003393
```
## Evaluation Process
### Usage
To run inference with a specific checkpoint, set "config.TEST.MODEL_FILE" in config.py accordingly.
Evaluation uses val2017 from the COCO2017 dataset folder.
- Ascend
```shell
# Evaluation
bash scripts/run_eval.sh
```
- GPU
```shell
# Evaluation
bash scripts/run_eval_gpu.sh DEVICE_ID
```
### Result
Results will be saved in keypoints_results.pkl.
```text
AP: 0.704
```
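For context, evaluation decodes the network's heatmaps into keypoints in src/predict.py (`get_max_preds`/`get_final_preds`): each joint's coordinate is essentially the argmax of its heatmap and the value there is its confidence. A simplified sketch:

```python
import numpy as np

# Simplified get_max_preds: the argmax of each (height, width) heatmap gives
# the keypoint location; the maximum value serves as its confidence score.
def decode_heatmaps(batch_heatmaps):
    n, k, h, w = batch_heatmaps.shape
    flat = batch_heatmaps.reshape(n, k, -1)
    idx = np.argmax(flat, axis=2)
    maxvals = np.max(flat, axis=2)
    preds = np.stack((idx % w, idx // w), axis=2).astype(np.float32)  # (x, y)
    return preds, maxvals
```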
## Inference Process
### Model Export
- Export locally
```shell
python export.py
```
- Export in ModelArts (if you want to run in ModelArts, please check the [ModelArts official documentation](https://support.huaweicloud.com/modelarts/)).
```text
# (1) Upload the code folder to S3 bucket.
# (2) Click to "create training task" on the website UI interface.
# (3) Set the code directory to "/{path}/simple_pose" on the website UI interface.
# (4) Set the startup file to "/{path}/simple_pose/export.py" on the website UI interface.
# (5) Set parameters in /{path}/simple_pose/default_config.yaml:
#     1. Set "enable_modelarts: True"
#     2. Set "TEST.MODEL_FILE: ./{path}/*.ckpt" ('TEST.MODEL_FILE' is the path of the weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
#     3. Set "EXPORT.FILE_NAME: simple_pose"
#     4. Set "EXPORT.FILE_FORMAT: MINDIR"
# (6) Check the "data storage location" on the website UI interface and set the "Dataset path" (this step has no effect, but is required).
# (7) Set the "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Under "resource pool selection", select a single-card specification.
# (9) Create your job.
# You will see simple_pose.mindir under {Output file path}.
```
`FILE_FORMAT` should be in ["AIR", "MINDIR"]
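For reference, the local export boils down to the following (a hedged sketch: the exact flags live in export.py and src/config.py; the input shape follows `MODEL.IMAGE_SIZE = [192, 256]` in NCHW order):

```python
import numpy as np
from mindspore import Tensor, export
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from src.config import config
from src.pose_resnet import GetPoseResNet

# Sketch only: load the checkpoint named in config.TEST.MODEL_FILE and
# export a MINDIR graph for a fixed 1x3x256x192 input.
net = GetPoseResNet(config)
load_param_into_net(net, load_checkpoint(config.TEST.MODEL_FILE))
dummy_input = Tensor(np.zeros((1, 3, 256, 192), dtype=np.float32))
export(net, dummy_input, file_name="simple_pose", file_format="MINDIR")
```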
### Infer on Ascend310
Before running inference, the MINDIR file must be exported by the export.py script. We only provide an example of inference using a MINDIR model.
When the network processes the dataset, the last incomplete batch is not padded automatically, so it is better to set batch_size to 1.
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [NEED_PREPROCESS] [DEVICE_ID]
```
- `NEED_PREPROCESS` indicates whether the dataset needs to be preprocessed into binary format; valid values are "y" and "n".
- `DEVICE_ID` is optional; the default value is 0.
### Result
The inference results are saved in the current path in `acc.log` file.
```text
AP: 0.7139169694686592
```
# Model Description
## Performance
| Parameters | Ascend 910 | GPU 1p | GPU 8p |
| ------------------- | --------------------------- | ------------ | ------------ |
| Model | simple_baselines | simple_baselines | simple_baselines |
| Environment | Ascend 910; CPU 2.60GHz, 192 cores; RAM 755 GB | Ubuntu 18.04.6, 1p RTX3090, CPU 2.90GHz, 64 cores, RAM 252 GB; MindSpore 1.5.0 | Ubuntu 18.04.6, 8pcs RTX3090, CPU 2.90GHz, 64 cores, RAM 252 GB; MindSpore 1.5.0 |
| Upload date (Y-M-D) | 2021-03-29 | 2021-12-29 | 2021-12-29 |
| MindSpore Version | 1.1.0 | 1.5.0 | 1.5.0 |
| Dataset | COCO2017 | COCO2017 | COCO2017 |
| Training params | epoch=140, batch_size=64 | epoch=140, batch_size=64 | epoch=140, batch_size=64 |
| Optimizer | Adam | Adam | Adam |
| Loss function | Mean Squared Error | Mean Squared Error | Mean Squared Error |
| Output | heatmap | heatmap | heatmap |
| Final Loss | | 0.27 | 0.27 |
| Training speed | 1pc: 251.4 ms/step | 184 ms/step | 285 ms/step |
| Total training time | | 17h | 3.5h |
| Accuracy | AP: 0.704 | AP: 0.7143 | AP: 0.7143 |
# Description of Random State
The random seed is set inside the "create_dataset" function in dataset.py.
Initial network weights are used in model.py.
# ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/models).
......@@ -2,6 +2,8 @@
<!-- TOC -->
[View English](./README.md)
- [simple_baselines Description](#simple_baselines描述)
- [Model Architecture](#模型架构)
- [Dataset](#数据集)
......@@ -17,7 +19,6 @@
- [ONNX Inference](#onnx推理)
- [Model Description](#模型描述)
- [Performance](#性能)
- [Evaluation Performance](#评估性能)
- [Description of Random State](#随机情况说明)
- [ModelZoo Homepage](#ModelZoo主页)
......@@ -59,7 +60,7 @@ The overall network architecture of simple_baselines is as follows:
# Environment Requirements
- Hardware (Ascend)
- Hardware (Ascend/GPU)
- Prepare the hardware environment with Ascend processors.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
......@@ -81,7 +82,7 @@ The overall network architecture of simple_baselines is as follows:
- Running on Ascend
```text
```shell
# Distributed training
Usage: bash run_distribute_train.sh RANK_TABLE
......@@ -92,12 +93,25 @@ The overall network architecture of simple_baselines is as follows:
Usage: bash run_eval.sh
```
- Running on GPU
```shell
# Distributed training
Usage: bash scripts/run_distribute_train_gpu.sh DEVICE_NUM
# Standalone training
Usage: bash scripts/run_standalone_train_gpu.sh DEVICE_ID
# Evaluation
Usage: bash scripts/run_eval_gpu.sh DEVICE_ID
```
# Script Description
## Script and Sample Code
```shell
```text
.
└──simple_baselines
├── README.md
├── scripts
......@@ -108,13 +122,13 @@ The overall network architecture of simple_baselines is as follows:
├── src
├── utils
├── coco.py # COCO dataset evaluation results
├── inference.py # Heatmap keypoint prediction
├── nms.py # nms
├── transforms.py # Image processing transforms
├── config.py # Parameter configuration
├── dataset.py # Data preprocessing
├── network_with_loss.py # Loss function definition
└── pose_resnet.py # Backbone network definition
├── pose_resnet.py # Backbone network definition
└── predict.py # Heatmap keypoint prediction
├── eval.py # Evaluation network
├── eval_onnx.py # ONNX inference
└── train.py # Training network
......@@ -126,7 +140,7 @@ The overall network architecture of simple_baselines is as follows:
- Model parameters:
```python
```text
config.MODEL.INIT_WEIGHTS = True # Initialize model weights
config.MODEL.PRETRAINED = 'resnet50.ckpt' # Pre-trained model
config.MODEL.NUM_JOINTS = 17 # Number of keypoints
......@@ -135,7 +149,7 @@ config.MODEL.IMAGE_SIZE = [192, 256] # Image size
- Network parameters:
```python
```text
config.NETWORK.NUM_LAYERS = 50 # ResNet backbone depth
config.NETWORK.DECONV_WITH_BIAS = False # Deconvolution bias
config.NETWORK.NUM_DECONV_LAYERS = 3 # Number of deconvolution layers
......@@ -147,7 +161,7 @@ config.NETWORK.HEATMAP_SIZE = [48, 64] # Heatmap size
- Training parameters:
```python
```text
config.TRAIN.SHUFFLE = True # Shuffle training data
config.TRAIN.BATCH_SIZE = 64 # Training batch size
config.TRAIN.BEGIN_EPOCH = 0 # Starting epoch
......@@ -162,7 +176,7 @@ config.TRAIN.LR_FACTOR = 0.1 # Learning rate decay factor
- Evaluation parameters:
```python
```text
config.TEST.BATCH_SIZE = 32 # Evaluation batch size
config.TEST.FLIP_TEST = True # Flip test
config.TEST.USE_GT_BBOX = False # Use ground-truth boxes
......@@ -170,7 +184,7 @@ config.TEST.USE_GT_BBOX = False # Use ground-truth boxes
- NMS parameters:
```python
```text
config.TEST.OKS_THRE = 0.9 # OKS threshold
config.TEST.IN_VIS_THRE = 0.2 # Keypoint visibility threshold
config.TEST.BBOX_THRE = 1.0 # Candidate box threshold
......@@ -182,9 +196,9 @@ config.TEST.NMS_THRE = 1.0 # NMS threshold
### Usage
#### Running on Ascend
- Running on Ascend
```text
```shell
# Distributed training
Usage: bash run_distribute_train.sh RANK_TABLE
......@@ -195,6 +209,19 @@ config.TEST.NMS_THRE = 1.0 # NMS threshold
Usage: bash run_eval.sh
```
- Running on GPU
```shell
# Distributed training
bash scripts/run_distribute_train_gpu.sh DEVICE_NUM
# Standalone training
bash scripts/run_standalone_train_gpu.sh DEVICE_ID
# Evaluation
bash scripts/run_eval_gpu.sh DEVICE_ID
```
### Result
- Use the COCO2017 dataset to train simple_baselines
......@@ -216,21 +243,27 @@ epoch:140 step:2340, loss is 0.0003393
### Usage
#### Running on Ascend
Model inference can be performed by setting "config.TEST.MODEL_FILE" in config.py.
```bash
- Running on Ascend
```shell
# Evaluation
bash eval.sh
```
- Running on GPU
```shell
# Evaluation
bash scripts/run_eval_gpu.sh DEVICE_ID
```
### Result
Use val2017 from the COCO2017 dataset folder to evaluate simple_baselines, as shown below:
```text
coco eval results saved to /cache/train_output/multi_train_poseresnet_v5_2-140_2340/keypoints_results.pkl
AP: 0.704
```
......@@ -270,7 +303,7 @@ AP: 0.72296
## Inference Process
### [Export MINDIR]
### Export MINDIR
- Export locally
......@@ -280,7 +313,7 @@ python export.py
- Export on ModelArts (if you want to run in ModelArts, please check the [ModelArts official documentation](https://support.huaweicloud.com/modelarts/) and launch as follows)
```python
```text
# (1) Upload the code folder to S3 bucket.
# (2) Click to "create training task" on the website UI interface.
# (3) Set the code directory to "/{path}/simple_pose" on the website UI interface.
......@@ -317,7 +350,7 @@ bash run_infer_310.sh [MINDIR_PATH] [NEED_PREPROCESS] [DEVICE_ID]
The inference results are saved in the current path; you can find results like the following in the acc.log file.
```bash
```text
AP: 0.7139169694686592
```
......@@ -325,24 +358,21 @@ AP: 0.7139169694686592
## Performance
### Evaluation Performance
#### Performance parameters on COCO2017
| Parameters | Ascend 910 |
| ------------------- | --------------------------- |
| Model version | simple_baselines |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; RAM 755 GB |
| Upload date | 2021-03-29 |
| MindSpore version | 1.1.0 |
| Dataset | COCO2017 |
| Training params | epoch=140, batch_size=64 |
| Optimizer | Adam |
| Loss function | Mean Squared Error |
| Output | heatmap |
| Output | heatmap |
| Speed | 1pc: 251.4 ms/step |
| Training performance | AP: 0.704 |
| Parameters | Ascend 910 | GPU 1p | GPU 8p |
| -------------- | --------------------------- | ---------------- | ------------ |
| Model version | simple_baselines | simple_baselines | simple_baselines |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; RAM 755 GB | Ubuntu 18.04.6, 1p RTX3090, CPU 2.90GHz, 64 cores, RAM 252 GB; MindSpore 1.5.0 | Ubuntu 18.04.6, 8pcs RTX3090, CPU 2.90GHz, 64 cores, RAM 252 GB; MindSpore 1.5.0 |
| Upload date | 2021-03-29 | 2021-12-29 | 2021-12-29 |
| MindSpore version | 1.1.0 | 1.5.0 | 1.5.0 |
| Dataset | COCO2017 | COCO2017 | COCO2017 |
| Training params | epoch=140, batch_size=64 | epoch=140, batch_size=64 | epoch=140, batch_size=64 |
| Optimizer | Adam | Adam | Adam |
| Loss function | Mean Squared Error | Mean Squared Error | Mean Squared Error |
| Output | heatmap | heatmap | heatmap |
| Final loss | | 0.27 | 0.27 |
| Speed | 1pc: 251.4 ms/step | 184 ms/step | 285 ms/step |
| Total training time | | 17h | 3.5h |
| Accuracy | AP: 0.704 | AP: 0.7143 | AP: 0.7143 |
# Description of Random State
......@@ -350,4 +380,4 @@ The seed of the "create_dataset" function is set in dataset.py, and initial weights are used in model.py
# ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/models)
Please check the official [homepage](https://gitee.com/mindspore/models).
......@@ -12,9 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
This file evaluates the model used.
'''
""" This file evaluates the model """
from __future__ import division
import argparse
......@@ -32,25 +30,28 @@ from src.dataset import flip_pairs
from src.dataset import keypoint_dataset
from src.utils.coco import evaluate
from src.utils.transforms import flip_back
from src.utils.inference import get_final_preds
from src.predict import get_final_preds
if config.MODELARTS.IS_MODEL_ARTS:
import moxing as mox
set_seed(config.GENERAL.EVAL_SEED)
device_id = int(os.getenv('DEVICE_ID'))
def parse_args():
"""command line arguments parsing"""
parser = argparse.ArgumentParser(description='Evaluate')
parser.add_argument('--data_url', required=False, default=None, help='Location of data.')
parser.add_argument('--train_url', required=False, default=None, help='Location of evaluate outputs.')
parser.add_argument("--device_target", type=str, choices=["Ascend", "GPU", "CPU"], default="Ascend",
help="device target")
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
args = parser.parse_args()
return args
def validate(cfg, val_dataset, model, output_dir, ann_path):
'''
validate
'''
"""evaluate model"""
model.set_train(False)
num_samples = val_dataset.get_dataset_size() * cfg.TEST.BATCH_SIZE
all_preds = np.zeros((num_samples, cfg.MODEL.NUM_JOINTS, 3),
......@@ -98,22 +99,21 @@ def validate(cfg, val_dataset, model, output_dir, ann_path):
def main():
context.set_context(mode=context.GRAPH_MODE,
device_target="Ascend", save_graphs=False, device_id=device_id)
"""main"""
args = parse_args()
context.set_context(mode=context.GRAPH_MODE,
device_target=args.device_target, save_graphs=False, device_id=args.device_id)
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=args.data_url, dst_url=config.MODELARTS.CACHE_INPUT)
model = GetPoseResNet(config)
ckpt_name = ''
if config.MODELARTS.IS_MODEL_ARTS:
ckpt_name = config.MODELARTS.CACHE_INPUT
ckpt_name = os.path.join(ckpt_name, config.TEST.MODEL_FILE)
else:
ckpt_name = config.DATASET.ROOT
ckpt_name = ckpt_name + config.TEST.MODEL_FILE
ckpt_name = config.TEST.MODEL_FILE
print('loading model ckpt from {}'.format(ckpt_name))
load_param_into_net(model, load_checkpoint(ckpt_name))
......@@ -126,20 +126,19 @@ def main():
ckpt_name = ckpt_name.split('/')
ckpt_name = ckpt_name[len(ckpt_name) - 1]
ckpt_name = ckpt_name.split('.')[0]
output_dir = ''
ann_path = ''
if config.MODELARTS.IS_MODEL_ARTS:
output_dir = config.MODELARTS.CACHE_OUTPUT
ann_path = config.MODELARTS.CACHE_INPUT
else:
output_dir = config.TEST.OUTPUT_DIR
ann_path = config.DATASET.ROOT
output_dir = output_dir + ckpt_name
ann_path = ann_path + config.DATASET.TEST_JSON
output_dir = os.path.join(output_dir, ckpt_name)
ann_path = os.path.join(ann_path, config.DATASET.TEST_JSON)
validate(config, valid_dataset, model, output_dir, ann_path)
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=config.MODELARTS.CACHE_OUTPUT, dst_url=args.train_url)
if __name__ == '__main__':
main()
......@@ -12,9 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
export simple_baseline to mindir or air
"""
""" Export simple_baseline to mindir or air """
import argparse
import numpy as np
from mindspore import context, Tensor, export
......
......@@ -12,9 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
postprocess.
"""
""" postprocess script """
import os
import numpy as np
......@@ -24,8 +22,9 @@ from src.predict import get_final_preds
from src.dataset import flip_pairs
from src.config import config
def get_acc():
'''calculate accuracy'''
""" calculate accuracy """
ckpt_file = config.TEST.MODEL_FILE
output_dir = ckpt_file.split('.')[0]
if config.enable_modelarts:
......@@ -86,5 +85,6 @@ def get_acc():
cfg, all_preds[:idx], output_dir, all_boxes[:idx], image_id, None)
print("AP:", perf_indicator)
if __name__ == '__main__':
get_acc()
......@@ -12,17 +12,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
preprocess.
"""
""" preprocess script """
import os
import numpy as np
from src.dataset import keypoint_dataset
from src.config import config
def get_bin():
''' get bin files'''
""" get bin files"""
valid_dataset, _ = keypoint_dataset(
config,
bbox_file=config.TEST.COCO_BBOX_FILE,
......@@ -62,5 +61,6 @@ def get_bin():
np.save(os.path.join(id_path, file_name), item['id'])
print("=" * 20, "export bin files finished", "=" * 20)
if __name__ == '__main__':
get_bin()
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# -ne 1 ]; then
echo "Please run the script as: "
echo "bash scripts/run_distribute_train_gpu.sh [RANK_SIZE]"
echo "For example: bash scripts/run_distribute_train_gpu.sh 8"
echo "It is better to use the absolute path."
echo "========================================================================"
exit 1
fi
export RANK_SIZE=$1
rm -rf ./train_parallel
mkdir ./train_parallel
cp ./*.py ./train_parallel
cp -r ./src ./train_parallel
cd ./train_parallel
echo "start training on GPU $RANK_SIZE devices"
env > env.log
mpirun -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
python train.py \
--device_target="GPU" \
--is_model_arts=False \
--run_distribute=True > train.log 2>&1 &
cd ..
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# -ne 1 ]; then
echo "Please run the script as: "
echo "bash scripts/run_eval_gpu.sh [DEVICE_ID]"
echo "For example: bash scripts/run_eval_gpu.sh 0"
echo "It is better to use the absolute path."
echo "========================================================================"
exit 1
fi
export DEVICE_NUM=1
export DEVICE_ID=$1
rm -rf ./eval
mkdir ./eval
cp ./*.py ./eval
cp -r ./src ./eval
cd ./eval || exit
echo "start evaluation on GPU device $DEVICE_ID"
env > env.log
python eval.py --device_target="GPU" --device_id=$DEVICE_ID > eval.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# -ne 1 ]; then
echo "Please run the script as: "
echo "bash scripts/run_standalone_train_gpu.sh [DEVICE_ID]"
echo "For example: bash scripts/run_standalone_train_gpu.sh 0"
echo "It is better to use the absolute path."
echo "========================================================================"
exit 1
fi
export RANK_SIZE=1
export DEVICE_ID=$1
rm -rf train$1
mkdir ./train$1
cp ./*.py ./train$1
cp -r ./src ./train$1
cd ./train$1 || exit
echo "start training on GPU device $DEVICE_ID"
env > env.log
python train.py \
--device_target="GPU" \
--device_id=$DEVICE_ID \
--is_model_arts=False \
--run_distribute=False > train.log 2>&1 &
cd ..
......@@ -12,14 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
config
'''
"""Config parameters for simple baselines."""
from easydict import EasyDict as edict
config = edict()
#general
# general
config.GENERAL = edict()
config.GENERAL.VERSION = 'commit'
config.GENERAL.TRAIN_SEED = 1
......@@ -27,7 +25,7 @@ config.GENERAL.EVAL_SEED = 1
config.GENERAL.DATASET_SEED = 1
config.GENERAL.RUN_DISTRIBUTE = True
#model arts
# model arts
config.MODELARTS = edict()
config.MODELARTS.IS_MODEL_ARTS = False
config.MODELARTS.CACHE_INPUT = '/cache/data_tzh/'
......@@ -50,7 +48,6 @@ config.NETWORK.NUM_DECONV_FILTERS = [256, 256, 256]
config.NETWORK.NUM_DECONV_KERNELS = [4, 4, 4]
config.NETWORK.FINAL_CONV_KERNEL = 1
config.NETWORK.REVERSE = True
config.NETWORK.TARGET_TYPE = 'gaussian'
config.NETWORK.HEATMAP_SIZE = [48, 64]
config.NETWORK.SIGMA = 2
......@@ -66,7 +63,7 @@ config.DATASET.ROOT = '/home/dataset/coco/'
config.DATASET.TRAIN_SET = 'train2017'
config.DATASET.TRAIN_JSON = 'annotations/person_keypoints_train2017.json'
config.DATASET.TEST_SET = 'val2017'
config.DATASET.TEST_JSON = 'annotations/COCO_val2017_detections_AP_H_56_person.json'
config.DATASET.TEST_JSON = 'annotations/person_keypoints_val2017.json'
# training data augmentation
config.DATASET.FLIP = True
......@@ -76,7 +73,7 @@ config.DATASET.ROT_FACTOR = 40
# train
config.TRAIN = edict()
config.TRAIN.SHUFFLE = True
config.TRAIN.BATCH_SIZE = 64
config.TRAIN.BATCH_SIZE = 64 # 32 in paper
config.TRAIN.BEGIN_EPOCH = 0
config.TRAIN.END_EPOCH = 140
config.TRAIN.LR = 0.001
......@@ -84,7 +81,9 @@ config.TRAIN.LR_FACTOR = 0.1
config.TRAIN.LR_STEP = [90, 120]
config.TRAIN.NUM_PARALLEL_WORKERS = 8
config.TRAIN.SAVE_CKPT = True
config.TRAIN.CKPT_PATH = "/home/dataset/coco/"
config.TRAIN.CKPT_PATH = "/home/model/"
config.TRAIN.SAVE_CKPT_EPOCH = 3
config.TRAIN.KEEP_CKPT_MAX = 10
# valid
config.TEST = edict()
......@@ -93,10 +92,10 @@ config.TEST.FLIP_TEST = True
config.TEST.POST_PROCESS = True
config.TEST.SHIFT_HEATMAP = True
config.TEST.USE_GT_BBOX = False
config.TEST.NUM_PARALLEL_WORKERS = 2
config.TEST.MODEL_FILE = '/home/dataset/coco/multi_train_poseresnet_commit_0-140_292.ckpt'
config.TEST.NUM_PARALLEL_WORKERS = 8
config.TEST.MODEL_FILE = '/home/model/multi_train_poseresnet_commit_5-140_292.ckpt'
config.TEST.COCO_BBOX_FILE = '/home/dataset/coco/annotations/COCO_val2017_detections_AP_H_56_person.json'
config.TEST.OUTPUT_DIR = 'results/'
config.TEST.OUTPUT_DIR = '/home/results/'
# nms
config.TEST.OKS_THRE = 0.9
......@@ -105,7 +104,7 @@ config.TEST.BBOX_THRE = 1.0
config.TEST.IMAGE_THRE = 0.0
config.TEST.NMS_THRE = 1.0
#310 infer-related
# 310 infer-related
config.INFER = edict()
config.INFER.PRE_RESULT_PATH = './preprocess_Result'
config.INFER.POST_RESULT_PATH = './result_Files'
......
......@@ -12,16 +12,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
dataset processing
'''
""" dataset processing """
from __future__ import division
import json
import os
from copy import deepcopy
import random
import multiprocessing as mp
import numpy as np
import cv2
......@@ -29,14 +28,15 @@ import mindspore.dataset as ds
import mindspore.dataset.vision as C
from src.utils.transforms import fliplr_joints, get_affine_transform, affine_transform
ds.config.set_seed(1) # Set Random Seed
ds.config.set_seed(1) # Set Random Seed
flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8],
[9, 10], [11, 12], [13, 14], [15, 16]]
class KeypointDatasetGenerator:
'''
About the specific operations of coco2017 data set processing
'''
"""
About the specific operations of coco2017 dataset processing
"""
def __init__(self, cfg, is_train=False):
self.image_thre = cfg.TEST.IMAGE_THRE
self.image_size = np.array(cfg.MODEL.IMAGE_SIZE, dtype=np.int32)
......@@ -56,9 +56,7 @@ class KeypointDatasetGenerator:
self.num_joints = 17
def load_gt_dataset(self, image_path, ann_file):
'''
load_gt_dataset
'''
""" load_gt_dataset """
self.db = []
with open(ann_file, "rb") as f:
......@@ -134,11 +132,8 @@ class KeypointDatasetGenerator:
})
def load_detect_dataset(self, image_path, ann_file, bbox_file):
'''
load_detect_dataset
'''
""" load detection dataset """
self.db = []
all_boxes = None
with open(bbox_file, 'r') as f:
all_boxes = json.load(f)
......@@ -245,9 +240,7 @@ class KeypointDatasetGenerator:
return image, target, target_weight, s, c, score, db_rec['id']
def generate_heatmap(self, joints, joints_vis):
'''
generate_heatmap
'''
""" generate heatmap"""
target_weight = np.ones((self.num_joints, 1), dtype=np.float32)
target_weight[:, 0] = joints_vis[:, 0]
......@@ -300,6 +293,7 @@ class KeypointDatasetGenerator:
def __len__(self):
return len(self.db)
def keypoint_dataset(config,
ann_file=None,
image_path=None,
......@@ -307,7 +301,7 @@ def keypoint_dataset(config,
rank=0,
group_size=1,
train_mode=True,
num_parallel_workers=8,
num_parallel_workers=mp.cpu_count(),
transform=None,
shuffle=None):
"""
......@@ -315,9 +309,8 @@ def keypoint_dataset(config,
Args:
rank (int): The shard ID within num_shards (default=None).
group_size (int): Number of shards that the dataset should be divided
into (default=None).
mode (str): "train" or others. Default: " train".
group_size (int): Number of shards that the dataset should be divided into (default=None).
mode (str): "train" or others. Default: "train".
num_parallel_workers (int): Number of workers to read the data. Default: None.
"""
# config
......
......@@ -12,9 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
network_with_loss
'''
""" network_with_loss """
from __future__ import division
import mindspore.nn as nn
......@@ -23,10 +21,9 @@ from mindspore.ops import functional as F
from mindspore.nn.loss.loss import LossBase
from mindspore.common import dtype as mstype
class JointsMSELoss(LossBase):
'''
JointsMSELoss
'''
"""JointsMSELoss"""
def __init__(self, use_target_weight):
super(JointsMSELoss, self).__init__()
self.criterion = nn.MSELoss(reduction='mean')
......@@ -37,9 +34,7 @@ class JointsMSELoss(LossBase):
self.mul = P.Mul()
def construct(self, output, target, target_weight):
'''
construct
'''
""" construct """
total_shape = self.shape(output)
batch_size = total_shape[0]
num_joints = total_shape[1]
......@@ -64,6 +59,7 @@ class JointsMSELoss(LossBase):
return loss / num_joints
class PoseResNetWithLoss(nn.Cell):
"""
Pack the model network and loss function together to calculate the loss value.
......
......@@ -12,9 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
simple_baselines network
'''
""" simple_baselines network """
from __future__ import division
import os
import mindspore.nn as nn
......@@ -24,10 +22,9 @@ from mindspore.train.serialization import load_checkpoint, load_param_into_net
BN_MOMENTUM = 0.1
class MPReverse(nn.Cell):
'''
MPReverse
'''
"""MPReverse"""
def __init__(self, kernel_size=1, stride=1, pad_mode="valid"):
super(MPReverse, self).__init__()
self.maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, pad_mode=pad_mode)
......@@ -39,10 +36,9 @@ class MPReverse(nn.Cell):
x = self.reverse(x)
return x
class Bottleneck(nn.Cell):
'''
model part of network
'''
"""model part of network"""
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
......@@ -59,9 +55,7 @@ class Bottleneck(nn.Cell):
self.stride = stride
def construct(self, x):
'''
construct
'''
"""construct"""
residual = x
out = self.conv1(x)
......@@ -85,10 +79,7 @@ class Bottleneck(nn.Cell):
class PoseResNet(nn.Cell):
'''
PoseResNet
'''
"""pose-resnet"""
def __init__(self, block, layers, cfg):
self.inplanes = 64
self.deconv_with_bias = cfg.NETWORK.DECONV_WITH_BIAS
......@@ -122,9 +113,7 @@ class PoseResNet(nn.Cell):
)
def _make_layer(self, block, planes, blocks, stride=1):
'''
_make_layer
'''
"""make layer"""
downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.SequentialCell([nn.Conv2d(self.inplanes, planes * block.expansion,
......@@ -134,16 +123,13 @@ class PoseResNet(nn.Cell):
layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
for _ in range(1, blocks):
layers.append(block(self.inplanes, planes))
print(i)
return nn.SequentialCell(layers)
def _make_deconv_layer(self, num_layers, num_filters, num_kernels):
'''
_make_deconv_layer
'''
"""make deconvolutional layer"""
assert num_layers == len(num_filters), \
'ERROR: num_deconv_layers is different len(num_deconv_filters)'
assert num_layers == len(num_kernels), \
......@@ -171,9 +157,6 @@ class PoseResNet(nn.Cell):
return nn.SequentialCell(layers)
def construct(self, x):
'''
construct
'''
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
......@@ -186,6 +169,7 @@ class PoseResNet(nn.Cell):
x = self.deconv_layers(x)
x = self.final_layer(x)
return x
def init_weights(self, pretrained=''):
......@@ -205,18 +189,16 @@ resnet_spec = {50: (Bottleneck, [3, 4, 6, 3]),
def GetPoseResNet(cfg):
'''
GetPoseResNet
'''
"""get pose-resnet"""
num_layers = cfg.NETWORK.NUM_LAYERS
block_class, layers = resnet_spec[num_layers]
network = PoseResNet(block_class, layers, cfg)
if cfg.MODEL.IS_TRAINED and cfg.MODEL.INIT_WEIGHTS:
pretrained = ''
if cfg.MODELARTS.IS_MODEL_ARTS:
pretrained = cfg.MODELARTS.CACHE_INPUT + cfg.MODEL.PRETRAINED
pretrained = os.path.join(cfg.MODELARTS.CACHE_INPUT, cfg.MODEL.PRETRAINED)
else:
pretrained = cfg.TRAIN.CKPT_PATH + cfg.MODEL.PRETRAINED
pretrained = os.path.join(cfg.TRAIN.CKPT_PATH, cfg.MODEL.PRETRAINED)
network.init_weights(pretrained)
return network
......@@ -12,21 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
prediction picture
'''
""" prediction picture """
import math
import numpy as np
from src.utils.transforms import transform_preds
def get_max_preds(batch_heatmaps):
'''
"""
get predictions from score maps
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
'''
assert isinstance(batch_heatmaps, np.ndarray), \
'batch_heatmaps should be numpy.ndarray'
"""
assert isinstance(batch_heatmaps, np.ndarray), 'batch_heatmaps should be numpy.ndarray'
assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = batch_heatmaps.shape[0]
......@@ -50,10 +48,11 @@ def get_max_preds(batch_heatmaps):
preds *= pred_mask
return preds, maxvals
def get_final_preds(config, batch_heatmaps, center, scale):
'''
"""
get final predictions from score maps
'''
"""
coords, maxvals = get_max_preds(batch_heatmaps)
heatmap_height = batch_heatmaps.shape[2]
heatmap_width = batch_heatmaps.shape[3]
......
......@@ -12,9 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
coco
'''
"""coco"""
from __future__ import division
import json
......@@ -33,10 +31,8 @@ except ImportError:
from src.utils.nms import oks_nms
def _write_coco_keypoint_results(img_kpts, num_joints, res_file):
'''
_write_coco_keypoint_results
'''
results = []
for img, items in img_kpts.items():
......@@ -62,9 +58,6 @@ def _write_coco_keypoint_results(img_kpts, num_joints, res_file):
def _do_python_keypoint_eval(res_file, res_folder, ann_path):
'''
_do_python_keypoint_eval
'''
coco = COCO(ann_path)
coco_dt = coco.loadRes(res_file)
coco_eval = COCOeval(coco, coco_dt, 'keypoints')
......@@ -87,10 +80,8 @@ def _do_python_keypoint_eval(res_file, res_folder, ann_path):
return info_str
def evaluate(cfg, preds, output_dir, all_boxes, img_id, ann_path):
'''
evaluate
'''
if not os.path.exists(output_dir):
os.makedirs(output_dir)
res_file = os.path.join(output_dir, 'keypoints_results.json')
......
......@@ -13,16 +13,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
nms operation
'''
""" nms operation """
from __future__ import division
import numpy as np
def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None):
'''
oks_iou
'''
if not isinstance(sigmas, np.ndarray):
sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
.62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
......@@ -44,6 +40,7 @@ def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None):
ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0
return ious
def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
"""
greedily select boxes with high confidence and overlap with current maximum <= thresh
......
......@@ -13,13 +13,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
transforms
'''
""" transforms """
from __future__ import division
import numpy as np
import cv2
def flip_back(output_flipped, matched_parts):
'''
output_flipped: numpy.ndarray(batch_size, num_joints, height, width)
......@@ -55,9 +54,7 @@ def fliplr_joints(joints, joints_vis, width, matched_parts):
def transform_preds(coords, center, scale, output_size):
'''
transform_preds
'''
"""transform_preds"""
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale, 0, output_size, inv=1)
for p in range(coords.shape[0]):
......@@ -65,15 +62,8 @@ def transform_preds(coords, center, scale, output_size):
return target_coords
def get_affine_transform(center,
scale,
rot,
output_size,
shift=np.array([0, 0], dtype=np.float32),
inv=0):
'''
get_affine_transform
'''
def get_affine_transform(center, scale, rot, output_size, shift=np.array([0, 0], dtype=np.float32), inv=0):
"""get_affine_transform"""
if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
print(scale)
scale = np.array([scale, scale])
......
# Copyright 2021 Huawei Technologies Co., Ltd
# Copyright 2021-2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......@@ -12,18 +12,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
'''
train
'''
"""Train simple baselines."""
from __future__ import division
import os
import ast
import argparse
import numpy as np
from mindspore import context, Tensor
from mindspore.context import ParallelMode
from mindspore.communication.management import init
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import TimeMonitor, LossMonitor, ModelCheckpoint, CheckpointConfig
from mindspore.nn.optim import Adam
......@@ -38,16 +37,10 @@ if config.MODELARTS.IS_MODEL_ARTS:
import moxing as mox
set_seed(config.GENERAL.TRAIN_SEED)
def get_lr(begin_epoch,
total_epochs,
steps_per_epoch,
lr_init=0.1,
factor=0.1,
epoch_number_to_drop=(90, 120)
):
'''
get_lr
'''
def get_lr(begin_epoch, total_epochs, steps_per_epoch, lr_init=0.1, factor=0.1,
epoch_number_to_drop=(90, 120)):
lr_each_step = []
total_steps = steps_per_epoch * total_epochs
step_number_to_drop = [steps_per_epoch * x for x in epoch_number_to_drop]
......@@ -60,11 +53,10 @@ def get_lr(begin_epoch,
learning_rate = lr_each_step[current_step:]
return learning_rate
def parse_args():
'''
args
'''
parser = argparse.ArgumentParser(description="Simplebaseline training")
"""command line arguments parsing"""
parser = argparse.ArgumentParser(description="Simple Baselines training")
parser.add_argument('--data_url', required=False, default=None, help='Location of data.')
parser.add_argument('--train_url', required=False, default=None, help='Location of training outputs.')
parser.add_argument('--device_id', required=False, default=None, type=int, help='Device id.')
......@@ -75,30 +67,38 @@ def parse_args():
args = parser.parse_args()
return args
def main():
print("loading parse...")
args = parse_args()
device_id = args.device_id
device_target = args.device_target
config.GENERAL.RUN_DISTRIBUTE = args.run_distribute
config.MODELARTS.IS_MODEL_ARTS = args.is_model_arts
if config.GENERAL.RUN_DISTRIBUTE or config.MODELARTS.IS_MODEL_ARTS:
device_id = int(os.getenv('DEVICE_ID'))
context.set_context(mode=context.GRAPH_MODE,
device_target=device_target,
save_graphs=False,
device_id=device_id)
if config.GENERAL.RUN_DISTRIBUTE:
init()
rank = int(os.getenv('DEVICE_ID'))
device_num = int(os.getenv('RANK_SIZE'))
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
else:
context.set_context(mode=context.GRAPH_MODE, device_target=device_target, save_graphs=False)
if device_target == "Ascend":
rank = 0
device_num = 1
context.set_context(device_id=rank)
if config.GENERAL.RUN_DISTRIBUTE:
init()
rank = int(os.getenv('DEVICE_ID'))
device_num = int(os.getenv('RANK_SIZE'))
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
elif device_target == "GPU":
rank = int(os.getenv('DEVICE_ID', "0"))
device_num = int(os.getenv('RANK_SIZE', "0"))
if device_num > 1:
init()
rank = get_rank()
device_num = get_group_size()
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
else:
raise ValueError("Unsupported device, only GPU or Ascend is supported.")
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=args.data_url, dst_url=config.MODELARTS.CACHE_INPUT)
......@@ -106,9 +106,7 @@ def main():
dataset, _ = keypoint_dataset(config,
rank=rank,
group_size=device_num,
train_mode=True,
num_parallel_workers=config.TRAIN.NUM_PARALLEL_WORKERS,
)
train_mode=True)
net = GetPoseResNet(config)
loss = JointsMSELoss(config.LOSS.USE_TARGET_WEIGHT)
net_with_loss = PoseResNetWithLoss(net, loss)
......@@ -123,24 +121,25 @@ def main():
time_cb = TimeMonitor(data_size=dataset_size)
loss_cb = LossMonitor()
cb = [time_cb, loss_cb]
if config.TRAIN.SAVE_CKPT:
config_ck = CheckpointConfig(save_checkpoint_steps=dataset_size, keep_checkpoint_max=20)
prefix = ''
config_ck = CheckpointConfig(save_checkpoint_steps=dataset_size*config.TRAIN.SAVE_CKPT_EPOCH,
keep_checkpoint_max=config.TRAIN.KEEP_CKPT_MAX)
if config.GENERAL.RUN_DISTRIBUTE:
prefix = 'multi_' + 'train_poseresnet_' + config.GENERAL.VERSION + '_' + os.getenv('DEVICE_ID')
prefix = 'multi_' + 'train_poseresnet_' + config.GENERAL.VERSION + '_' + str(rank)
else:
prefix = 'single_' + 'train_poseresnet_' + config.GENERAL.VERSION
directory = ''
if config.MODELARTS.IS_MODEL_ARTS:
directory = config.MODELARTS.CACHE_OUTPUT + 'device_'+ os.getenv('DEVICE_ID')
directory = os.path.join(config.MODELARTS.CACHE_OUTPUT, 'device_' + str(rank))
elif config.GENERAL.RUN_DISTRIBUTE:
directory = config.TRAIN.CKPT_PATH + 'device_'+ os.getenv('DEVICE_ID')
directory = os.path.join(config.TRAIN.CKPT_PATH, 'device_' + str(rank))
else:
directory = config.TRAIN.CKPT_PATH + 'device'
directory = os.path.join(config.TRAIN.CKPT_PATH, 'device')
ckpoint_cb = ModelCheckpoint(prefix=prefix, directory=directory, config=config_ck)
cb.append(ckpoint_cb)
model = Model(net_with_loss, loss_fn=None, optimizer=opt, amp_level="O2")
epoch_size = config.TRAIN.END_EPOCH - config.TRAIN.BEGIN_EPOCH
print("************ Start training now ************")
......@@ -150,5 +149,6 @@ def main():
if config.MODELARTS.IS_MODEL_ARTS:
mox.file.copy_parallel(src_url=config.MODELARTS.CACHE_OUTPUT, dst_url=args.train_url)
if __name__ == '__main__':
main()