# Contents

<!-- TOC -->

- [Contents](#contents)
- [PyramidBox Description](#pyramidbox-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [WIDER Face](#wider-face)
- [FDDB](#fddb)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training](#training)
- [Evaluation](#evaluation)
- [Configuration](#configuration)
- [Training Process](#training-process)
- [Standalone Training](#standalone-training)
- [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
- [Performance](#performance)

<!-- /TOC -->
# PyramidBox Description

[PyramidBox](https://arxiv.org/pdf/1803.07737.pdf) is a single-shot, SSD-based face detector that exploits contextual information to detect hard faces. PyramidBox makes predictions at six scales of feature maps. The work consists of the following components: LFPN, Pyramid Anchors, CPM, and Data-anchor-sampling.

[Paper](https://arxiv.org/pdf/1803.07737.pdf): Tang, Xu, et al. "PyramidBox: A context-assisted single shot face detector." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

# Model Architecture

**LFPN**: Low-level Feature Pyramid Network. In detection, LFPN combines high-level features, which carry more context, with low-level features, which carry more texture. High-level features are used to detect large faces, while low-level features are used to detect small ones. To merge the high-level features into the high-resolution low-level features, the top-down fusion starts from an intermediate layer rather than the topmost one, forming a low-level FPN.

**Pyramid Anchors**: A semi-supervised scheme that generates approximate, semantically meaningful labels related to face detection. This anchor-based, context-assisted method introduces supervision for learning the contextual features of small, blurred, and partially occluded faces. From an annotated face box, a head label is derived by expanding the box by 1/2 on each side (top, bottom, left, right), and a body label by a user-defined expansion ratio; a minimal sketch follows below.

**CPM**: Context-sensitive Predict Module, a structure designed to improve the expressive power of the prediction network.

**Data-anchor-sampling**: A new sampling method that increases the diversity of training samples across scales. It reshapes the distribution of the training set to put more emphasis on small faces.
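
The sketch below illustrates the Pyramid Anchors label expansion described above. It is not code from this repository; `expand_box` and the ratios are illustrative, with the head ratio of 1/2 taken from the description and the body ratio left as a free choice:

```python
import numpy as np

def expand_box(box, ratio):
    """Expand an (xmin, ymin, xmax, ymax) box by `ratio` of its size on each side."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return np.array([xmin - ratio * w, ymin - ratio * h,
                     xmax + ratio * w, ymax + ratio * h])

face = np.array([100.0, 100.0, 200.0, 200.0])
head = expand_box(face, 0.5)   # head label: face box expanded by 1/2 per side
body = expand_box(face, 1.5)   # body label: expansion ratio is user-defined
```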
# Dataset

Two datasets are used:

1. [WIDER Face](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)
1. [FDDB](http://vis-www.cs.umass.edu/fddb/index.html)

In detail:

## WIDER Face

- The WIDER Face dataset is used to train and validate the model. Download WIDER Face Training Images and extract the WIDER_train folder; download WIDER Face Validation Images and extract the WIDER_val folder.
- Download the WIDER Face [annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) and extract them into a folder named wider_face_split.
- Create a WIDERFACE directory under the dataset folder and place the WIDER_train, WIDER_val, and wider_face_split folders inside it.
- Dataset size: 32,203 images with 393,703 annotated faces.
    - WIDER_train: 1.4 GB
    - WIDER_val: 355 MB
- Check that the WIDER_train, WIDER_val, and wider_face_split folders are under the WIDERFACE directory.

## FDDB

- The FDDB dataset is used to evaluate the model. Download [originalPics.tar.gz](http://vis-www.cs.umass.edu/fddb/originalPics.tar.gz), which contains the unannotated images, and [FDDB-folds.tgz](http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz), which contains the annotations.
- Dataset size: 2,845 images with 5,171 annotated faces.
    - originalPics.tar.gz: 553 MB
    - FDDB-folds.tgz: 1 MB
- Create an FDDB folder under the dataset folder.
- Extract originalPics.tar.gz into FDDB; it contains two folders, 2002 and 2003:

```bash
├── 2002
│ ├── 07
│ ├── 08
│ ├── 09
│ ├── 10
│ ├── 11
│ └── 12
├── 2003
│ ├── 01
│ ├── 02
│ ├── 03
│ ├── 04
│ ├── 05
│ ├── 06
│ ├── 07
│ ├── 08
│ └── 09
```

- Extract FDDB-folds.tgz into FDDB; it contains 20 txt files:
```bash
FDDB-folds
│ ├── FDDB-fold-01-ellipseList.txt
│ ├── FDDB-fold-01.txt
│ ├── FDDB-fold-02-ellipseList.txt
│ ├── FDDB-fold-02.txt
│ ├── FDDB-fold-03-ellipseList.txt
│ ├── FDDB-fold-03.txt
│ ├── FDDB-fold-04-ellipseList.txt
│ ├── FDDB-fold-04.txt
│ ├── FDDB-fold-05-ellipseList.txt
│ ├── FDDB-fold-05.txt
│ ├── FDDB-fold-06-ellipseList.txt
│ ├── FDDB-fold-06.txt
│ ├── FDDB-fold-07-ellipseList.txt
│ ├── FDDB-fold-07.txt
│ ├── FDDB-fold-08-ellipseList.txt
│ ├── FDDB-fold-08.txt
│ ├── FDDB-fold-09-ellipseList.txt
│ ├── FDDB-fold-09.txt
│ ├── FDDB-fold-10-ellipseList.txt
│ ├── FDDB-fold-10.txt
```
- Check that the 2002, 2003, and FDDB-folds folders are under the FDDB folder, and that FDDB is under the dataset folder.

---------

In summary, there is one training split, WIDER_train, and two validation splits, WIDER_val and FDDB.

Edit `src/config.py` and set the `_C.HOME` field to the path of the dataset folder.

The overall dataset directory structure is as follows:
```bash
dataset
├── FDDB
│ ├── 2002
│ ├── 2003
│ └── FDDB-folds
└── WIDERFACE
├── wider_face_split
│ ├── readme.txt
│ ├── wider_face_test_filelist.txt
│ ├── wider_face_test.mat
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_train.mat
│ ├── wider_face_val_bbx_gt.txt
│ └── wider_face_val.mat
├── WIDER_train
│ └── images
└── WIDER_val
└── images
```
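
Before training, you can confirm this layout with a small check script. This is a hypothetical helper, not part of the repository; set `DATASET_HOME` to the same path as `_C.HOME`:

```python
import os

DATASET_HOME = '/path/to/dataset'  # same value as _C.HOME in src/config.py

expected = [
    'FDDB/2002', 'FDDB/2003', 'FDDB/FDDB-folds',
    'WIDERFACE/wider_face_split',
    'WIDERFACE/WIDER_train/images',
    'WIDERFACE/WIDER_val/images',
]
for rel in expected:
    path = os.path.join(DATASET_HOME, rel)
    print(('OK      ' if os.path.isdir(path) else 'MISSING ') + path)
```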
# Environment Requirements

- Hardware (Ascend/GPU/CPU)
    - Prepared with a GPU hardware environment
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, see the following resources:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)

# Quick Start

After installing MindSpore from the official website, you can follow the steps below for training and evaluation.

Before training, complete the following preparation:

1. Check that the `_C.HOME` field in `src/config.py` is the path of the dataset folder.
2. Preprocess wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt to generate face_train.txt and face_val.txt.

```bash
# Preprocess wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt
# Run from the project root
python preprocess.py
# On success, face_train.txt and face_val.txt appear in the data folder
```

3. Generate the mindrecord file for face_val.txt; it is used during training to validate the model after every epoch and find the best checkpoint.

```bash
bash scripts/generate_mindrecord.sh
# On success, val.mindrecord and val.mindrecord.db appear in the data folder
```

4. Download the pretrained [vgg16.ckpt](https://pan.baidu.com/s/1e5qSW4e1QVZRnbyGRWi91Q?pwd=dryt); this checkpoint was converted from PyTorch.

With the preparation done, start training the model.
1. Standalone training
```bash
bash scripts/run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord
```
2. Distributed training
```bash
bash scripts/run_distribute_train_gpu.sh DEVICE_NUM VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord
```
After training completes, evaluate the PyramidBox model.

3. Evaluate the model

```bash
# Evaluate with the FDDB dataset
bash scripts/run_eval_gpu.sh PYRAMIDBOX_CKPT
# example: bash scripts/run_eval_gpu.sh checkpoints/pyramidbox.ckpt
```
# Script Description

## Script and Sample Code
```bash
PyramidBox
├── data // preprocessed dataset files and mindrecord files
├── eval.py // model evaluation script
├── preprocess.py // dataset annotation preprocessing script
├── generate_mindrecord.py // script to create mindrecord files
├── README_CN.md // PyramidBox description (Chinese)
├── scripts
│   ├── generate_mindrecord.sh // shell script to generate the mindrecord file used for validation
│   ├── run_distribute_train_gpu.sh // shell script for distributed GPU training
│   ├── run_eval_gpu.sh // shell script for GPU evaluation
│   └── run_standalone_train_gpu.sh // shell script for standalone GPU training
├── src
│   ├── augmentations.py // data augmentation script
│   ├── dataset.py // dataset script
│   ├── evaluate.py // model evaluation utilities
│   ├── loss.py // loss functions
│   ├── config.py // configuration file
│   ├── bbox_utils.py // box utility functions
│   ├── detection.py // decodes predicted box locations and confidences
│   ├── prior_box.py // default anchor (prior box) generation script
│   └── pyramidbox.py // PyramidBox model
└── train.py // model training script
```
## Script Parameters

### Training
```bash
usage: train.py [-h] [--basenet BASENET] [--batch_size BATCH_SIZE]
[--num_workers NUM_WORKERS] [--device_target {GPU,Ascend}]
[--lr LR] [--momentum MOMENTUM] [--weight_decay WEIGHT_DECAY]
[--gamma GAMMA] [--distribute DISTRIBUTE]
[--save_folder SAVE_FOLDER] [--epoches EPOCHES]
[--val_mindrecord VAL_MINDRECORD]
PyramidBox Face Detector Training With MindSpore
optional arguments:
-h, --help show this help message and exit
--basenet BASENET Pretrained base model
--batch_size BATCH_SIZE
Batch size for training
--num_workers NUM_WORKERS
Number of workers used in dataloading
--device_target {GPU,Ascend}
device for training
--lr LR, --learning-rate LR
initial learning rate
--momentum MOMENTUM Momentum value for optim
--weight_decay WEIGHT_DECAY
Weight decay for SGD
--gamma GAMMA Gamma update for SGD
--distribute DISTRIBUTE
Use multi-GPU training
--save_folder SAVE_FOLDER
Directory for saving checkpoint models
--epoches EPOCHES     Number of epochs to train the model
--val_mindrecord VAL_MINDRECORD
Path of val mindrecord file
```
### Evaluation
```bash
usage: eval.py [-h] [--model MODEL] [--thresh THRESH]
PyramidBox Evaluation on FDDB
optional arguments:
-h, --help show this help message and exit
--model MODEL trained model
--thresh THRESH Final confidence threshold
```
### Configuration
```bash
config.py:
LR_STEPS: learning-rate decay steps for standalone training
DIS_LR_STEPS: learning-rate decay steps for distributed training
FEATURE_MAPS: list of feature-map sizes used in training
INPUT_SIZE: input image size
STEPS: strides used to generate the default anchors
ANCHOR_SIZES: default anchor sizes
NUM_CLASSES: number of classes
OVERLAP_THRESH: overlap (IoU) threshold
NEG_POS_RATIOS: ratio of negative to positive samples
NMS_THRESH: NMS threshold
TOP_K: number of top-k candidates
KEEP_TOP_K: number of top-k detections to keep
CONF_THRESH: confidence threshold
HOME: dataset root directory
FACE.FILE_DIR: path of the data folder
FACE.TRAIN_FILE: face_train.txt file
FACE.VAL_FILE: face_val.txt file
FACE.FDDB_DIR: FDDB folder
FACE.WIDER_DIR: WIDER Face folder
```
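
As a sanity check on these values: with a 640×640 input and strides from 4 to 128, the six feature maps contain 160² + 80² + 40² + 20² + 10² + 5² = 34,125 cells, which matches the 34125 anchor dimension in the mindrecord schema of `generate_mindrecord.py`. A minimal sketch, assuming those STEPS/FEATURE_MAPS values (the authoritative ones live in `src/config.py`):

```python
INPUT_SIZE = 640
STEPS = [4, 8, 16, 32, 64, 128]                    # assumed strides of the six levels
FEATURE_MAPS = [INPUT_SIZE // s for s in STEPS]    # [160, 80, 40, 20, 10, 5]
num_priors = sum(f * f for f in FEATURE_MAPS)      # one anchor per feature-map cell
print(num_priors)                                  # 34125
```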
## Training Process

Before starting training, make sure the preparation is complete:

1. The `_C.HOME` field in `src/config.py` points to the dataset folder.
2. wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt have been preprocessed into face_train.txt and face_val.txt.
3. The mindrecord file for face_val.txt has been generated.
4. The pretrained [vgg16.ckpt](https://pan.baidu.com/s/1e5qSW4e1QVZRnbyGRWi91Q?pwd=dryt) has been downloaded.

Training can start only after these steps are done.

### Standalone Training
```bash
bash scripts/run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord
```
Training runs in the background. Checkpoints are saved in the `checkpoints` folder, and the training output can be followed in `logs/training_gpu.log`; it looks like this:
```bash
epoch: 2 step: 456, loss is 0.3661264
epoch: 2 step: 457, loss is 0.32284224
epoch: 2 step: 458, loss is 0.29254544
epoch: 2 step: 459, loss is 0.32631972
epoch: 2 step: 460, loss is 0.3065704
epoch: 2 step: 461, loss is 0.3995605
epoch: 2 step: 462, loss is 0.2614449
epoch: 2 step: 463, loss is 0.50305885
epoch: 2 step: 464, loss is 0.30908597
···
```
### Distributed Training
```bash
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [VGG16_CKPT] [VAL_MINDRECORD_FILE]
# example: bash scripts/run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord
```
Training runs in the background. Only the checkpoints of the first device are saved, in the `checkpoints/distribute_0/` folder; the training output can be followed in `logs/distribute_training_gpu.log` and looks like this:
```bash
epoch: 1 total step: 2, step: 2, loss is 25.479286
epoch: 1 total step: 2, step: 2, loss is 30.297405
epoch: 1 total step: 2, step: 2, loss is 28.816475
epoch: 1 total step: 2, step: 2, loss is 25.439453
epoch: 1 total step: 2, step: 2, loss is 28.585438
epoch: 1 total step: 2, step: 2, loss is 31.117134
epoch: 1 total step: 2, step: 2, loss is 25.770748
epoch: 1 total step: 2, step: 2, loss is 27.557945
epoch: 1 total step: 3, step: 3, loss is 28.352016
epoch: 1 total step: 3, step: 3, loss is 31.99873
epoch: 1 total step: 3, step: 3, loss is 31.426039
epoch: 1 total step: 3, step: 3, loss is 24.02226
epoch: 1 total step: 3, step: 3, loss is 30.12824
epoch: 1 total step: 3, step: 3, loss is 29.977898
epoch: 1 total step: 3, step: 3, loss is 24.06476
epoch: 1 total step: 3, step: 3, loss is 28.573633
epoch: 1 total step: 4, step: 4, loss is 28.599226
epoch: 1 total step: 4, step: 4, loss is 34.262005
epoch: 1 total step: 4, step: 4, loss is 30.732353
epoch: 1 total step: 4, step: 4, loss is 28.62697
epoch: 1 total step: 4, step: 4, loss is 39.44549
epoch: 1 total step: 4, step: 4, loss is 27.754185
epoch: 1 total step: 4, step: 4, loss is 26.15754
...
```
## Evaluation Process
```bash
bash scripts/run_eval_gpu.sh [PYRAMIDBOX_CKPT]
# example: bash scripts/run_eval_gpu.sh checkpoints/pyramidbox.ckpt
```
Note: checkpoints are named `pyramidbox_best_{epoch}.ckpt`, where epoch is the number of training epochs completed when the checkpoint was saved. A larger epoch generally means a lower loss on WIDER val and a relatively more accurate model, so when searching for the best model, evaluate the checkpoints in order of decreasing epoch, starting with the largest.
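
A small helper for listing the checkpoints in that order; this is a hypothetical snippet that assumes only the `pyramidbox_best_{epoch}.ckpt` naming described above:

```python
import glob
import re

# Collect pyramidbox_best_{epoch}.ckpt files and sort by epoch, largest first.
ckpts = glob.glob('checkpoints/pyramidbox_best_*.ckpt')
ckpts.sort(key=lambda p: int(re.search(r'pyramidbox_best_(\d+)\.ckpt', p).group(1)),
           reverse=True)
for ckpt in ckpts:
    print(ckpt)   # evaluate each with: bash scripts/run_eval_gpu.sh <ckpt>
```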
Evaluation runs in the background; the results are written to `logs/eval_gpu.log` and look like this:
```bash
==================== Results ====================
FDDB-fold-1 Val AP: 0.9614604685893
FDDB-fold-2 Val AP: 0.9615593696135745
FDDB-fold-3 Val AP: 0.9607889632039851
FDDB-fold-4 Val AP: 0.972454404596466
FDDB-fold-5 Val AP: 0.9734522365236052
FDDB-fold-6 Val AP: 0.952158002966933
FDDB-fold-7 Val AP: 0.9618735923917133
FDDB-fold-8 Val AP: 0.9501671313630741
FDDB-fold-9 Val AP: 0.9539008001056393
FDDB-fold-10 Val AP: 0.9664355605240443
FDDB Dataset Average AP: 0.9614250529878333
=================================================
```
# Model Description

## Performance

| Parameter | PyramidBox |
| -------------------- | ------------------------------------------------------- |
| Resource | GPU (Tesla V100 SXM2); CPU 2.1 GHz, 24 cores; memory 128 GB |
| Upload date | 2022-09-17 |
| MindSpore version | 1.8.1 |
| Dataset | WIDER Face, FDDB |
| Training parameters | epoch=100, batch_size=4, lr=5e-4 |
| Optimizer | SGD |
| Loss function | SoftmaxCrossEntropyWithLogits, SmoothL1Loss |
| Output | box coordinates and confidences |
| Loss | 2-6 |
| Speed | 570 ms/step (1 GPU); 650 ms/step (8 GPUs) |
| Total time | 50 h 58 min (1 GPU); 7 h 12 min (8 GPUs) |
| Checkpoint for fine-tuning | 655 MB (.ckpt file) |
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import os
import argparse
from PIL import Image
from mindspore import Tensor, context
from mindspore import load_checkpoint, load_param_into_net
import numpy as np
from src.config import cfg
from src.pyramidbox import build_net
from src.augmentations import to_chw_bgr
from src.prior_box import PriorBox
from src.detection import Detect
from src.evaluate import evaluation
parser = argparse.ArgumentParser(description='PyramidBox Evaluation on FDDB')
parser.add_argument('--model', type=str, default='checkpoints/pyramidbox.ckpt', help='trained model')
parser.add_argument('--thresh', default=0.1, type=float, help='Final confidence threshold')
args = parser.parse_args()
FDDB_IMG_DIR = cfg.FACE.FDDB_DIR
FDDB_FOLD_DIR = os.path.join(FDDB_IMG_DIR, 'FDDB-folds')
FDDB_OUT_DIR = 'FDDB-out'
if not os.path.exists(FDDB_OUT_DIR):
os.mkdir(FDDB_OUT_DIR)
def detect_face(net_, img_, thresh):
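    # Preprocessing mirrors training (see preprocess() in src/augmentations.py):
    # HWC -> CHW in BGR order, subtract the dataset mean, then reorder to RGB.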
x = to_chw_bgr(img_).astype(np.float32)
x -= cfg.img_mean
x = x[[2, 1, 0], :, :]
x = Tensor(x)[None, :, :, :]
size = x.shape[2:]
loc, conf, feature_maps = net_(x)
prior_box = PriorBox(cfg, feature_maps, size, 'test')
default_priors = prior_box.forward()
detections = Detect(cfg).detect(loc, conf, default_priors)
scale = np.array([img_.shape[1], img_.shape[0], img_.shape[1], img_.shape[0]])
bboxes = []
for i in range(detections.shape[1]):
j = 0
while detections[0, i, j, 0] >= thresh:
box = []
score = detections[0, i, j, 0]
pt = (detections[0, i, j, 1:] * scale).astype(np.int32)
j += 1
box += [pt[0], pt[1], pt[2] - pt[0], pt[3] - pt[1], score]
bboxes += [box]
return bboxes
if __name__ == '__main__':
context.set_context(mode=context.PYNATIVE_MODE)
net = build_net('test', cfg.NUM_CLASSES)
params = load_checkpoint(args.model)
load_param_into_net(net, params)
net.set_train(False)
print("Start detecting FDDB images")
for index in range(1, 11):
if not os.path.exists(os.path.join(FDDB_OUT_DIR, str(index))):
os.mkdir(os.path.join(FDDB_OUT_DIR, str(index)))
print(f"Detecting folder {index}")
file_path = os.path.join(cfg.FACE.FDDB_DIR, 'FDDB-folds', 'FDDB-fold-%02d.txt' % index)
with open(file_path, 'r') as f:
lines = f.readlines()
for line in lines:
line = line.strip('\n')
image_path = os.path.join(cfg.FACE.FDDB_DIR, line) + '.jpg'
img = Image.open(image_path)
if img.mode == 'L':
img = img.convert('RGB')
img = np.array(img)
line = line.replace('/', '_')
with open(os.path.join(FDDB_OUT_DIR, str(index), line + '.txt'), 'w') as w:
w.write(line)
w.write('\n')
boxes = detect_face(net, img, args.thresh)
                if boxes is not None:
w.write(str(len(boxes)))
w.write('\n')
for box_ in boxes:
w.write(f'{int(box_[0])} {int(box_[1])} {int(box_[2])} {int(box_[3])} {box_[4]}\n')
print("Detection Done!")
print("Start evluation!")
evaluation(FDDB_OUT_DIR, FDDB_FOLD_DIR)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import os
from mindspore.mindrecord import FileWriter
from src.dataset import WIDERDataset
from src.config import cfg
parser = argparse.ArgumentParser(description='Generate Mindrecord File for training')
parser.add_argument('--prefix', type=str, default='./data', help="Directory to store mindrecord file")
parser.add_argument('--val_name', type=str, default='val.mindrecord', help='Name of val mindrecord file')
args = parser.parse_args()
def data_to_mindrecord(mindrecord_prefix, mindrecord_name, dataset):
if not os.path.exists(mindrecord_prefix):
os.mkdir(mindrecord_prefix)
mindrecord_path = os.path.join(mindrecord_prefix, mindrecord_name)
writer = FileWriter(mindrecord_path, 1, overwrite=True)
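    # Each record stores one target per default anchor: 34125 = 160^2 + 80^2 +
    # 40^2 + 20^2 + 10^2 + 5^2, the total number of prior boxes for a 640x640
    # input (one anchor per feature-map cell).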
data_json = {
'img': {"type": "float32", "shape": [3, 640, 640]},
'face_loc': {"type": "float32", "shape": [34125, 4]},
'face_conf': {"type": "float32", "shape": [34125]},
'head_loc': {"type": "float32", "shape": [34125, 4]},
'head_conf': {"type": "float32", "shape": [34125]}
}
writer.add_schema(data_json, 'data_json')
count = 0
for d in dataset:
img, face_loc, face_conf, head_loc, head_conf = d
row = {
"img": img,
"face_loc": face_loc,
"face_conf": face_conf,
"head_loc": head_loc,
"head_conf": head_conf
}
writer.write_raw_data([row])
count += 1
writer.commit()
print("Total train data: ", count)
print("Create mindrecord done!")
if __name__ == '__main__':
print("Start generating val mindrecord file")
ds_val = WIDERDataset(cfg.FACE.VAL_FILE, mode='val')
data_to_mindrecord(args.prefix, args.val_name, ds_val)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import os
from src.config import cfg
WIDER_ROOT = os.path.join(cfg.HOME, 'WIDERFACE')
train_list_file = os.path.join(WIDER_ROOT, 'wider_face_split',
'wider_face_train_bbx_gt.txt')
val_list_file = os.path.join(WIDER_ROOT, 'wider_face_split',
'wider_face_val_bbx_gt.txt')
WIDER_TRAIN = os.path.join(WIDER_ROOT, 'WIDER_train', 'images')
WIDER_VAL = os.path.join(WIDER_ROOT, 'WIDER_val', 'images')
def parse_wider_file(root, file):
with open(file, 'r') as fr:
lines = fr.readlines()
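    # wider_face_*_bbx_gt.txt format: an image path line ending in ".jpg",
    # then a line with the face count, then one face per line starting "x y w h".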
face_count = []
img_paths = []
face_loc = []
img_faces = []
count = 0
flag = False
    for line in lines:
line = line.strip().strip('\n')
if count > 0:
line = line.split(' ')
count -= 1
loc = [int(line[0]), int(line[1]), int(line[2]), int(line[3])]
face_loc += [loc]
if flag:
face_count += [int(line)]
flag = False
count = int(line)
if 'jpg' in line:
img_paths += [os.path.join(root, line)]
flag = True
total_face = 0
for k in face_count:
face_ = []
for x in range(total_face, total_face + k):
face_.append(face_loc[x])
img_faces += [face_]
total_face += k
return img_paths, img_faces
def wider_data_file():
if not os.path.exists(cfg.FACE.FILE_DIR):
os.mkdir(cfg.FACE.FILE_DIR)
img_paths, bbox = parse_wider_file(WIDER_TRAIN, train_list_file)
fw = open(cfg.FACE.TRAIN_FILE, 'w')
for index in range(len(img_paths)):
path = img_paths[index]
boxes = bbox[index]
fw.write(path)
fw.write(' {}'.format(len(boxes)))
for box in boxes:
data = ' {} {} {} {} {}'.format(box[0], box[1], box[2], box[3], 1)
fw.write(data)
fw.write('\n')
fw.close()
img_paths, bbox = parse_wider_file(WIDER_VAL, val_list_file)
fw = open(cfg.FACE.VAL_FILE, 'w')
for index in range(len(img_paths)):
path = img_paths[index]
boxes = bbox[index]
fw.write(path)
fw.write(' {}'.format(len(boxes)))
for box in boxes:
data = ' {} {} {} {} {}'.format(box[0], box[1], box[2], box[3], 1)
fw.write(data)
fw.write('\n')
fw.close()
if __name__ == '__main__':
wider_data_file()
easydict==1.9
mindspore-gpu==1.8.1
numpy==1.21.5
opencv-python==4.5.5.62
Pillow==9.0.0
scikit-image==0.18.3
tqdm==4.64.1
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash generate_mindrecord.sh"
echo "for example: bash generate_mindrecord.sh"
echo "=============================================================================================================="
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
python $PROJECT_DIR/../generate_mindrecord.py > $LOG_DIR/generate_mindrecord.log 2>&1 &
echo "The data log is at /logs/generate_mindrecord.log"
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train_gpu.sh DEVICE_NUM VGG16_CKPT VAL_MINDRECORD_FILE"
echo "for example: bash run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord"
echo "=============================================================================================================="
DEVICE_NUM=$1
VGG16=$2
VAL_MINDRECORD=$3
if [ $# -lt 3 ];
then
echo "---------------------ERROR----------------------"
echo "You must specify number of gpu devices, vgg16 checkpoint, mindrecord file for evaling"
exit
fi
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
mpirun -n $DEVICE_NUM --allow-run-as-root python $PROJECT_DIR/../train.py \
--distribute True \
--lr 5e-4 \
--device_target GPU \
--val_mindrecord $VAL_MINDRECORD \
--epoches 100 \
--basenet $VGG16 \
--num_workers 1 \
--batch_size 4 > $LOG_DIR/distribute_training_gpu.log 2>&1 &
echo "The distributed train log is at /logs/distribute_training_gpu.log"
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval_gpu.sh PYRAMIDBOX_CKPT"
echo "for example: bash run_eval_gpu.sh pyramidbox.ckpt"
echo "=============================================================================================================="
if [ $# -lt 1 ];
then
echo "---------------------ERROR----------------------"
echo "You must specify pyramidbox checkpoint"
exit
fi
CKPT=$1
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python $PROJECT_DIR/../eval.py --model $CKPT > $LOG_DIR/eval_gpu.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE"
echo "for example: bash run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord"
echo "=============================================================================================================="
DEVICE_ID=$1
VGG16=$2
VAL_MINDRECORD=$3
if [ $# -lt 3 ];
then
echo "---------------------ERROR----------------------"
echo "You must specify gpu device, vgg16 checkpoint and mindrecord file for valing"
exit
fi
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
export DEVICE_ID=$DEVICE_ID
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python $PROJECT_DIR/../train.py \
--device_target GPU \
--epoches 100 \
--lr 5e-4 \
--basenet $VGG16 \
--num_workers 2 \
--val_mindrecord $VAL_MINDRECORD \
--batch_size 4 > $LOG_DIR/training_gpu.log 2>&1 &
echo "The standalone train log is at /logs/training_gpu.log"
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import math
import random
import six
import cv2
import numpy as np
from PIL import Image, ImageEnhance
from src.config import cfg
class Sampler():
def __init__(self,
max_sample,
max_trial,
min_scale,
max_scale,
min_aspect_ratio,
max_aspect_ratio,
min_jaccard_overlap,
max_jaccard_overlap,
min_object_coverage,
max_object_coverage,
use_square=False):
self.max_sample = max_sample
self.max_trial = max_trial
self.min_scale = min_scale
self.max_scale = max_scale
self.min_aspect_ratio = min_aspect_ratio
self.max_aspect_ratio = max_aspect_ratio
self.min_jaccard_overlap = min_jaccard_overlap
self.max_jaccard_overlap = max_jaccard_overlap
self.min_object_coverage = min_object_coverage
self.max_object_coverage = max_object_coverage
self.use_square = use_square
def intersect(box_a, box_b):
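    # Pairwise intersection areas between each box in box_a [N, 4] and the
    # single box box_b [4].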
max_xy = np.minimum(box_a[:, 2:], box_b[2:])
min_xy = np.maximum(box_a[:, :2], box_b[:2])
inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf)
return inter[:, 0] * inter[:, 1]
def jaccard_numpy(box_a, box_b):
"""Compute the jaccard overlap of two sets of boxes. The jaccard overlap
is simply the intersection over union of two boxes.
E.g.:
A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
Args:
box_a: Multiple bounding boxes, Shape: [num_boxes,4]
box_b: Single bounding box, Shape: [4]
Return:
        jaccard overlap: Shape: [box_a.shape[0]]
"""
inter = intersect(box_a, box_b)
area_a = ((box_a[:, 2] - box_a[:, 0]) *
(box_a[:, 3] - box_a[:, 1])) # [A,B]
area_b = ((box_b[2] - box_b[0]) *
(box_b[3] - box_b[1])) # [A,B]
union = area_a + area_b - inter
return inter / union # [A,B]
class Bbox():
def __init__(self, xmin, ymin, xmax, ymax):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
def random_brightness(img):
prob = np.random.uniform(0, 1)
if prob < cfg.brightness_prob:
delta = np.random.uniform(-cfg.brightness_delta,
cfg.brightness_delta) + 1
img = ImageEnhance.Brightness(img).enhance(delta)
return img
def random_contrast(img):
prob = np.random.uniform(0, 1)
if prob < cfg.contrast_prob:
delta = np.random.uniform(-cfg.contrast_delta,
cfg.contrast_delta) + 1
img = ImageEnhance.Contrast(img).enhance(delta)
return img
def random_saturation(img):
prob = np.random.uniform(0, 1)
if prob < cfg.saturation_prob:
delta = np.random.uniform(-cfg.saturation_delta,
cfg.saturation_delta) + 1
img = ImageEnhance.Color(img).enhance(delta)
return img
def random_hue(img):
prob = np.random.uniform(0, 1)
if prob < cfg.hue_prob:
delta = np.random.uniform(-cfg.hue_delta, cfg.hue_delta)
img_hsv = np.array(img.convert('HSV'))
img_hsv[:, :, 0] = img_hsv[:, :, 0] + delta
img = Image.fromarray(img_hsv, mode='HSV').convert('RGB')
return img
def distort_image(img):
prob = np.random.uniform(0, 1)
# Apply different distort order
if prob > 0.5:
img = random_brightness(img)
img = random_contrast(img)
img = random_saturation(img)
img = random_hue(img)
else:
img = random_brightness(img)
img = random_saturation(img)
img = random_hue(img)
img = random_contrast(img)
return img
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox.xmax + src_bbox.xmin) / 2
center_y = (src_bbox.ymax + src_bbox.ymin) / 2
if sample_bbox.xmin <= center_x <= sample_bbox.xmax and \
sample_bbox.ymin <= center_y <= sample_bbox.ymax:
return True
return False
def project_bbox(object_bbox, sample_bbox):
if object_bbox.xmin >= sample_bbox.xmax or \
object_bbox.xmax <= sample_bbox.xmin or \
object_bbox.ymin >= sample_bbox.ymax or \
object_bbox.ymax <= sample_bbox.ymin:
return False
proj_bbox = Bbox(0, 0, 0, 0)
sample_width = sample_bbox.xmax - sample_bbox.xmin
sample_height = sample_bbox.ymax - sample_bbox.ymin
proj_bbox.xmin = (object_bbox.xmin - sample_bbox.xmin) / sample_width
proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
proj_bbox = clip_bbox(proj_bbox)
if bbox_area(proj_bbox) > 0:
return proj_bbox
return False
def transform_labels(bbox_labels, sample_bbox):
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
proj_bbox = project_bbox(object_bbox, sample_bbox)
if proj_bbox:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label = sample_label + bbox_labels[i][5:]
sample_labels.append(sample_label)
return sample_labels
def expand_image(img, bbox_labels, img_width, img_height):
prob = np.random.uniform(0, 1)
if prob < cfg.expand_prob:
if cfg.expand_max_ratio - 1 >= 0.01:
expand_ratio = np.random.uniform(1, cfg.expand_max_ratio)
height = int(img_height * expand_ratio)
width = int(img_width * expand_ratio)
h_off = math.floor(np.random.uniform(0, height - img_height))
w_off = math.floor(np.random.uniform(0, width - img_width))
expand_bbox = Bbox(-w_off / img_width, -h_off / img_height,
(width - w_off) / img_width,
(height - h_off) / img_height)
expand_img = np.ones((height, width, 3))
expand_img = np.uint8(expand_img * np.squeeze(cfg.img_mean))
expand_img = Image.fromarray(expand_img)
expand_img.paste(img, (int(w_off), int(h_off)))
bbox_labels = transform_labels(bbox_labels, expand_bbox)
return expand_img, bbox_labels, width, height
return img, bbox_labels, img_width, img_height
def clip_bbox(src_bbox):
src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0)
src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0)
src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0)
src_bbox.ymax = max(min(src_bbox.ymax, 1.0), 0.0)
return src_bbox
def bbox_area(src_bbox):
if src_bbox.xmax < src_bbox.xmin or src_bbox.ymax < src_bbox.ymin:
return 0.
width = src_bbox.xmax - src_bbox.xmin
height = src_bbox.ymax - src_bbox.ymin
return width * height
def intersect_bbox(bbox1, bbox2):
if bbox2.xmin > bbox1.xmax or bbox2.xmax < bbox1.xmin or \
bbox2.ymin > bbox1.ymax or bbox2.ymax < bbox1.ymin:
intersection_box = Bbox(0.0, 0.0, 0.0, 0.0)
else:
intersection_box = Bbox(
max(bbox1.xmin, bbox2.xmin),
max(bbox1.ymin, bbox2.ymin),
min(bbox1.xmax, bbox2.xmax), min(bbox1.ymax, bbox2.ymax))
return intersection_box
def bbox_coverage(bbox1, bbox2):
inter_box = intersect_bbox(bbox1, bbox2)
intersect_size = bbox_area(inter_box)
if intersect_size > 0:
bbox1_size = bbox_area(bbox1)
return intersect_size / bbox1_size
return 0.
def generate_batch_random_samples(batch_sampler, bbox_labels, image_width,
image_height, scale_array, resize_width,
resize_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for _ in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = data_anchor_sampling(
sampler, bbox_labels, image_width, image_height, scale_array,
resize_width, resize_height)
if sample_bbox == 0:
break
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def data_anchor_sampling(sampler, bbox_labels, image_width, image_height,
scale_array, resize_width, resize_height):
num_gt = len(bbox_labels)
# np.random.randint range: [low, high)
rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
if num_gt != 0:
norm_xmin = bbox_labels[rand_idx][1]
norm_ymin = bbox_labels[rand_idx][2]
norm_xmax = bbox_labels[rand_idx][3]
norm_ymax = bbox_labels[rand_idx][4]
xmin = norm_xmin * image_width
ymin = norm_ymin * image_height
wid = image_width * (norm_xmax - norm_xmin)
hei = image_height * (norm_ymax - norm_ymin)
range_size = 0
area = wid * hei
for scale_ind in range(0, len(scale_array) - 1):
if scale_array[scale_ind] ** 2 < area < scale_array[scale_ind + 1] ** 2:
range_size = scale_ind + 1
break
if area > scale_array[len(scale_array) - 2]**2:
range_size = len(scale_array) - 2
scale_choose = 0.0
if range_size == 0:
rand_idx_size = 0
else:
# np.random.randint range: [low, high)
rng_rand_size = np.random.randint(0, range_size + 1)
rand_idx_size = rng_rand_size % (range_size + 1)
if rand_idx_size == range_size:
min_resize_val = scale_array[rand_idx_size] / 2.0
max_resize_val = min(2.0 * scale_array[rand_idx_size],
2 * math.sqrt(wid * hei))
scale_choose = random.uniform(min_resize_val, max_resize_val)
else:
min_resize_val = scale_array[rand_idx_size] / 2.0
max_resize_val = 2.0 * scale_array[rand_idx_size]
scale_choose = random.uniform(min_resize_val, max_resize_val)
sample_bbox_size = wid * resize_width / scale_choose
w_off_orig = 0.0
h_off_orig = 0.0
if sample_bbox_size < max(image_height, image_width):
if wid <= sample_bbox_size:
w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size,
xmin)
else:
w_off_orig = np.random.uniform(xmin,
xmin + wid - sample_bbox_size)
if hei <= sample_bbox_size:
h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size,
ymin)
else:
h_off_orig = np.random.uniform(ymin,
ymin + hei - sample_bbox_size)
else:
w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0)
h_off_orig = np.random.uniform(
image_height - sample_bbox_size, 0.0)
w_off_orig = math.floor(w_off_orig)
h_off_orig = math.floor(h_off_orig)
# Figure out top left coordinates.
w_off = 0.0
h_off = 0.0
w_off = float(w_off_orig / image_width)
h_off = float(h_off_orig / image_height)
sampled_bbox = Bbox(w_off, h_off,
w_off + float(sample_bbox_size / image_width),
h_off + float(sample_bbox_size / image_height))
return sampled_bbox
return 0
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox.xmin >= object_bbox.xmax or \
sample_bbox.xmax <= object_bbox.xmin or \
sample_bbox.ymin >= object_bbox.ymax or \
sample_bbox.ymax <= object_bbox.ymin:
return 0
intersect_xmin = max(sample_bbox.xmin, object_bbox.xmin)
intersect_ymin = max(sample_bbox.ymin, object_bbox.ymin)
intersect_xmax = min(sample_bbox.xmax, object_bbox.xmax)
intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax)
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0:
has_jaccard_overlap = False
else:
has_jaccard_overlap = True
if sampler.min_object_coverage == 0 and sampler.max_object_coverage == 0:
has_object_coverage = False
else:
has_object_coverage = True
if not has_jaccard_overlap and not has_object_coverage:
return True
found = False
for i in range(len(bbox_labels)):
object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if has_jaccard_overlap:
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler.min_jaccard_overlap != 0 and \
overlap < sampler.min_jaccard_overlap:
continue
if sampler.max_jaccard_overlap != 0 and \
overlap > sampler.max_jaccard_overlap:
continue
found = True
if has_object_coverage:
object_coverage = bbox_coverage(object_bbox, sample_bbox)
if sampler.min_object_coverage != 0 and \
object_coverage < sampler.min_object_coverage:
continue
if sampler.max_object_coverage != 0 and \
object_coverage > sampler.max_object_coverage:
continue
found = True
if found:
return True
return found
def crop_image_sampling(img, bbox_labels, sample_bbox, image_width,
image_height, resize_width, resize_height,
min_face_size):
# no clipping here
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
w_off = xmin
h_off = ymin
width = xmax - xmin
height = ymax - ymin
cross_xmin = max(0.0, float(w_off))
cross_ymin = max(0.0, float(h_off))
cross_xmax = min(float(w_off + width - 1.0), float(image_width))
cross_ymax = min(float(h_off + height - 1.0), float(image_height))
cross_width = cross_xmax - cross_xmin
cross_height = cross_ymax - cross_ymin
roi_xmin = 0 if w_off >= 0 else abs(w_off)
roi_ymin = 0 if h_off >= 0 else abs(h_off)
roi_width = cross_width
roi_height = cross_height
roi_y1 = int(roi_ymin)
roi_y2 = int(roi_ymin + roi_height)
roi_x1 = int(roi_xmin)
roi_x2 = int(roi_xmin + roi_width)
cross_y1 = int(cross_ymin)
cross_y2 = int(cross_ymin + cross_height)
cross_x1 = int(cross_xmin)
cross_x2 = int(cross_xmin + cross_width)
sample_img = np.zeros((height, width, 3))
# print(sample_img.shape)
sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \
img[cross_y1: cross_y2, cross_x1: cross_x2]
sample_img = cv2.resize(
sample_img, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
resize_val = resize_width
sample_labels = transform_labels_sampling(bbox_labels, sample_bbox,
resize_val, min_face_size)
return sample_img, sample_labels
def transform_labels_sampling(bbox_labels, sample_bbox, resize_val,
min_face_size):
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
proj_bbox = project_bbox(object_bbox, sample_bbox)
if proj_bbox:
real_width = float((proj_bbox.xmax - proj_bbox.xmin) * resize_val)
real_height = float((proj_bbox.ymax - proj_bbox.ymin) * resize_val)
if real_width * real_height < float(min_face_size * min_face_size):
continue
else:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label = sample_label + bbox_labels[i][5:]
sample_labels.append(sample_label)
return sample_labels
def generate_sample(sampler, image_width, image_height):
scale = np.random.uniform(sampler.min_scale, sampler.max_scale)
aspect_ratio = np.random.uniform(sampler.min_aspect_ratio,
sampler.max_aspect_ratio)
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
# guarantee a squared image patch after cropping
if sampler.use_square:
if image_height < image_width:
bbox_width = bbox_height * image_height / image_width
else:
bbox_height = bbox_width * image_width / image_height
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = np.random.uniform(0, xmin_bound)
ymin = np.random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = Bbox(xmin, ymin, xmax, ymax)
return sampled_bbox
def generate_batch_samples(batch_sampler, bbox_labels, image_width,
image_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for _ in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = generate_sample(sampler, image_width, image_height)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def crop_image(img, bbox_labels, sample_bbox, image_width, image_height,
resize_width, resize_height, min_face_size):
sample_bbox = clip_bbox(sample_bbox)
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
sample_img = img[ymin:ymax, xmin:xmax]
resize_val = resize_width
sample_labels = transform_labels_sampling(bbox_labels, sample_bbox,
resize_val, min_face_size)
return sample_img, sample_labels
def to_chw_bgr(image):
"""
    Transpose image from HWC to CHW and from RGB to BGR.
    Args:
        image (np.array): an image with HWC and RGB layout.
"""
# HWC to CHW
if len(image.shape) == 3:
image = np.swapaxes(image, 1, 2)
image = np.swapaxes(image, 1, 0)
    # RGB to BGR
image = image[[2, 1, 0], :, :]
return image
def anchor_crop_image_sampling(img,
bbox_labels,
scale_array,
img_width,
img_height):
mean = np.array([104, 117, 123], dtype=np.float32)
maxSize = 12000 # max size
infDistance = 9999999
bbox_labels = np.array(bbox_labels)
scale = np.array([img_width, img_height, img_width, img_height])
boxes = bbox_labels[:, 1:5] * scale
labels = bbox_labels[:, 0]
boxArea = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
rand_idx = np.random.randint(len(boxArea))
rand_Side = boxArea[rand_idx] ** 0.5
distance = infDistance
anchor_idx = 5
for i, anchor in enumerate(scale_array):
if abs(anchor - rand_Side) < distance:
distance = abs(anchor - rand_Side)
anchor_idx = i
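    # Data-anchor-sampling: choose a target anchor scale at most one bucket
    # above the face's nearest anchor, so crops are biased toward smaller scales.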
target_anchor = random.choice(scale_array[0:min(anchor_idx + 1, 5) + 1])
ratio = float(target_anchor) / rand_Side
ratio = ratio * (2**random.uniform(-1, 1))
if int(img_height * ratio * img_width * ratio) > maxSize * maxSize:
ratio = (maxSize * maxSize / (img_height * img_width))**0.5
interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC,
cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
interp_method = random.choice(interp_methods)
image = cv2.resize(img, None, None, fx=ratio,
fy=ratio, interpolation=interp_method)
boxes[:, 0] *= ratio
boxes[:, 1] *= ratio
boxes[:, 2] *= ratio
boxes[:, 3] *= ratio
height, width, _ = image.shape
sample_boxes = []
xmin = boxes[rand_idx, 0]
ymin = boxes[rand_idx, 1]
bw = (boxes[rand_idx, 2] - boxes[rand_idx, 0] + 1)
bh = (boxes[rand_idx, 3] - boxes[rand_idx, 1] + 1)
w = h = 640
for _ in range(50):
if w < max(height, width):
if bw <= w:
w_off = random.uniform(xmin + bw - w, xmin)
else:
w_off = random.uniform(xmin, xmin + bw - w)
if bh <= h:
h_off = random.uniform(ymin + bh - h, ymin)
else:
h_off = random.uniform(ymin, ymin + bh - h)
else:
w_off = random.uniform(width - w, 0)
h_off = random.uniform(height - h, 0)
w_off = math.floor(w_off)
h_off = math.floor(h_off)
# convert to integer rect x1,y1,x2,y2
rect = np.array(
[int(w_off), int(h_off), int(w_off + w), int(h_off + h)])
# keep overlap with gt box IF center in sampled patch
centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
# mask in all gt boxes that above and to the left of centers
m1 = (rect[0] <= boxes[:, 0]) * (rect[1] <= boxes[:, 1])
# mask in all gt boxes that under and to the right of centers
m2 = (rect[2] >= boxes[:, 2]) * (rect[3] >= boxes[:, 3])
# mask in that both m1 and m2 are true
mask = m1 * m2
overlap = jaccard_numpy(boxes, rect)
# have any valid boxes? try again if not
if not mask.any() and not overlap.max() > 0.7:
continue
else:
sample_boxes.append(rect)
sampled_labels = []
if sample_boxes:
choice_idx = np.random.randint(len(sample_boxes))
choice_box = sample_boxes[choice_idx]
centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
m1 = (choice_box[0] < centers[:, 0]) * \
(choice_box[1] < centers[:, 1])
m2 = (choice_box[2] > centers[:, 0]) * \
(choice_box[3] > centers[:, 1])
mask = m1 * m2
current_boxes = boxes[mask, :].copy()
current_labels = labels[mask]
current_boxes[:, :2] -= choice_box[:2]
current_boxes[:, 2:] -= choice_box[:2]
if choice_box[0] < 0 or choice_box[1] < 0:
new_img_width = width if choice_box[
0] >= 0 else width - choice_box[0]
new_img_height = height if choice_box[
1] >= 0 else height - choice_box[1]
image_pad = np.zeros(
(new_img_height, new_img_width, 3), dtype=float)
image_pad[:, :, :] = mean
start_left = 0 if choice_box[0] >= 0 else -choice_box[0]
start_top = 0 if choice_box[1] >= 0 else -choice_box[1]
image_pad[start_top:, start_left:, :] = image
choice_box_w = choice_box[2] - choice_box[0]
choice_box_h = choice_box[3] - choice_box[1]
start_left = choice_box[0] if choice_box[0] >= 0 else 0
start_top = choice_box[1] if choice_box[1] >= 0 else 0
end_right = start_left + choice_box_w
end_bottom = start_top + choice_box_h
current_image = image_pad[
start_top:end_bottom, start_left:end_right, :].copy()
image_height, image_width, _ = current_image.shape
if cfg.filter_min_face:
bbox_w = current_boxes[:, 2] - current_boxes[:, 0]
bbox_h = current_boxes[:, 3] - current_boxes[:, 1]
bbox_area_ = bbox_w * bbox_h
mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size)
current_boxes = current_boxes[mask]
current_labels = current_labels[mask]
for i in range(len(current_boxes)):
sample_label = []
sample_label.append(current_labels[i])
sample_label.append(current_boxes[i][0] / image_width)
sample_label.append(current_boxes[i][1] / image_height)
sample_label.append(current_boxes[i][2] / image_width)
sample_label.append(current_boxes[i][3] / image_height)
sampled_labels += [sample_label]
sampled_labels = np.array(sampled_labels)
else:
current_boxes /= np.array([image_width,
image_height, image_width, image_height])
sampled_labels = np.hstack(
(current_labels[:, np.newaxis], current_boxes))
return current_image, sampled_labels
current_image = image[choice_box[1]:choice_box[
3], choice_box[0]:choice_box[2], :].copy()
image_height, image_width, _ = current_image.shape
if cfg.filter_min_face:
bbox_w = current_boxes[:, 2] - current_boxes[:, 0]
bbox_h = current_boxes[:, 3] - current_boxes[:, 1]
bbox_area_ = bbox_w * bbox_h
mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size)
current_boxes = current_boxes[mask]
current_labels = current_labels[mask]
for i in range(len(current_boxes)):
sample_label = []
sample_label.append(current_labels[i])
sample_label.append(current_boxes[i][0] / image_width)
sample_label.append(current_boxes[i][1] / image_height)
sample_label.append(current_boxes[i][2] / image_width)
sample_label.append(current_boxes[i][3] / image_height)
sampled_labels += [sample_label]
sampled_labels = np.array(sampled_labels)
else:
current_boxes /= np.array([image_width,
image_height, image_width, image_height])
sampled_labels = np.hstack(
(current_labels[:, np.newaxis], current_boxes))
return current_image, sampled_labels
image_height, image_width, _ = image.shape
if cfg.filter_min_face:
bbox_w = boxes[:, 2] - boxes[:, 0]
bbox_h = boxes[:, 3] - boxes[:, 1]
bbox_area_ = bbox_w * bbox_h
mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size)
boxes = boxes[mask]
labels = labels[mask]
for i in range(len(boxes)):
sample_label = []
sample_label.append(labels[i])
sample_label.append(boxes[i][0] / image_width)
sample_label.append(boxes[i][1] / image_height)
sample_label.append(boxes[i][2] / image_width)
sample_label.append(boxes[i][3] / image_height)
sampled_labels += [sample_label]
sampled_labels = np.array(sampled_labels)
else:
boxes /= np.array([image_width, image_height,
image_width, image_height])
sampled_labels = np.hstack(
(labels[:, np.newaxis], boxes))
return image, sampled_labels
def preprocess(img, bbox_labels, mode):
img_width, img_height = img.size
sampled_labels = bbox_labels
if mode == 'train':
if cfg.apply_distort:
img = distort_image(img)
if cfg.apply_expand:
img, bbox_labels, img_width, img_height = expand_image(
img, bbox_labels, img_width, img_height)
batch_sampler = []
prob = np.random.uniform(0., 1.)
if prob > cfg.data_anchor_sampling_prob and cfg.anchor_sampling:
scale_array = np.array([16, 32, 64, 128, 256, 512])
img = np.array(img)
img, sampled_labels = anchor_crop_image_sampling(
img, bbox_labels, scale_array, img_width, img_height)
img = img.astype('uint8')
img = Image.fromarray(img)
else:
batch_sampler.append(Sampler(1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
sampled_bbox = generate_batch_samples(
batch_sampler, bbox_labels, img_width, img_height)
img = np.array(img)
if sampled_bbox:
idx = int(np.random.uniform(0, len(sampled_bbox)))
img, sampled_labels = crop_image(
img, bbox_labels, sampled_bbox[idx], img_width, img_height,
cfg.resize_width, cfg.resize_height, cfg.min_face_size)
img = Image.fromarray(img)
interp_mode = [
Image.BILINEAR, Image.HAMMING, Image.NEAREST, Image.BICUBIC,
Image.LANCZOS
]
interp_indx = np.random.randint(0, 5)
img = img.resize((cfg.resize_width, cfg.resize_height),
resample=interp_mode[interp_indx])
img = np.array(img)
if mode == 'train':
mirror = int(np.random.uniform(0, 2))
if mirror == 1:
img = img[:, ::-1, :]
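            # After the horizontal flip, swap xmin/xmax in normalized coords:
            # new_xmin = 1 - old_xmax, new_xmax = 1 - old_xmin.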
for i in six.moves.xrange(len(sampled_labels)):
tmp = sampled_labels[i][1]
sampled_labels[i][1] = 1 - sampled_labels[i][3]
sampled_labels[i][3] = 1 - tmp
img = to_chw_bgr(img)
img = img.astype('float32')
img -= cfg.img_mean
img = img[[2, 1, 0], :, :] # to RGB
return img, sampled_labels
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import numpy as np
def point_form(boxes):
""" Convert prior_boxes to (xmin, ymin, xmax, ymax)
representation for comparison to point form ground truth data.
Args:
boxes: center-size default boxes from priorbox layers.
Return:
boxes: Converted xmin, ymin, xmax, ymax form of boxes.
"""
return np.concatenate((boxes[:, :2] - boxes[:, 2:] / 2,
boxes[:, :2] + boxes[:, 2:] / 2), 1)
def center_size(boxes):
""" Convert prior_boxes to (cx, cy, w, h)
representation for comparison to center-size form ground truth data.
Args:
boxes: point_form boxes
Return:
        boxes: Converted (cx, cy, w, h) form of boxes.
"""
return np.concatenate([(boxes[:, 2:] + boxes[:, :2]) / 2,
boxes[:, 2:] - boxes[:, :2]], 1)
def intersect(box_a, box_b):
""" We resize both tensors to [A,B,2] without new malloc:
[A,2] -> [A,1,2] -> [A,B,2]
[B,2] -> [1,B,2] -> [A,B,2]
Then we compute the area of intersect between box_a and box_b.
Args:
box_a: bounding boxes, Shape: [A,4].
box_b: bounding boxes, Shape: [B,4].
Return:
intersection area, Shape: [A,B].
"""
A = box_a.shape[0]
B = box_b.shape[0]
max_xy = np.minimum(np.broadcast_to(np.expand_dims(box_a[:, 2:], 1), (A, B, 2)),
np.broadcast_to(np.expand_dims(box_b[:, 2:], 0), (A, B, 2)))
min_xy = np.maximum(np.broadcast_to(np.expand_dims(box_a[:, :2], 1), (A, B, 2)),
np.broadcast_to(np.expand_dims(box_b[:, :2], 0), (A, B, 2)))
inter = np.clip((max_xy - min_xy), 0, np.inf)
return inter[:, :, 0] * inter[:, :, 1]
def jaccard(box_a, box_b):
"""Compute the jaccard overlap of two sets of boxes. The jaccard overlap
is simply the intersection over union of two boxes. Here we operate on
ground truth boxes and default boxes.
E.g.:
A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
Args:
box_a: Ground truth bounding boxes, Shape: [num_objects,4]
box_b: Prior boxes from priorbox layers, Shape: [num_priors,4]
Return:
        jaccard overlap: Shape: [box_a.shape[0], box_b.shape[0]]
"""
inter = intersect(box_a, box_b)
area_a = ((box_a[:, 2] - box_a[:, 0]) *
(box_a[:, 3] - box_a[:, 1]))
area_a = np.expand_dims(area_a, 1)
area_a = np.broadcast_to(area_a, inter.shape)
area_b = ((box_b[:, 2] - box_b[:, 0]) *
(box_b[:, 3] - box_b[:, 1]))
area_b = np.expand_dims(area_b, 0)
area_b = np.broadcast_to(area_b, inter.shape)
union = area_a + area_b - inter
return inter / union
def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
"""Match each prior box with the ground truth box of the highest jaccard
overlap, encode the bounding boxes, then return the matched indices
corresponding to both confidence and location preds.
Args:
threshold: (float) The overlap threshold used when matching boxes.
truths: Ground truth boxes, Shape: [num_obj, num_priors].
priors: Prior boxes from priorbox layers, Shape: [n_priors,4].
variances: Variances corresponding to each prior coord,
Shape: [num_priors, 4].
labels: All the class labels for the image, Shape: [num_obj].
loc_t: Tensor to be filled w/ encoded location targets.
conf_t: Tensor to be filled w/ matched indices for conf preds.
idx: (int) current batch index
    Return:
        Nothing is returned; loc_t[idx] and conf_t[idx] are filled in place
        with the encoded location targets and matched confidence labels.
"""
overlaps = jaccard(truths, point_form(priors))
# best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
best_prior_overlap = np.max(overlaps, 1, keepdims=True)
best_prior_idx = np.argmax(overlaps, 1)
best_truth_overlap = np.max(overlaps, 0, keepdims=True)
best_truth_idx = np.argmax(overlaps, 0)
    # np.argmax already returns 1-D index arrays, so only the overlap values
    # (computed with keepdims=True) need to be squeezed.
    best_truth_overlap = np.squeeze(best_truth_overlap, 0)
    best_prior_overlap = np.squeeze(best_prior_overlap, 1)
for i in best_prior_idx:
        best_truth_overlap[i] = 2
for j in range(best_prior_idx.shape[0]):
best_truth_idx[best_prior_idx[j]] = j
_th1, _th2, _th3 = threshold
N = (np.sum(best_prior_overlap >= _th2) +
np.sum(best_prior_overlap >= _th3)) // 2
matches = truths[best_truth_idx]
conf = labels[best_truth_idx]
conf[best_truth_overlap < _th2] = 0
best_truth_overlap_clone = best_truth_overlap.copy()
idx_1 = np.greater(best_truth_overlap_clone, _th1)
idx_2 = np.less(best_truth_overlap_clone, _th2)
add_idx = np.equal(idx_1, idx_2)
    best_truth_overlap_clone[np.logical_not(add_idx)] = 0
    stage2_overlap = np.sort(best_truth_overlap_clone)[::-1]
    stage2_idx = np.argsort(best_truth_overlap_clone)[::-1]
stage2_overlap = np.greater(stage2_overlap, _th1)
if N > 0:
N = np.sum(stage2_overlap[:N]) if np.sum(stage2_overlap[:N]) < N else N
conf[stage2_idx[:N]] += 1
loc = encode(matches, priors, variances)
loc_t[idx] = loc
conf_t[idx] = conf
def match_ssd(threshold, truths, priors, variances, labels):
"""Match each prior box with the ground truth box of the highest jaccard
overlap, encode the bounding boxes, then return the matched indices
corresponding to both confidence and location preds.
Args:
threshold: (float) The overlap threshold used when matching boxes.
truths: (tensor) Ground truth boxes, Shape: [num_obj, num_priors].
priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
variances: (tensor) Variances corresponding to each prior coord,
Shape: [num_priors, 4].
labels: (tensor) All the class labels for the image, Shape: [num_obj].
loc_t: (tensor) Tensor to be filled w/ encoded location targets.
conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
idx: (int) current batch index
Return:
The matched indices corresponding to 1)location and 2)confidence preds.
"""
overlaps = jaccard(truths, point_form(priors))
# best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
best_prior_overlap = np.max(overlaps, 1, keepdims=True)
best_prior_idx = np.argmax(overlaps, 1)
best_truth_overlap = np.max(overlaps, 0, keepdims=True)
best_truth_idx = np.argmax(overlaps, 0)
best_truth_overlap = np.squeeze(best_truth_overlap, 0)
best_prior_overlap = np.squeeze(best_prior_overlap, 1)
for i in best_prior_idx:
best_truth_overlap[i] = 2
for j in range(best_prior_idx.shape[0]):
best_truth_idx[best_prior_idx[j]] = j
matches = truths[best_truth_idx]
conf = labels[best_truth_idx]
conf[best_truth_overlap < threshold] = 0
loc = encode(matches, priors, variances)
return loc, conf
def encode(matched, priors, variances):
"""Encode the variances from the priorbox layers into the ground truth boxes
we have matched (based on jaccard overlap) with the prior boxes.
Args:
matched: Coords of ground truth for each prior in point-form
Shape: [num_priors, 4].
priors: Prior boxes in center-offset form
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
encoded boxes (tensor), Shape: [num_priors, 4]
"""
# dist b/t match center and prior's center
g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
# encode variance
g_cxcy /= (variances[0] * priors[:, 2:])
# match wh / prior wh
g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
g_wh = np.log(g_wh) / variances[1]
# return target for smooth_l1_loss
return np.concatenate([g_cxcy, g_wh], 1)
def decode(loc, priors, variances):
"""Decode locations from predictions using priors to undo
the encoding we did for offset regression at train time.
Args:
loc: location predictions for loc layers,
Shape: [num_priors,4]
priors: Prior boxes in center-offset form.
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
decoded bounding box predictions
"""
if priors.shape[0] == 1:
priors = priors[0, :, :]
boxes = np.concatenate((priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), 1)
boxes[:, :2] -= boxes[:, 2:] / 2
boxes[:, 2:] += boxes[:, :2]
return boxes
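# Illustrative round trip for encode/decode (not part of the original file):
# decoding an encoded ground-truth box with the same priors and variances
# recovers the input, which is the invariant the training targets rely on.
def _demo_encode_decode():
    priors = np.array([[0.5, 0.5, 0.2, 0.2],
                       [0.3, 0.3, 0.1, 0.1]])      # center-offset form
    truths = np.array([[0.4, 0.4, 0.6, 0.6],
                       [0.25, 0.25, 0.35, 0.35]])  # point form
    loc = encode(truths, priors, [0.1, 0.2])
    return decode(loc, priors, [0.1, 0.2])  # ~= truths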
def log_sum_exp(x):
"""Utility function for computing log_sum_exp while determining
This will be used to determine unaveraged confidence loss across
all examples in a batch.
Args:
x (Variable(tensor)): conf_preds from conf layers
"""
x_max = x.max()
return np.log(np.sum(np.exp(x - x_max), 1, keepdim=True)) + x_max
def nms(boxes, scores, overlap=0.5, top_k=200):
"""Apply non-maximum suppression at test time to avoid detecting too many
overlapping bounding boxes for a given object.
Args:
boxes: The location preds for the img, Shape: [num_priors,4].
scores: The class predscores for the img, Shape:[num_priors].
overlap: The overlap thresh for suppressing unnecessary boxes.
top_k: The Maximum number of box preds to consider.
Return:
The indices of the kept boxes with respect to num_priors.
"""
keep = np.zeros_like(scores).astype(np.int32)
if boxes.size == 0:
return keep, 0
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
area = np.multiply(x2 - x1, y2 - y1)
idx = np.argsort(scores, axis=0)
idx = idx[-top_k:]
count = 0
while idx.size > 0:
i = idx[-1]
keep[count] = i
count += 1
if idx.shape[0] == 1:
break
idx = idx[:-1]
xx1 = x1[idx]
yy1 = y1[idx]
xx2 = x2[idx]
yy2 = y2[idx]
xx1 = np.clip(xx1, x1[i], np.inf)
yy1 = np.clip(yy1, y1[i], np.inf)
xx2 = np.clip(xx2, -np.inf, x2[i])
yy2 = np.clip(yy2, -np.inf, y2[i])
w = xx2 - xx1
h = yy2 - yy1
w = np.clip(w, 0, np.inf)
h = np.clip(h, 0, np.inf)
inter = w * h
rem_areas = area[idx]
union = (rem_areas - inter) + area[i]
IoU = inter / union
idx = idx[np.less(IoU, overlap)]
return keep, count
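# Illustrative use of nms (not part of the original file): of two boxes with
# IoU ~ 0.81 the higher-scoring one is kept, while the distant box survives.
def _demo_nms():
    boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                      [1.0, 1.0, 10.0, 10.0],
                      [20.0, 20.0, 30.0, 30.0]])
    scores = np.array([0.9, 0.8, 0.7])
    keep, count = nms(boxes, scores, overlap=0.5, top_k=200)
    return keep[:count]  # -> [0, 2]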
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import os
from easydict import EasyDict
import numpy as np
_C = EasyDict()
cfg = _C
# data argument config
_C.expand_prob = 0.5
_C.expand_max_ratio = 4
_C.hue_prob = 0.5
_C.hue_delta = 18
_C.contrast_prob = 0.5
_C.contrast_delta = 0.5
_C.saturation_prob = 0.5
_C.saturation_delta = 0.5
_C.brightness_prob = 0.5
_C.brightness_delta = 0.125
_C.data_anchor_sampling_prob = 0.5
_C.min_face_size = 6.0
_C.apply_distort = True
_C.apply_expand = True
_C.img_mean = np.array([104., 117., 123.])[:, np.newaxis, np.newaxis].astype('float32')
_C.resize_width = 640
_C.resize_height = 640
_C.scale = 1 / 127.0
_C.anchor_sampling = True
_C.filter_min_face = True
# train config
_C.LR_STEPS = [80000, 100000, 120000]
_C.DIS_LR_STEPS = [30000, 35000, 40000]
# anchor config
_C.FEATURE_MAPS = [[160, 160], [80, 80], [40, 40], [20, 20], [10, 10], [5, 5]]
_C.INPUT_SIZE = (640, 640)
_C.STEPS = [4, 8, 16, 32, 64, 128]
_C.ANCHOR_SIZES = [16, 32, 64, 128, 256, 512]
_C.CLIP = False
_C.VARIANCE = [0.1, 0.2]
# loss config
_C.NUM_CLASSES = 2
_C.OVERLAP_THRESH = 0.35
_C.NEG_POS_RATIOS = 3
# detection config
_C.NMS_THRESH = 0.3
_C.TOP_K = 5000
_C.KEEP_TOP_K = 750
_C.CONF_THRESH = 0.05
# dataset config
_C.HOME = '/data2/James/dataset/pyramidbox_dataset/'
# face config
_C.FACE = EasyDict()
_C.FACE.FILE_DIR = os.path.dirname(os.path.realpath(__file__)) + '/../data'
_C.FACE.TRAIN_FILE = os.path.join(_C.FACE.FILE_DIR, 'face_train.txt')
_C.FACE.VAL_FILE = os.path.join(_C.FACE.FILE_DIR, 'face_val.txt')
_C.FACE.FDDB_DIR = os.path.join(_C.HOME, 'FDDB')
_C.FACE.WIDER_DIR = os.path.join(_C.HOME, 'WIDERFACE')
_C.FACE.OVERLAP_THRESH = 0.35
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import random
from PIL import Image
import numpy as np
from mindspore import dataset as ds
from src.augmentations import preprocess
from src.prior_box import PriorBox
from src.bbox_utils import match_ssd
from src.config import cfg
class WIDERDataset:
"""docstring for WIDERDetection"""
def __init__(self, list_file, mode='train'):
super(WIDERDataset, self).__init__()
self.mode = mode
self.fnames = []
self.boxes = []
self.labels = []
prior_box = PriorBox(cfg)
self.default_priors = prior_box.forward()
self.num_priors = self.default_priors.shape[0]
self.match = match_ssd
self.threshold = cfg.FACE.OVERLAP_THRESH
self.variance = cfg.VARIANCE
with open(list_file) as f:
lines = f.readlines()
for line in lines:
line = line.strip().split()
num_faces = int(line[1])
box = []
label = []
for i in range(num_faces):
x = float(line[2 + 5 * i])
y = float(line[3 + 5 * i])
w = float(line[4 + 5 * i])
h = float(line[5 + 5 * i])
c = int(line[6 + 5 * i])
if w <= 0 or h <= 0:
continue
box.append([x, y, x + w, y + h])
label.append(c)
if box:
self.fnames.append(line[0])
self.boxes.append(box)
self.labels.append(label)
self.num_samples = len(self.boxes)
def __len__(self):
return self.num_samples
def __getitem__(self, index):
img, face_loc, face_conf, head_loc, head_conf = self.pull_item(index)
return img, face_loc, face_conf, head_loc, head_conf
def pull_item(self, index):
while True:
image_path = self.fnames[index]
img = Image.open(image_path)
if img.mode == 'L':
img = img.convert('RGB')
im_width, im_height = img.size
boxes = self.annotransform(np.array(self.boxes[index]), im_width, im_height)
label = np.array(self.labels[index])
bbox_labels = np.hstack((label[:, np.newaxis], boxes)).tolist()
img, sample_labels = preprocess(img, bbox_labels, self.mode)
sample_labels = np.array(sample_labels)
if sample_labels.size > 0:
face_target = np.hstack(
(sample_labels[:, 1:], sample_labels[:, 0][:, np.newaxis]))
assert (face_target[:, 2] > face_target[:, 0]).any()
assert (face_target[:, 3] > face_target[:, 1]).any()
face_box = face_target[:, :-1]
head_box = self.expand_bboxes(face_box)
head_target = np.hstack((head_box, face_target[
:, -1][:, np.newaxis]))
break
else:
index = random.randrange(0, self.num_samples)
face_truth = face_target[:, :-1]
face_label = face_target[:, -1]
face_loc_t, face_conf_t = self.match(self.threshold, face_truth, self.default_priors,
self.variance, face_label)
head_truth = head_target[:, :-1]
head_label = head_target[:, -1]
head_loc_t, head_conf_t = self.match(self.threshold, head_truth, self.default_priors,
self.variance, head_label)
return img, face_loc_t, face_conf_t, head_loc_t, head_conf_t
def annotransform(self, boxes, im_width, im_height):
boxes[:, 0] /= im_width
boxes[:, 1] /= im_height
boxes[:, 2] /= im_width
boxes[:, 3] /= im_height
return boxes
def expand_bboxes(self,
bboxes,
expand_left=2.,
expand_up=2.,
expand_right=2.,
expand_down=2.):
expand_bboxes = []
for bbox in bboxes:
xmin = bbox[0]
ymin = bbox[1]
xmax = bbox[2]
ymax = bbox[3]
w = xmax - xmin
h = ymax - ymin
ex_xmin = max(xmin - w / expand_left, 0.)
ex_ymin = max(ymin - h / expand_up, 0.)
ex_xmax = max(xmax + w / expand_right, 0.)
ex_ymax = max(ymax + h / expand_down, 0.)
expand_bboxes.append([ex_xmin, ex_ymin, ex_xmax, ex_ymax])
expand_bboxes = np.array(expand_bboxes)
return expand_bboxes
def create_val_dataset(mindrecord_file, batch_size, device_num=1, device_id=0, num_workers=8):
"""
Create mindspore dataset for validation from a MindRecord file
"""
column_names = ['img', 'face_loc', 'face_conf', 'head_loc', 'head_conf']
ds.config.set_num_parallel_workers(num_workers)
ds.config.set_enable_shared_mem(False)
ds.config.set_prefetch_size(batch_size * 2)
train_dataset = ds.MindDataset(mindrecord_file, columns_list=column_names, shuffle=True,
shard_id=device_id, num_shards=device_num)
train_dataset = train_dataset.batch(batch_size=batch_size, drop_remainder=True)
return train_dataset
def create_train_dataset(cfg_, batch_size, device_num=1, device_id=0, num_workers=8):
"""
Create user-defined mindspore dataset for training
"""
column_names = ['img', 'face_loc', 'face_conf', 'head_loc', 'head_conf']
ds.config.set_num_parallel_workers(num_workers)
ds.config.set_enable_shared_mem(False)
ds.config.set_prefetch_size(batch_size * 2)
train_dataset = ds.GeneratorDataset(WIDERDataset(cfg_.FACE.TRAIN_FILE, mode='train'),
column_names=column_names, shuffle=True, num_shards=device_num,
shard_id=device_id)
train_dataset = train_dataset.batch(batch_size=batch_size)
return train_dataset
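# Illustrative usage (not part of the original file): build the training
# pipeline defined above and fetch one batch. Assumes the face_train.txt
# annotation list referenced by cfg.FACE.TRAIN_FILE has been generated.
def _demo_train_dataset():
    train_ds = create_train_dataset(cfg, batch_size=4)
    for img, face_loc, face_conf, head_loc, head_conf in train_ds.create_tuple_iterator():
        print(img.shape, face_loc.shape, face_conf.shape)
        break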
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import numpy as np
from mindspore import Tensor
from src.bbox_utils import decode, nms
class Detect:
"""At test time, Detect is the final layer of SSD. Decode location preds,
apply non-maximum suppression to location predictions based on conf
scores and threshold to a top_k number of output predictions for both
confidence score and locations.
"""
def __init__(self, cfg):
self.num_classes = cfg.NUM_CLASSES
self.top_k = cfg.TOP_K
self.nms_thresh = cfg.NMS_THRESH
self.conf_thresh = cfg.CONF_THRESH
self.variance = cfg.VARIANCE
def detect(self, loc_data, conf_data, prior_data):
"""
Args:
loc_data: (Tensor) Loc preds from loc layers
Shape: [batch, num_priors*4]
conf_data: (Tensor) Shape: Conf preds from conf layers
Shape: [batch*num_priors, num_classes]
prior_data: Prior boxes and variances from priorbox layers
Shape: [1,num_priors,4]
"""
if isinstance(loc_data, Tensor):
loc_data = loc_data.asnumpy()
if isinstance(conf_data, Tensor):
conf_data = conf_data.asnumpy()
num = loc_data.shape[0]
num_priors = prior_data.shape[0]
conf_preds = np.transpose(conf_data.reshape((num, num_priors, self.num_classes)), (0, 2, 1))
batch_priors = prior_data.reshape((-1, num_priors, 4))
batch_priors = np.broadcast_to(batch_priors, (num, num_priors, 4))
decoded_boxes = decode(loc_data.reshape((-1, 4)), batch_priors, self.variance).reshape((num, num_priors, 4))
output = np.zeros((num, self.num_classes, self.top_k, 5))
for i in range(num):
boxes = decoded_boxes[i].copy()
conf_scores = conf_preds[i].copy()
for cl in range(1, self.num_classes):
c_mask = np.greater(conf_scores[cl], self.conf_thresh)
scores = conf_scores[cl][c_mask]
if scores.size == 0:  # no score passed the confidence threshold
continue
l_mask = np.expand_dims(c_mask, 1)
l_mask = np.broadcast_to(l_mask, boxes.shape)
boxes_ = boxes[l_mask].reshape((-1, 4))
ids, count = nms(boxes_, scores, self.nms_thresh, self.top_k)
output[i, cl, :count] = np.concatenate((np.expand_dims(scores[ids[:count]], 1),
boxes_[ids[:count]]), 1)
return output
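# Illustrative sketch (not part of the original file): run Detect on random
# placeholder outputs for ten priors; real inputs come from the network and
# PriorBox. `cfg` here refers to src.config.cfg.
def _demo_detect():
    from src.config import cfg
    det = Detect(cfg)
    num_priors = 10
    loc = np.random.randn(1, num_priors, 4).astype(np.float32) * 0.1
    conf = np.random.rand(num_priors, cfg.NUM_CLASSES).astype(np.float32)
    priors = np.tile(np.array([[0.5, 0.5, 0.2, 0.2]], np.float32), (num_priors, 1))
    return det.detect(loc, conf, priors)  # shape [1, NUM_CLASSES, TOP_K, 5]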
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [RuisongZhou][FDDB_Evaluation]
import os
import argparse
import tqdm
import numpy as np
import cv2
def bbox_overlaps(boxes, query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
N = boxes.shape[0]
K = query_boxes.shape[0]
overlaps = np.zeros((N, K), dtype=np.float32)
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0] + 1) *
(query_boxes[k, 3] - query_boxes[k, 1] + 1)
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]) + 1
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]) + 1
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0] + 1) *
(boxes[n, 3] - boxes[n, 1] + 1) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
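# Illustrative sketch (not part of the original file): note the FDDB
# evaluator uses the +1 pixel convention, so [0, 0, 9, 9] has area 100.
def _demo_bbox_overlaps():
    boxes = np.array([[0.0, 0.0, 9.0, 9.0]])
    query = np.array([[0.0, 0.0, 9.0, 9.0],
                      [5.0, 5.0, 14.0, 14.0]])
    # identical boxes -> 1.0; shifted box -> 25 / (100 + 100 - 25) ~ 0.143
    return bbox_overlaps(boxes, query)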
def get_gt_boxes(gt_dir):
gt_dict = {}
for i in range(1, 11):
filename = os.path.join(gt_dir, 'FDDB-fold-{}-ellipseList.txt'.format('%02d' % i))
assert os.path.exists(filename)
gt_sub_dict = {}
annotationfile = open(filename)
while True:
filename = annotationfile.readline()[:-1].replace('/', '_')
if not filename:
break
line = annotationfile.readline()
if not line:
break
facenum = int(line)
face_loc = []
for _ in range(facenum):
line = annotationfile.readline().strip().split()
major_axis_radius = float(line[0])
minor_axis_radius = float(line[1])
angle = float(line[2])
center_x = float(line[3])
center_y = float(line[4])
_ = float(line[5])
angle = angle / 3.1415926 * 180
mask = np.zeros((1000, 1000), dtype=np.uint8)
cv2.ellipse(mask, ((int)(center_x), (int)(center_y)),
((int)(major_axis_radius), (int)(minor_axis_radius)), angle, 0., 360., (255, 255, 255))
contours, _ = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2:]
r = cv2.boundingRect(contours[0])
x_min = r[0]
y_min = r[1]
x_max = r[0] + r[2]
y_max = r[1] + r[3]
face_loc.append([x_min, y_min, x_max, y_max])
face_loc = np.array(face_loc)
gt_sub_dict[filename] = face_loc
gt_dict[i] = gt_sub_dict
return gt_dict
def read_pred_file(filepath):
with open(filepath, 'r') as f:
lines = f.readlines()
img_file = lines[0].rstrip('\n')
lines = lines[2:]
boxes = []
for line in lines:
line = line.rstrip('\n').split(' ')
if line[0] == '':
continue
boxes.append([float(line[0]), float(line[1]), float(line[2]), float(line[3]), float(line[4])])
boxes = np.array(boxes)
return img_file.split('/')[-1], boxes
def get_preds_box(pred_dir):
events = os.listdir(pred_dir)
boxes = dict()
pbar = tqdm.tqdm(events)
for event in pbar:
pbar.set_description('Reading Predictions Boxes')
event_dir = os.path.join(pred_dir, event)
event_images = os.listdir(event_dir)
current_event = dict()
for imgtxt in event_images:
imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt))
current_event[imgname.rstrip('.jpg')] = _boxes
boxes[event] = current_event
return boxes
def norm_score(pred):
""" norm score
pred {key: [[x1,y1,x2,y2,s]]}
"""
max_score = 0
min_score = 1
for _, k in pred.items():
for _, v in k.items():
if v.size == 0:
continue
_min = np.min(v[:, -1])
_max = np.max(v[:, -1])
max_score = max(_max, max_score)
min_score = min(_min, min_score)
diff = max_score - min_score
for _, k in pred.items():
for _, v in k.items():
if v.size == 0:
continue
v[:, -1] = (v[:, -1] - min_score) / diff
def image_eval(pred, gt, ignore, iou_thresh):
""" single image evaluation
pred: Nx5
gt: Nx4
ignore:
"""
_pred = pred.copy()
_gt = gt.copy()
pred_recall = np.zeros(_pred.shape[0])
recall_list = np.zeros(_gt.shape[0])
proposal_list = np.ones(_pred.shape[0])
_pred[:, 2] = _pred[:, 2] + _pred[:, 0]
_pred[:, 3] = _pred[:, 3] + _pred[:, 1]
overlaps = bbox_overlaps(_pred[:, :4], _gt)
for h in range(_pred.shape[0]):
gt_overlap = overlaps[h]
max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax()
if max_overlap >= iou_thresh:
if ignore[max_idx] == 0:
recall_list[max_idx] = -1
proposal_list[h] = -1
elif recall_list[max_idx] == 0:
recall_list[max_idx] = 1
r_keep_index = np.where(recall_list == 1)[0]
pred_recall[h] = len(r_keep_index)
return pred_recall, proposal_list
def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall):
pr_info = np.zeros((thresh_num, 2)).astype('float')
for t in range(thresh_num):
thresh = 1 - (t + 1) / thresh_num
r_index = np.where(pred_info[:, 4] >= thresh)[0]
if r_index.size == 0:
pr_info[t, 0] = 0
pr_info[t, 1] = 0
else:
r_index = r_index[-1]
p_index = np.where(proposal_list[:r_index + 1] == 1)[0]
pr_info[t, 0] = len(p_index)
pr_info[t, 1] = pred_recall[r_index]
return pr_info
def dataset_pr_info(thresh_num, pr_curve, count_face):
_pr_curve = np.zeros((thresh_num, 2))
for i in range(thresh_num):
_pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0]
_pr_curve[i, 1] = pr_curve[i, 1] / count_face
return _pr_curve
def voc_ap(rec, prec):
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
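# Illustrative sketch (not part of the original file): a detector that is
# perfectly precise up to recall 0.5 and finds nothing beyond it scores
# AP = 0.5 under the interpolation above.
def _demo_voc_ap():
    rec = np.array([0.1, 0.3, 0.5])
    prec = np.array([1.0, 1.0, 1.0])
    return voc_ap(rec, prec)  # -> 0.5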
def evaluation(pred, gt_path, iou_thresh=0.5):
pred = get_preds_box(pred)
norm_score(pred)
gt_box_dict = get_gt_boxes(gt_path)
event = list(pred.keys())
event = [int(e) for e in event]
event.sort()
thresh_num = 1000
aps = []
pbar = tqdm.tqdm(range(len(event)))
for setting_id in pbar:
pbar.set_description('Predicting ... ')
# different setting
count_face = 0
pr_curve = np.zeros((thresh_num, 2)).astype('float')
gt = gt_box_dict[event[setting_id]]
pred_list = pred[str(event[setting_id])]
gt_list = list(gt.keys())
for j in range(len(gt_list)):
gt_boxes = gt[gt_list[j]].astype('float') # from image name get gt boxes
pred_info = pred_list[gt_list[j]]
keep_index = np.array(range(1, len(gt_boxes) + 1))
count_face += len(keep_index)
ignore = np.zeros(gt_boxes.shape[0])
if gt_boxes.size == 0 or pred_info.size == 0:
continue
if keep_index.size != 0:
ignore[keep_index - 1] = 1
pred_recall, proposal_list = image_eval(pred_info, gt_boxes, ignore, iou_thresh)
_img_pr_info = img_pr_info(thresh_num, pred_info, proposal_list, pred_recall)
pr_curve += _img_pr_info
pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face)
propose = pr_curve[:, 0]
recall = pr_curve[:, 1]
ap = voc_ap(recall, propose)
aps.append(ap)
print("==================== Results ====================")
for i in range(len(aps)):
print("FDDB-fold-{} Val AP: {}".format(event[i], aps[i]))
print("FDDB Dataset Average AP: {}".format(sum(aps)/len(aps)))
print("=================================================")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--pred')
parser.add_argument('--gt')
args = parser.parse_args()
evaluation(args.pred, args.gt)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
from mindspore import nn, Tensor, ops
from mindspore import dtype as mstype
from mindspore import numpy as mnp
from src.config import cfg
class MultiBoxLoss(nn.Cell):
"""SSD Weighted Loss Function
"""
def __init__(self, use_head_loss=False):
super(MultiBoxLoss, self).__init__()
self.use_head_loss = use_head_loss
self.num_classes = cfg.NUM_CLASSES
self.negpos_ratio = cfg.NEG_POS_RATIOS
self.cast = ops.Cast()
self.sum = ops.ReduceSum()
self.loc_loss = nn.SmoothL1Loss()
self.cls_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
self.sort_descending = ops.Sort(descending=True)
self.stack = ops.Stack(axis=1)
self.unsqueeze = ops.ExpandDims()
self.gather = ops.GatherNd()
def construct(self, predictions, targets):
"""Multibox Loss"""
if self.use_head_loss:
_, _, loc_data, conf_data = predictions
else:
loc_data, conf_data, _, _ = predictions
loc_t, conf_t = targets
loc_data = self.cast(loc_data, mstype.float32)
conf_data = self.cast(conf_data, mstype.float32)
loc_t = self.cast(loc_t, mstype.float32)
conf_t = self.cast(conf_t, mstype.int32)
batch_size, box_num, _ = conf_data.shape
mask = self.cast(conf_t > 0, mstype.float32)
pos_num = self.sum(mask, 1)
loc_loss = self.sum(self.loc_loss(loc_data, loc_t), 2)
loc_loss = self.sum(mask * loc_loss)
# Hard Negative Mining
con = self.cls_loss(conf_data.view(-1, self.num_classes), conf_t.view(-1))
con = con.view(batch_size, -1)
con_neg = con * (1 - mask)
value, _ = self.sort_descending(con_neg)
neg_num = self.cast(ops.minimum(self.negpos_ratio * pos_num, box_num), mstype.int32)
batch_iter = Tensor(mnp.arange(batch_size), dtype=mstype.int32)
neg_index = self.stack((batch_iter, neg_num))
min_neg_score = self.unsqueeze(self.gather(value, neg_index), 1)
neg_mask = self.cast(con_neg > min_neg_score, mstype.float32)
all_mask = mask + neg_mask
all_mask = ops.stop_gradient(all_mask)
cls_loss = self.sum(con * all_mask)
N = self.sum(pos_num)
N = ops.maximum(self.cast(N, mstype.float32), 0.25)
loc_loss /= N
cls_loss /= N
return loc_loss, cls_loss
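# Illustrative sketch (not part of the original file): exercise the loss on
# random placeholder predictions with one positive prior per sample, so the
# hard-negative mining keeps NEG_POS_RATIOS negatives per positive.
def _demo_multibox_loss():
    import numpy as np
    batch, priors = 2, 8
    loc = Tensor(np.random.randn(batch, priors, 4).astype(np.float32))
    conf = Tensor(np.random.randn(batch, priors, cfg.NUM_CLASSES).astype(np.float32))
    conf_np = np.zeros((batch, priors), np.int32)
    conf_np[:, 0] = 1  # a single positive prior per sample
    loc_t = Tensor(np.zeros((batch, priors, 4), np.float32))
    conf_t = Tensor(conf_np)
    loss_fn = MultiBoxLoss()
    return loss_fn((loc, conf, loc, conf), (loc_t, conf_t))  # (loc_loss, cls_loss)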
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
from itertools import product
import numpy as np
class PriorBox:
"""Compute priorbox coordinates in center-offset form for each source
feature map.
"""
def __init__(self, cfg, feature_maps=None, input_size=(640, 640), phase='train'):
self.imh = input_size[0]
self.imw = input_size[1]
# PyramidBox places one prior per feature map location
self.variance = cfg.VARIANCE or [0.1]
if phase == 'train':
self.feature_maps = cfg.FEATURE_MAPS
else:
self.feature_maps = feature_maps
self.min_sizes = cfg.ANCHOR_SIZES
self.steps = cfg.STEPS
self.clip = cfg.CLIP
for v in self.variance:
if v <= 0:
raise ValueError('Variances must be greater than 0')
def forward(self):
mean = []
for k in range(len(self.feature_maps)):
feath = self.feature_maps[k][0]
featw = self.feature_maps[k][1]
for i, j in product(range(feath), range(featw)):
f_kw = self.imw / self.steps[k]
f_kh = self.imh / self.steps[k]
cx = (j + 0.5) / f_kw
cy = (i + 0.5) / f_kh
s_kw = self.min_sizes[k] / self.imw
s_kh = self.min_sizes[k] / self.imh
mean += [cx, cy, s_kw, s_kh]
output = np.array(mean).reshape(-1, 4)
if self.clip:
output = np.clip(output, 0, 1)
return output
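# Illustrative sketch (not part of the original file): with the default
# 640x640 training feature maps, one anchor per location gives
# 160^2 + 80^2 + 40^2 + 20^2 + 10^2 + 5^2 = 34125 priors.
def _demo_prior_box():
    from src.config import cfg
    priors = PriorBox(cfg).forward()
    return priors.shape  # -> (34125, 4)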
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
from mindspore import nn, ops, Parameter, Tensor
from mindspore.common import initializer
from mindspore import dtype as mstype
from src.loss import MultiBoxLoss
class L2Norm(nn.Cell):
def __init__(self, n_channles, scale):
super(L2Norm, self).__init__()
self.n_channels = n_channles
self.gamma = scale or None
self.eps = 1e-10
self.weight = Parameter(Tensor(shape=(self.n_channels,), init=initializer.Constant(value=self.gamma),
                               dtype=mstype.float32))
self.pow = ops.Pow()
self.sum = ops.ReduceSum()
self.div = ops.Div()
def construct(self, x):
norm = self.pow(x, 2).sum(axis=1, keepdims=True)
norm = ops.sqrt(norm) + self.eps
x = self.div(x, norm)
out = self.weight[None, :][:, :, None][:, :, :, None].expand_as(x) * x
return out
class ConvBn(nn.Cell):
"""docstring for conv"""
def __init__(self,
in_plane,
out_plane,
kernel_size,
stride,
padding):
super(ConvBn, self).__init__()
self.conv1 = nn.Conv2d(in_plane, out_plane, kernel_size, stride, pad_mode='pad',
padding=padding, has_bias=True, weight_init='xavier_uniform')
self.bn1 = nn.BatchNorm2d(out_plane)
def construct(self, x):
x = self.conv1(x)
return self.bn1(x)
class CPM(nn.Cell):
"""docstring for CPM"""
def __init__(self, in_plane):
super(CPM, self).__init__()
self.branch1 = ConvBn(in_plane, 1024, 1, 1, 0)
self.branch2a = ConvBn(in_plane, 256, 1, 1, 0)
self.branch2b = ConvBn(256, 256, 3, 1, 1)
self.branch2c = ConvBn(256, 1024, 1, 1, 0)
self.relu = nn.ReLU()
self.ssh_1 = nn.Conv2d(1024, 256, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.ssh_dimred = nn.Conv2d(1024, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.ssh_2 = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.ssh_3a = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1, has_bias=True,
weight_init='xavier_uniform')
self.ssh_3b = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.cat = ops.Concat(1)
def construct(self, x):
out_residual = self.branch1(x)
x = self.relu(self.branch2a(x))
x = self.relu(self.branch2b(x))
x = self.branch2c(x)
rescomb = self.relu(x + out_residual)
ssh1 = self.ssh_1(rescomb)
ssh_dimred = self.relu(self.ssh_dimred(rescomb))
ssh_2 = self.ssh_2(ssh_dimred)
ssh_3a = self.relu(self.ssh_3a(ssh_dimred))
ssh_3b = self.ssh_3b(ssh_3a)
ssh_out = self.cat((ssh1, ssh_2, ssh_3b))
ssh_out = self.relu(ssh_out)
return ssh_out
class PyramidBox(nn.Cell):
"""docstring for PyramidBox"""
def __init__(self,
phase,
base,
extras,
lfpn_cpm,
head,
num_classes):
super(PyramidBox, self).__init__()
self.vgg = nn.CellList(base)
self.extras = nn.CellList(extras)
self.num_classes = num_classes
self.L2Norm3_3 = L2Norm(256, 10)
self.L2Norm4_3 = L2Norm(512, 8)
self.L2Norm5_3 = L2Norm(512, 5)
self.lfpn_topdown = nn.CellList(lfpn_cpm[0])
self.lfpn_later = nn.CellList(lfpn_cpm[1])
self.cpm = nn.CellList(lfpn_cpm[2])
self.loc_layers = nn.CellList(head[0])
self.conf_layers = nn.CellList(head[1])
self.relu = nn.ReLU()
self.concat = ops.Concat(1)
self.is_infer = False
if phase == 'test':
self.softmax = nn.Softmax(axis=-1)
self.is_infer = True
def _upsample_prod(self, x, y):
_, _, H, W = y.shape
resize_bilinear = nn.ResizeBilinear()
result = resize_bilinear(x, size=(H, W), align_corners=True) * y
return result
def construct(self, x):
# apply vgg up to conv3_3 relu
for k in range(16):
x = self.vgg[k](x)
conv3_3 = x
# apply vgg up to conv4_3
for k in range(16, 23):
x = self.vgg[k](x)
conv4_3 = x
for k in range(23, 30):
x = self.vgg[k](x)
conv5_3 = x
for k in range(30, len(self.vgg)):
x = self.vgg[k](x)
convfc_7 = x
# apply extra layers and cache source layer outputs
for k in range(2):
x = self.relu(self.extras[k](x))
conv6_2 = x
for k in range(2, 4):
x = self.relu(self.extras[k](x))
conv7_2 = x
x = self.relu(self.lfpn_topdown[0](convfc_7))
lfpn2_on_conv5 = self.relu(self._upsample_prod(
x, self.lfpn_later[0](conv5_3)))
x = self.relu(self.lfpn_topdown[1](lfpn2_on_conv5))
lfpn1_on_conv4 = self.relu(self._upsample_prod(
x, self.lfpn_later[1](conv4_3)))
x = self.relu(self.lfpn_topdown[2](lfpn1_on_conv4))
lfpn0_on_conv3 = self.relu(self._upsample_prod(
x, self.lfpn_later[2](conv3_3)))
ssh_conv3_norm = self.cpm[0](self.L2Norm3_3(lfpn0_on_conv3))
ssh_conv4_norm = self.cpm[1](self.L2Norm4_3(lfpn1_on_conv4))
ssh_conv5_norm = self.cpm[2](self.L2Norm5_3(lfpn2_on_conv5))
ssh_convfc7 = self.cpm[3](convfc_7)
ssh_conv6 = self.cpm[4](conv6_2)
ssh_conv7 = self.cpm[5](conv7_2)
face_locs, face_confs = [], []
head_locs, head_confs = [], []
N = ssh_conv3_norm.shape[0]
mbox_loc = self.loc_layers[0](ssh_conv3_norm)
face_loc, head_loc = ops.Split(axis=1, output_num=2)(mbox_loc)
face_loc = ops.Transpose()(face_loc, (0, 2, 3, 1)).view(N, -1, 4)
if not self.is_infer:
head_loc = ops.Transpose()(head_loc, (0, 2, 3, 1)).view(N, -1, 4)
mbox_conf = self.conf_layers[0](ssh_conv3_norm)
face_conf1 = mbox_conf[:, 3:4, :, :]
_, face_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 0:3, :, :])
face_conf = self.concat((face_conf3_maxin, face_conf1))
face_conf = ops.Transpose()(face_conf, (0, 2, 3, 1)).view(N, -1, 2)
head_conf = None
if not self.is_infer:
_, head_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 4:7, :, :])
head_conf1 = mbox_conf[:, 7:, :, :]
head_conf = self.concat((head_conf3_maxin, head_conf1))
head_conf = ops.Transpose()(head_conf, (0, 2, 3, 1)).view(N, -1, 2)
face_locs.append(face_loc)
face_confs.append(face_conf)
if not self.is_infer:
head_locs.append(head_loc)
head_confs.append(head_conf)
inputs = [ssh_conv4_norm, ssh_conv5_norm,
ssh_convfc7, ssh_conv6, ssh_conv7]
feature_maps = []
feat_size = ssh_conv3_norm.shape[2:]
feature_maps.append([feat_size[0], feat_size[1]])
for i, feat in enumerate(inputs):
feat_size = feat.shape[2:]
feature_maps.append([feat_size[0], feat_size[1]])
mbox_loc = self.loc_layers[i + 1](feat)
face_loc, head_loc = ops.Split(axis=1, output_num=2)(mbox_loc)
face_loc = ops.Transpose()(face_loc, (0, 2, 3, 1)).view(N, -1, 4)
if not self.is_infer:
head_loc = ops.Transpose()(head_loc, (0, 2, 3, 1)).view(N, -1, 4)
mbox_conf = self.conf_layers[i + 1](feat)
face_conf1 = mbox_conf[:, 0:1, :, :]
_, face_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 1:4, :, :])
face_conf = self.concat((face_conf1, face_conf3_maxin))
face_conf = ops.Transpose()(face_conf, (0, 2, 3, 1)).ravel().view(N, -1, 2)
if not self.is_infer:
head_conf = ops.Transpose()(mbox_conf[:, 4:, :, :], (0, 2, 3, 1)).view(N, -1, 2)
face_locs.append(face_loc)
face_confs.append(face_conf)
if not self.is_infer:
head_locs.append(head_loc)
head_confs.append(head_conf)
face_mbox_loc = self.concat(face_locs)
face_mbox_conf = self.concat(face_confs)
head_mbox_loc, head_mbox_conf = None, None
if not self.is_infer:
head_mbox_loc = self.concat(head_locs)
head_mbox_conf = self.concat(head_confs)
if not self.is_infer:
output = (face_mbox_loc, face_mbox_conf, head_mbox_loc, head_mbox_conf)
else:
output = (face_mbox_loc, self.softmax(face_mbox_conf), feature_maps)
return output
vgg_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M',
512, 512, 512, 'M']
extras_cfg = [256, 'S', 512, 128, 'S', 256]
lfpn_cpm_cfg = [256, 512, 512, 1024, 512, 256]
multibox_cfg = [512, 512, 512, 512, 512, 512]
def vgg_(cfg, i, batch_norm=False):
layers = []
in_channels = i
for v in cfg:
if v == 'M':
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
elif v == 'C':
# 'C' marks ceil-mode pooling in the original PyTorch config; vgg_cfg
# above never uses it, so plain max pooling is kept here
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
else:
conv2d = nn.Conv2d(in_channels, v, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
if batch_norm:
layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU()]
else:
layers += [conv2d, nn.ReLU()]
in_channels = v
conv6 = nn.Conv2d(512, 1024, kernel_size=3, pad_mode='pad', padding=6,
dilation=6, has_bias=True, weight_init='xavier_uniform')
conv7 = nn.Conv2d(1024, 1024, kernel_size=1, has_bias=True, weight_init='xavier_uniform')
layers += [conv6, nn.ReLU(), conv7, nn.ReLU()]
return layers
def add_extras(cfg, i):
# Extra layers added to VGG for feature scaling
layers = []
in_channels = i
flag = False
for k, v in enumerate(cfg):
if in_channels != 'S':
if v == 'S':
layers += [nn.Conv2d(in_channels, cfg[k + 1], kernel_size=(1, 3)[flag], stride=2,
pad_mode='pad', padding=1, has_bias=True, weight_init='xavier_uniform')]
else:
layers += [nn.Conv2d(in_channels, v, kernel_size=(1, 3)[flag],
has_bias=True, weight_init='xavier_uniform')]
flag = not flag
in_channels = v
return layers
def add_lfpn_cpm(cfg):
lfpn_topdown_layers = []
lfpn_latlayer = []
cpm_layers = []
for k, v in enumerate(cfg):
cpm_layers.append(CPM(v))
fpn_list = cfg[::-1][2:]
for k, v in enumerate(fpn_list[:-1]):
lfpn_latlayer.append(nn.Conv2d(fpn_list[k + 1], fpn_list[k + 1], kernel_size=1,
stride=1, padding=0, has_bias=True, weight_init='xavier_uniform'))
lfpn_topdown_layers.append(nn.Conv2d(v, fpn_list[k + 1], kernel_size=1, stride=1,
padding=0, has_bias=True, weight_init='xavier_uniform'))
return (lfpn_topdown_layers, lfpn_latlayer, cpm_layers)
def multibox(vgg, extra_layers):
loc_layers = []
conf_layers = []
vgg_source = [21, 28, -2]
i = 0
loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
conf_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
i += 1
for _, _ in enumerate(vgg_source):
loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
conf_layers += [nn.Conv2d(multibox_cfg[i], 6, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
i += 1
for _, _ in enumerate(extra_layers[1::2], 2):
loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad',
padding=1, has_bias=True, weight_init='xavier_uniform')]
conf_layers += [nn.Conv2d(multibox_cfg[i], 6, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
i += 1
return vgg, extra_layers, (loc_layers, conf_layers)
def build_net(phase, num_classes=2):
base_, extras_, head_ = multibox(vgg_(vgg_cfg, 3), add_extras((extras_cfg), 1024))
lfpn_cpm = add_lfpn_cpm(lfpn_cpm_cfg)
return PyramidBox(phase, base_, extras_, lfpn_cpm, head_, num_classes)
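# Illustrative sketch (not part of the original file): build the training
# graph and push one random 640x640 image through it; the face branch
# predicts offsets for all 34125 priors.
def _demo_build_net():
    import numpy as np
    net = build_net('train', num_classes=2)
    x = Tensor(np.random.randn(1, 3, 640, 640).astype(np.float32))
    face_loc, face_conf, head_loc, head_conf = net(x)
    return face_loc.shape  # -> (1, 34125, 4)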
class NetWithLoss(nn.Cell):
def __init__(self, net):
super(NetWithLoss, self).__init__()
self.net = net
self.loss_fn_1 = MultiBoxLoss()
self.loss_fn_2 = MultiBoxLoss(use_head_loss=True)
def construct(self, images, face_loc, face_conf, head_loc, head_conf):
out = self.net(images)
face_loss_l, face_loss_c = self.loss_fn_1(out, (face_loc, face_conf))
head_loss_l, head_loss_c = self.loss_fn_2(out, (head_loc, head_conf))
loss = face_loss_l + face_loss_c + head_loss_l + head_loss_c
return loss
class EvalLoss(nn.Cell):
"""
Calculate the face-branch loss on the validation set during training.
"""
def __init__(self, net):
super(EvalLoss, self).__init__()
self.net = net
self.loss_fn = MultiBoxLoss()
def construct(self, images, face_loc, face_conf):
out = self.net(images)
face_loss_l, face_loss_c = self.loss_fn(out, (face_loc, face_conf))
loss = face_loss_l + face_loss_c
return loss
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import os
import time
from mindspore import context, nn
from mindspore.common import set_seed
from mindspore import save_checkpoint, load_checkpoint, load_param_into_net
from mindspore.communication import management as D
from mindspore.communication.management import get_group_size, get_rank
from src.pyramidbox import build_net, NetWithLoss, EvalLoss
from src.dataset import create_val_dataset, create_train_dataset
from src.config import cfg
MIN_LOSS = 10000
def parse_args():
parser = argparse.ArgumentParser(description='Pyramidbox face Detector Training With MindSpore')
parser.add_argument('--basenet', default='vgg16.ckpt', help='Pretrained base model')
parser.add_argument('--batch_size', default=4, type=int, help='Batch size for training')
parser.add_argument('--num_workers', default=8, type=int, help='Number of workers used in dataloading')
parser.add_argument('--device_target', dest='device_target', help='device for training',
choices=['GPU', 'Ascend'], default='GPU', type=str)
parser.add_argument('--lr', '--learning-rate', default=0.001, type=float, help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, help='Momentum value for optim')
parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD')
parser.add_argument('--distribute', default=False, type=bool, help='Use multi-GPU training')
parser.add_argument('--save_folder', default='checkpoints/', help='Directory for saving checkpoint models')
parser.add_argument('--epoches', default=100, type=int, help="Number of epochs to train the model")
parser.add_argument('--val_mindrecord', default='data/val.mindrecord', type=str, help="Path of val mindrecord file")
args_ = parser.parse_args()
return args_
def train(args):
print("The argument is: ", args)
context.set_context(device_target=args.device_target, mode=context.GRAPH_MODE)
device_id = 0
device_num = 1
ckpt_folder = os.path.join(args.save_folder, 'distribute_0')
if args.distribute:
D.init()
device_id = get_rank()
device_num = get_group_size()
if device_id == 0 and not os.path.exists(ckpt_folder):
os.mkdir(ckpt_folder)
context.reset_auto_parallel_context()
context.set_auto_parallel_context(parallel_mode=context.ParallelMode.DATA_PARALLEL, gradients_mean=True,
device_num=device_num)
else:
context.set_context(device_id=int(os.getenv('DEVICE_ID', '0')))
# Create train dataset
ds_train = create_train_dataset(cfg, args.batch_size, device_num, device_id, args.num_workers)
# Create val dataset
ds_val = create_val_dataset(args.val_mindrecord, args.batch_size, 1, 0, args.num_workers)
steps_per_epoch = ds_train.get_dataset_size()
net = build_net("train", cfg.NUM_CLASSES)
# load pretrained vgg16
vgg_params = load_checkpoint(args.basenet)
load_param_into_net(net.vgg, vgg_params)
network = NetWithLoss(net)
network.set_train(True)
if args.distribute:
milestone = cfg.DIS_LR_STEPS + [args.epoches * steps_per_epoch]
else:
milestone = cfg.LR_STEPS + [args.epoches * steps_per_epoch]
learning_rates = [args.lr, args.lr * 0.1, args.lr * 0.01, args.lr * 0.001]
lr_scheduler = nn.piecewise_constant_lr(milestone, learning_rates)
optimizer = nn.SGD(params=network.trainable_params(), learning_rate=lr_scheduler, momentum=args.momentum,
weight_decay=args.weight_decay)
# train net
train_net = nn.TrainOneStepCell(network, optimizer)
train_net.set_train(True)
eval_net = EvalLoss(net)
print("Start training net")
whole_step = 0
for epoch in range(1, args.epoches+1):
step = 0
time_list = []
for d in ds_train.create_tuple_iterator():
start_time = time.time()
loss = train_net(*d)
step += 1
whole_step += 1
print(f'epoch: {epoch} total step: {whole_step}, step: {step}, loss is {loss}')
per_time = time.time() - start_time
time_list.append(per_time)
net.set_train(False)
if args.distribute and device_id == 0:
print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)")
val(epoch, eval_net, train_net, ds_val, ckpt_folder)
elif not args.distribute:
print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)")
val(epoch, eval_net, train_net, ds_val, args.save_folder)
net.set_train(True)
def val(epoch, eval_net, model, ds_val, ckpt_dir):
face_loss_list = []
global MIN_LOSS
for (images, face_loc, face_conf, _, _) in ds_val.create_tuple_iterator():
face_loss = eval_net(images, face_loc, face_conf)
face_loss_list.append(face_loss)
a_loss = sum(face_loss_list) / len(face_loss_list)
if a_loss < MIN_LOSS:
MIN_LOSS = a_loss
print("Saving best ckpt, epoch is ", epoch)
save_checkpoint(model, os.path.join(ckpt_dir, f'pyramidbox_best_{epoch}.ckpt'))
if __name__ == '__main__':
train_args = parse_args()
set_seed(66)
if not os.path.exists(train_args.save_folder):
os.mkdir(train_args.save_folder)
train(train_args)
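# Illustrative launch commands (assumptions, not the repository's own run
# scripts); flag names follow parse_args() above:
#   python train.py --basenet vgg16.ckpt --batch_size 4 --device_target GPU
#   mpirun -n 8 python train.py --distribute True    # data-parallel multi-GPU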