# Contents

<!-- TOC -->

- [Contents](#contents)
- [PyramidBox Description](#pyramidbox-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [WIDER Face](#wider-face)
- [FDDB](#fddb)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training](#training)
- [Evaluation](#evaluation)
- [Configuration](#configuration)
- [Training Process](#training-process)
- [Standalone Training](#standalone-training)
- [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
- [Performance](#performance)

<!-- /TOC -->
# PyramidBox Description

[PyramidBox](https://arxiv.org/pdf/1803.07737.pdf) is a single-shot, SSD-based face detector that exploits contextual information to detect hard faces. PyramidBox makes predictions at six scales of feature maps. The work consists of the following components: LFPN, Pyramid Anchors, CPM, and Data-anchor-sampling.

[Paper](https://arxiv.org/pdf/1803.07737.pdf): Tang, Xu, et al. "PyramidBox: A context-assisted single shot face detector." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

# Model Architecture

**LFPN**: Low-level Feature Pyramid Network. In detection, LFPN combines high-level features, which carry more context, with low-level features, which carry more texture. High-level features are used to detect large faces, while low-level features are used to detect small ones. To merge the high-level features into the high-resolution low-level features, the top-down fusion starts from an intermediate layer rather than the topmost one, forming a low-level FPN.

**Pyramid Anchors**: A semi-supervised scheme that generates approximate, semantically meaningful labels related to face detection. This anchor-based, context-assisted method introduces supervision for learning the contextual features of small, blurred, and partially occluded faces. From an annotated face box, a head label is derived by expanding the box by 1/2 on each side (top, bottom, left, right), and a body label by a user-defined expansion ratio; a minimal sketch follows below.

**CPM**: Context-sensitive Predict Module, a structure designed to improve the expressive power of the prediction network.

**Data-anchor-sampling**: A new sampling method that increases the diversity of training samples across scales. It reshapes the distribution of the training set to put more emphasis on small faces.
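
The sketch below illustrates the Pyramid Anchors label expansion described above. It is not code from this repository; `expand_box` and the ratios are illustrative, with the head ratio of 1/2 taken from the description and the body ratio left as a free choice:

```python
import numpy as np

def expand_box(box, ratio):
    """Expand an (xmin, ymin, xmax, ymax) box by `ratio` of its size on each side."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return np.array([xmin - ratio * w, ymin - ratio * h,
                     xmax + ratio * w, ymax + ratio * h])

face = np.array([100.0, 100.0, 200.0, 200.0])
head = expand_box(face, 0.5)   # head label: face box expanded by 1/2 per side
body = expand_box(face, 1.5)   # body label: expansion ratio is user-defined
```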
# Dataset

Two datasets are used:

1. [WIDER Face](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)
1. [FDDB](http://vis-www.cs.umass.edu/fddb/index.html)

In detail:

## WIDER Face

- The WIDER Face dataset is used to train and validate the model. Download WIDER Face Training Images and extract the WIDER_train folder; download WIDER Face Validation Images and extract the WIDER_val folder.
- Download the WIDER Face [annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) and extract them into a folder named wider_face_split.
- Create a WIDERFACE directory under the dataset folder and place the WIDER_train, WIDER_val, and wider_face_split folders inside it.
- Dataset size: 32,203 images with 393,703 annotated faces.
    - WIDER_train: 1.4 GB
    - WIDER_val: 355 MB
- Check that the WIDER_train, WIDER_val, and wider_face_split folders are under the WIDERFACE directory.

## FDDB

- The FDDB dataset is used to evaluate the model. Download [originalPics.tar.gz](http://vis-www.cs.umass.edu/fddb/originalPics.tar.gz), which contains the unannotated images, and [FDDB-folds.tgz](http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz), which contains the annotations.
- Dataset size: 2,845 images with 5,171 annotated faces.
    - originalPics.tar.gz: 553 MB
    - FDDB-folds.tgz: 1 MB
- Create an FDDB folder under the dataset folder.
- Extract originalPics.tar.gz into FDDB; it contains two folders, 2002 and 2003:

```bash
├── 2002
│ ├── 07
│ ├── 08
│ ├── 09
│ ├── 10
│ ├── 11
│ └── 12
├── 2003
│ ├── 01
│ ├── 02
│ ├── 03
│ ├── 04
│ ├── 05
│ ├── 06
│ ├── 07
│ ├── 08
│ └── 09
```

- Extract FDDB-folds.tgz into FDDB; it contains 20 txt files:
```bash
FDDB-folds
│ ├── FDDB-fold-01-ellipseList.txt
│ ├── FDDB-fold-01.txt
│ ├── FDDB-fold-02-ellipseList.txt
│ ├── FDDB-fold-02.txt
│ ├── FDDB-fold-03-ellipseList.txt
│ ├── FDDB-fold-03.txt
│ ├── FDDB-fold-04-ellipseList.txt
│ ├── FDDB-fold-04.txt
│ ├── FDDB-fold-05-ellipseList.txt
│ ├── FDDB-fold-05.txt
│ ├── FDDB-fold-06-ellipseList.txt
│ ├── FDDB-fold-06.txt
│ ├── FDDB-fold-07-ellipseList.txt
│ ├── FDDB-fold-07.txt
│ ├── FDDB-fold-08-ellipseList.txt
│ ├── FDDB-fold-08.txt
│ ├── FDDB-fold-09-ellipseList.txt
│ ├── FDDB-fold-09.txt
│ ├── FDDB-fold-10-ellipseList.txt
│ ├── FDDB-fold-10.txt
```
- Check that the 2002, 2003, and FDDB-folds folders are under the FDDB folder, and that FDDB is under the dataset folder.

---------

In summary, there is one training split, WIDER_train, and two validation splits, WIDER_val and FDDB.

Edit `src/config.py` and set the `_C.HOME` field to the path of the dataset folder.

The overall dataset directory structure is as follows:
```bash
dataset
├── FDDB
│ ├── 2002
│ ├── 2003
│ └── FDDB-folds
└── WIDERFACE
├── wider_face_split
│ ├── readme.txt
│ ├── wider_face_test_filelist.txt
│ ├── wider_face_test.mat
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_train.mat
│ ├── wider_face_val_bbx_gt.txt
│ └── wider_face_val.mat
├── WIDER_train
│ └── images
└── WIDER_val
└── images
```
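
Before training, you can confirm this layout with a small check script. This is a hypothetical helper, not part of the repository; set `DATASET_HOME` to the same path as `_C.HOME`:

```python
import os

DATASET_HOME = '/path/to/dataset'  # same value as _C.HOME in src/config.py

expected = [
    'FDDB/2002', 'FDDB/2003', 'FDDB/FDDB-folds',
    'WIDERFACE/wider_face_split',
    'WIDERFACE/WIDER_train/images',
    'WIDERFACE/WIDER_val/images',
]
for rel in expected:
    path = os.path.join(DATASET_HOME, rel)
    print(('OK      ' if os.path.isdir(path) else 'MISSING ') + path)
```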
# Environment Requirements

- Hardware (Ascend/GPU/CPU)
    - Prepared with a GPU hardware environment
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, see the following resources:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)

# Quick Start

After installing MindSpore from the official website, you can follow the steps below for training and evaluation.

Before training, complete the following preparation:

1. Check that the `_C.HOME` field in `src/config.py` is the path of the dataset folder.
2. Preprocess wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt to generate face_train.txt and face_val.txt.

```bash
# Preprocess wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt
# Run from the project root
python preprocess.py
# On success, face_train.txt and face_val.txt appear in the data folder
```

3. Generate the mindrecord file for face_val.txt; it is used during training to validate the model after every epoch and find the best checkpoint.

```bash
bash scripts/generate_mindrecord.sh
# On success, val.mindrecord and val.mindrecord.db appear in the data folder
```

4. Download the pretrained [vgg16.ckpt](https://pan.baidu.com/s/1e5qSW4e1QVZRnbyGRWi91Q?pwd=dryt); this checkpoint was converted from PyTorch.

With the preparation done, start training the model.
1. Standalone training
```bash
bash scripts/run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord
```
2. Distributed training
```bash
bash scripts/run_distribute_train_gpu.sh DEVICE_NUM VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord
```
After training completes, evaluate the PyramidBox model.

3. Evaluate the model

```bash
# Evaluate with the FDDB dataset
bash scripts/run_eval_gpu.sh PYRAMIDBOX_CKPT
# example: bash scripts/run_eval_gpu.sh checkpoints/pyramidbox.ckpt
```
# Script Description

## Script and Sample Code
```bash
PyramidBox
├── data // preprocessed dataset files and mindrecord files
├── eval.py // model evaluation script
├── preprocess.py // dataset annotation preprocessing script
├── generate_mindrecord.py // script to create mindrecord files
├── README_CN.md // PyramidBox description (Chinese)
├── scripts
│   ├── generate_mindrecord.sh // shell script to generate the mindrecord file used for validation
│   ├── run_distribute_train_gpu.sh // shell script for distributed GPU training
│   ├── run_eval_gpu.sh // shell script for GPU evaluation
│   └── run_standalone_train_gpu.sh // shell script for standalone GPU training
├── src
│   ├── augmentations.py // data augmentation script
│   ├── dataset.py // dataset script
│   ├── evaluate.py // model evaluation utilities
│   ├── loss.py // loss functions
│   ├── config.py // configuration file
│   ├── bbox_utils.py // box utility functions
│   ├── detection.py // decodes predicted box locations and confidences
│   ├── prior_box.py // default anchor (prior box) generation script
│   └── pyramidbox.py // PyramidBox model
└── train.py // model training script
```
## Script Parameters

### Training
```bash
usage: train.py [-h] [--basenet BASENET] [--batch_size BATCH_SIZE]
[--num_workers NUM_WORKERS] [--device_target {GPU,Ascend}]
[--lr LR] [--momentum MOMENTUM] [--weight_decay WEIGHT_DECAY]
[--gamma GAMMA] [--distribute DISTRIBUTE]
[--save_folder SAVE_FOLDER] [--epoches EPOCHES]
[--val_mindrecord VAL_MINDRECORD]
PyramidBox Face Detector Training With MindSpore
optional arguments:
-h, --help show this help message and exit
--basenet BASENET Pretrained base model
--batch_size BATCH_SIZE
Batch size for training
--num_workers NUM_WORKERS
Number of workers used in dataloading
--device_target {GPU,Ascend}
device for training
--lr LR, --learning-rate LR
initial learning rate
--momentum MOMENTUM Momentum value for optim
--weight_decay WEIGHT_DECAY
Weight decay for SGD
--gamma GAMMA Gamma update for SGD
--distribute DISTRIBUTE
Use multi-GPU training
--save_folder SAVE_FOLDER
Directory for saving checkpoint models
--epoches EPOCHES     Number of epochs to train the model
--val_mindrecord VAL_MINDRECORD
Path of val mindrecord file
```
### Evaluation
```bash
usage: eval.py [-h] [--model MODEL] [--thresh THRESH]
PyramidBox Evaluation on FDDB
optional arguments:
-h, --help show this help message and exit
--model MODEL trained model
--thresh THRESH Final confidence threshold
```
### Configuration
```bash
config.py:
LR_STEPS: learning-rate decay steps for standalone training
DIS_LR_STEPS: learning-rate decay steps for distributed training
FEATURE_MAPS: list of feature-map sizes used in training
INPUT_SIZE: input image size
STEPS: strides used to generate the default anchors
ANCHOR_SIZES: default anchor sizes
NUM_CLASSES: number of classes
OVERLAP_THRESH: overlap (IoU) threshold
NEG_POS_RATIOS: ratio of negative to positive samples
NMS_THRESH: NMS threshold
TOP_K: number of top-k candidates
KEEP_TOP_K: number of top-k detections to keep
CONF_THRESH: confidence threshold
HOME: dataset root directory
FACE.FILE_DIR: path of the data folder
FACE.TRAIN_FILE: face_train.txt file
FACE.VAL_FILE: face_val.txt file
FACE.FDDB_DIR: FDDB folder
FACE.WIDER_DIR: WIDER Face folder
```
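
As a sanity check on these values: with a 640×640 input and strides from 4 to 128, the six feature maps contain 160² + 80² + 40² + 20² + 10² + 5² = 34,125 cells, which matches the 34125 anchor dimension in the mindrecord schema of `generate_mindrecord.py`. A minimal sketch, assuming those STEPS/FEATURE_MAPS values (the authoritative ones live in `src/config.py`):

```python
INPUT_SIZE = 640
STEPS = [4, 8, 16, 32, 64, 128]                    # assumed strides of the six levels
FEATURE_MAPS = [INPUT_SIZE // s for s in STEPS]    # [160, 80, 40, 20, 10, 5]
num_priors = sum(f * f for f in FEATURE_MAPS)      # one anchor per feature-map cell
print(num_priors)                                  # 34125
```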
## Training Process

Before starting training, make sure the preparation is complete:

1. The `_C.HOME` field in `src/config.py` points to the dataset folder.
2. wider_face_train_bbx_gt.txt and wider_face_val_bbx_gt.txt have been preprocessed into face_train.txt and face_val.txt.
3. The mindrecord file for face_val.txt has been generated.
4. The pretrained [vgg16.ckpt](https://pan.baidu.com/s/1e5qSW4e1QVZRnbyGRWi91Q?pwd=dryt) has been downloaded.

Training can start only after these steps are done.

### Standalone Training
```bash
bash scripts/run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE
# example: bash scripts/run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord
```
Training runs in the background. Checkpoints are saved in the `checkpoints` folder, and the training output can be followed in `logs/training_gpu.log`; it looks like this:
```bash
epoch: 2 step: 456, loss is 0.3661264
epoch: 2 step: 457, loss is 0.32284224
epoch: 2 step: 458, loss is 0.29254544
epoch: 2 step: 459, loss is 0.32631972
epoch: 2 step: 460, loss is 0.3065704
epoch: 2 step: 461, loss is 0.3995605
epoch: 2 step: 462, loss is 0.2614449
epoch: 2 step: 463, loss is 0.50305885
epoch: 2 step: 464, loss is 0.30908597
···
```
### Distributed Training
```bash
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [VGG16_CKPT] [VAL_MINDRECORD_FILE]
# example: bash scripts/run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord
```
Training runs in the background. Only the checkpoints of the first device are saved, in the `checkpoints/distribute_0/` folder; the training output can be followed in `logs/distribute_training_gpu.log` and looks like this:
```bash
epoch: 1 total step: 2, step: 2, loss is 25.479286
epoch: 1 total step: 2, step: 2, loss is 30.297405
epoch: 1 total step: 2, step: 2, loss is 28.816475
epoch: 1 total step: 2, step: 2, loss is 25.439453
epoch: 1 total step: 2, step: 2, loss is 28.585438
epoch: 1 total step: 2, step: 2, loss is 31.117134
epoch: 1 total step: 2, step: 2, loss is 25.770748
epoch: 1 total step: 2, step: 2, loss is 27.557945
epoch: 1 total step: 3, step: 3, loss is 28.352016
epoch: 1 total step: 3, step: 3, loss is 31.99873
epoch: 1 total step: 3, step: 3, loss is 31.426039
epoch: 1 total step: 3, step: 3, loss is 24.02226
epoch: 1 total step: 3, step: 3, loss is 30.12824
epoch: 1 total step: 3, step: 3, loss is 29.977898
epoch: 1 total step: 3, step: 3, loss is 24.06476
epoch: 1 total step: 3, step: 3, loss is 28.573633
epoch: 1 total step: 4, step: 4, loss is 28.599226
epoch: 1 total step: 4, step: 4, loss is 34.262005
epoch: 1 total step: 4, step: 4, loss is 30.732353
epoch: 1 total step: 4, step: 4, loss is 28.62697
epoch: 1 total step: 4, step: 4, loss is 39.44549
epoch: 1 total step: 4, step: 4, loss is 27.754185
epoch: 1 total step: 4, step: 4, loss is 26.15754
...
```
## Evaluation Process
```bash
bash scripts/run_eval_gpu.sh [PYRAMIDBOX_CKPT]
# example: bash scripts/run_eval_gpu.sh checkpoints/pyramidbox.ckpt
```
Note: checkpoints are named `pyramidbox_best_{epoch}.ckpt`, where epoch is the number of training epochs completed when the checkpoint was saved. A larger epoch generally means a lower loss on WIDER val and a relatively more accurate model, so when searching for the best model, evaluate the checkpoints in order of decreasing epoch, starting with the largest.
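
A small helper for listing the checkpoints in that order; this is a hypothetical snippet that assumes only the `pyramidbox_best_{epoch}.ckpt` naming described above:

```python
import glob
import re

# Collect pyramidbox_best_{epoch}.ckpt files and sort by epoch, largest first.
ckpts = glob.glob('checkpoints/pyramidbox_best_*.ckpt')
ckpts.sort(key=lambda p: int(re.search(r'pyramidbox_best_(\d+)\.ckpt', p).group(1)),
           reverse=True)
for ckpt in ckpts:
    print(ckpt)   # evaluate each with: bash scripts/run_eval_gpu.sh <ckpt>
```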
Evaluation runs in the background; the results are written to `logs/eval_gpu.log` and look like this:
```bash
==================== Results ====================
FDDB-fold-1 Val AP: 0.9614604685893
FDDB-fold-2 Val AP: 0.9615593696135745
FDDB-fold-3 Val AP: 0.9607889632039851
FDDB-fold-4 Val AP: 0.972454404596466
FDDB-fold-5 Val AP: 0.9734522365236052
FDDB-fold-6 Val AP: 0.952158002966933
FDDB-fold-7 Val AP: 0.9618735923917133
FDDB-fold-8 Val AP: 0.9501671313630741
FDDB-fold-9 Val AP: 0.9539008001056393
FDDB-fold-10 Val AP: 0.9664355605240443
FDDB Dataset Average AP: 0.9614250529878333
=================================================
```
# Model Description

## Performance

| Parameter | PyramidBox |
| -------------------- | ------------------------------------------------------- |
| Resource | GPU (Tesla V100 SXM2); CPU 2.1 GHz, 24 cores; memory 128 GB |
| Upload date | 2022-09-17 |
| MindSpore version | 1.8.1 |
| Dataset | WIDER Face, FDDB |
| Training parameters | epoch=100, batch_size=4, lr=5e-4 |
| Optimizer | SGD |
| Loss function | SoftmaxCrossEntropyWithLogits, SmoothL1Loss |
| Output | box coordinates and confidences |
| Loss | 2-6 |
| Speed | 570 ms/step (1 GPU); 650 ms/step (8 GPUs) |
| Total time | 50 h 58 min (1 GPU); 7 h 12 min (8 GPUs) |
| Checkpoint for fine-tuning | 655 MB (.ckpt file) |
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import os
import argparse
from PIL import Image
from mindspore import Tensor, context
from mindspore import load_checkpoint, load_param_into_net
import numpy as np
from src.config import cfg
from src.pyramidbox import build_net
from src.augmentations import to_chw_bgr
from src.prior_box import PriorBox
from src.detection import Detect
from src.evaluate import evaluation
parser = argparse.ArgumentParser(description='PyramidBox Evaluation on FDDB')
parser.add_argument('--model', type=str, default='checkpoints/pyramidbox.ckpt', help='trained model')
parser.add_argument('--thresh', default=0.1, type=float, help='Final confidence threshold')
args = parser.parse_args()
FDDB_IMG_DIR = cfg.FACE.FDDB_DIR
FDDB_FOLD_DIR = os.path.join(FDDB_IMG_DIR, 'FDDB-folds')
FDDB_OUT_DIR = 'FDDB-out'
if not os.path.exists(FDDB_OUT_DIR):
os.mkdir(FDDB_OUT_DIR)
def detect_face(net_, img_, thresh):
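    # Preprocessing mirrors training (see preprocess() in src/augmentations.py):
    # HWC -> CHW in BGR order, subtract the dataset mean, then reorder to RGB.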
x = to_chw_bgr(img_).astype(np.float32)
x -= cfg.img_mean
x = x[[2, 1, 0], :, :]
x = Tensor(x)[None, :, :, :]
size = x.shape[2:]
loc, conf, feature_maps = net_(x)
prior_box = PriorBox(cfg, feature_maps, size, 'test')
default_priors = prior_box.forward()
detections = Detect(cfg).detect(loc, conf, default_priors)
scale = np.array([img_.shape[1], img_.shape[0], img_.shape[1], img_.shape[0]])
bboxes = []
for i in range(detections.shape[1]):
j = 0
while detections[0, i, j, 0] >= thresh:
box = []
score = detections[0, i, j, 0]
pt = (detections[0, i, j, 1:] * scale).astype(np.int32)
j += 1
box += [pt[0], pt[1], pt[2] - pt[0], pt[3] - pt[1], score]
bboxes += [box]
return bboxes
if __name__ == '__main__':
context.set_context(mode=context.PYNATIVE_MODE)
net = build_net('test', cfg.NUM_CLASSES)
params = load_checkpoint(args.model)
load_param_into_net(net, params)
net.set_train(False)
print("Start detecting FDDB images")
for index in range(1, 11):
if not os.path.exists(os.path.join(FDDB_OUT_DIR, str(index))):
os.mkdir(os.path.join(FDDB_OUT_DIR, str(index)))
print(f"Detecting folder {index}")
file_path = os.path.join(cfg.FACE.FDDB_DIR, 'FDDB-folds', 'FDDB-fold-%02d.txt' % index)
with open(file_path, 'r') as f:
lines = f.readlines()
for line in lines:
line = line.strip('\n')
image_path = os.path.join(cfg.FACE.FDDB_DIR, line) + '.jpg'
img = Image.open(image_path)
if img.mode == 'L':
img = img.convert('RGB')
img = np.array(img)
line = line.replace('/', '_')
with open(os.path.join(FDDB_OUT_DIR, str(index), line + '.txt'), 'w') as w:
w.write(line)
w.write('\n')
boxes = detect_face(net, img, args.thresh)
                if boxes is not None:
w.write(str(len(boxes)))
w.write('\n')
for box_ in boxes:
w.write(f'{int(box_[0])} {int(box_[1])} {int(box_[2])} {int(box_[3])} {box_[4]}\n')
print("Detection Done!")
print("Start evluation!")
evaluation(FDDB_OUT_DIR, FDDB_FOLD_DIR)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import os
from mindspore.mindrecord import FileWriter
from src.dataset import WIDERDataset
from src.config import cfg
parser = argparse.ArgumentParser(description='Generate Mindrecord File for training')
parser.add_argument('--prefix', type=str, default='./data', help="Directory to store mindrecord file")
parser.add_argument('--val_name', type=str, default='val.mindrecord', help='Name of val mindrecord file')
args = parser.parse_args()
def data_to_mindrecord(mindrecord_prefix, mindrecord_name, dataset):
if not os.path.exists(mindrecord_prefix):
os.mkdir(mindrecord_prefix)
mindrecord_path = os.path.join(mindrecord_prefix, mindrecord_name)
writer = FileWriter(mindrecord_path, 1, overwrite=True)
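    # Each record stores one target per default anchor: 34125 = 160^2 + 80^2 +
    # 40^2 + 20^2 + 10^2 + 5^2, the total number of prior boxes for a 640x640
    # input (one anchor per feature-map cell).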
data_json = {
'img': {"type": "float32", "shape": [3, 640, 640]},
'face_loc': {"type": "float32", "shape": [34125, 4]},
'face_conf': {"type": "float32", "shape": [34125]},
'head_loc': {"type": "float32", "shape": [34125, 4]},
'head_conf': {"type": "float32", "shape": [34125]}
}
writer.add_schema(data_json, 'data_json')
count = 0
for d in dataset:
img, face_loc, face_conf, head_loc, head_conf = d
row = {
"img": img,
"face_loc": face_loc,
"face_conf": face_conf,
"head_loc": head_loc,
"head_conf": head_conf
}
writer.write_raw_data([row])
count += 1
writer.commit()
print("Total train data: ", count)
print("Create mindrecord done!")
if __name__ == '__main__':
print("Start generating val mindrecord file")
ds_val = WIDERDataset(cfg.FACE.VAL_FILE, mode='val')
data_to_mindrecord(args.prefix, args.val_name, ds_val)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import os
from src.config import cfg
WIDER_ROOT = os.path.join(cfg.HOME, 'WIDERFACE')
train_list_file = os.path.join(WIDER_ROOT, 'wider_face_split',
'wider_face_train_bbx_gt.txt')
val_list_file = os.path.join(WIDER_ROOT, 'wider_face_split',
'wider_face_val_bbx_gt.txt')
WIDER_TRAIN = os.path.join(WIDER_ROOT, 'WIDER_train', 'images')
WIDER_VAL = os.path.join(WIDER_ROOT, 'WIDER_val', 'images')
def parse_wider_file(root, file):
with open(file, 'r') as fr:
lines = fr.readlines()
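    # wider_face_*_bbx_gt.txt format: an image path line ending in ".jpg",
    # then a line with the face count, then one face per line starting "x y w h".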
face_count = []
img_paths = []
face_loc = []
img_faces = []
count = 0
flag = False
    for line in lines:
line = line.strip().strip('\n')
if count > 0:
line = line.split(' ')
count -= 1
loc = [int(line[0]), int(line[1]), int(line[2]), int(line[3])]
face_loc += [loc]
if flag:
face_count += [int(line)]
flag = False
count = int(line)
if 'jpg' in line:
img_paths += [os.path.join(root, line)]
flag = True
total_face = 0
for k in face_count:
face_ = []
for x in range(total_face, total_face + k):
face_.append(face_loc[x])
img_faces += [face_]
total_face += k
return img_paths, img_faces
def wider_data_file():
if not os.path.exists(cfg.FACE.FILE_DIR):
os.mkdir(cfg.FACE.FILE_DIR)
img_paths, bbox = parse_wider_file(WIDER_TRAIN, train_list_file)
fw = open(cfg.FACE.TRAIN_FILE, 'w')
for index in range(len(img_paths)):
path = img_paths[index]
boxes = bbox[index]
fw.write(path)
fw.write(' {}'.format(len(boxes)))
for box in boxes:
data = ' {} {} {} {} {}'.format(box[0], box[1], box[2], box[3], 1)
fw.write(data)
fw.write('\n')
fw.close()
img_paths, bbox = parse_wider_file(WIDER_VAL, val_list_file)
fw = open(cfg.FACE.VAL_FILE, 'w')
for index in range(len(img_paths)):
path = img_paths[index]
boxes = bbox[index]
fw.write(path)
fw.write(' {}'.format(len(boxes)))
for box in boxes:
data = ' {} {} {} {} {}'.format(box[0], box[1], box[2], box[3], 1)
fw.write(data)
fw.write('\n')
fw.close()
if __name__ == '__main__':
wider_data_file()
easydict==1.9
mindspore-gpu==1.8.1
numpy==1.21.5
opencv-python==4.5.5.62
Pillow==9.0.0
scikit-image==0.18.3
tqdm==4.64.1
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash generate_mindrecord.sh"
echo "for example: bash generate_mindrecord.sh"
echo "=============================================================================================================="
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
python $PROJECT_DIR/../generate_mindrecord.py > $LOG_DIR/generate_mindrecord.log 2>&1 &
echo "The data log is at /logs/generate_mindrecord.log"
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train_gpu.sh DEVICE_NUM VGG16_CKPT VAL_MINDRECORD_FILE"
echo "for example: bash run_distribute_train_gpu.sh 8 vgg16.ckpt val.mindrecord"
echo "=============================================================================================================="
DEVICE_NUM=$1
VGG16=$2
VAL_MINDRECORD=$3
if [ $# -lt 3 ];
then
echo "---------------------ERROR----------------------"
echo "You must specify number of gpu devices, vgg16 checkpoint, mindrecord file for evaling"
exit
fi
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
mpirun -n $DEVICE_NUM --allow-run-as-root python $PROJECT_DIR/../train.py \
--distribute True \
--lr 5e-4 \
--device_target GPU \
--val_mindrecord $VAL_MINDRECORD \
--epoches 100 \
--basenet $VGG16 \
--num_workers 1 \
--batch_size 4 > $LOG_DIR/distribute_training_gpu.log 2>&1 &
echo "The distributed train log is at /logs/distribute_training_gpu.log"
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval_gpu.sh PYRAMIDBOX_CKPT"
echo "for example: bash run_eval_gpu.sh pyramidbox.ckpt"
echo "=============================================================================================================="
if [ $# -lt 1 ];
then
echo "---------------------ERROR----------------------"
echo "You must specify pyramidbox checkpoint"
exit
fi
CKPT=$1
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python $PROJECT_DIR/../eval.py --model $CKPT > $LOG_DIR/eval_gpu.log 2>&1 &
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_train_gpu.sh DEVICE_ID VGG16_CKPT VAL_MINDRECORD_FILE"
echo "for example: bash run_standalone_train_gpu.sh 0 vgg16.ckpt val.mindrecord"
echo "=============================================================================================================="
DEVICE_ID=$1
VGG16=$2
VAL_MINDRECORD=$3
if [ $# -lt 3 ];
then
echo "---------------------ERROR----------------------"
echo "You must specify gpu device, vgg16 checkpoint and mindrecord file for valing"
exit
fi
PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
LOG_DIR=$PROJECT_DIR/../logs
if [ ! -d $LOG_DIR ]
then
mkdir $LOG_DIR
fi
export DEVICE_ID=$DEVICE_ID
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python $PROJECT_DIR/../train.py \
--device_target GPU \
--epoches 100 \
--lr 5e-4 \
--basenet $VGG16 \
--num_workers 2 \
--val_mindrecord $VAL_MINDRECORD \
--batch_size 4 > $LOG_DIR/training_gpu.log 2>&1 &
echo "The standalone train log is at /logs/training_gpu.log"
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import math
import random
import six
import cv2
import numpy as np
from PIL import Image, ImageEnhance
from src.config import cfg
class Sampler():
def __init__(self,
max_sample,
max_trial,
min_scale,
max_scale,
min_aspect_ratio,
max_aspect_ratio,
min_jaccard_overlap,
max_jaccard_overlap,
min_object_coverage,
max_object_coverage,
use_square=False):
self.max_sample = max_sample
self.max_trial = max_trial
self.min_scale = min_scale
self.max_scale = max_scale
self.min_aspect_ratio = min_aspect_ratio
self.max_aspect_ratio = max_aspect_ratio
self.min_jaccard_overlap = min_jaccard_overlap
self.max_jaccard_overlap = max_jaccard_overlap
self.min_object_coverage = min_object_coverage
self.max_object_coverage = max_object_coverage
self.use_square = use_square
def intersect(box_a, box_b):
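    # Pairwise intersection areas between each box in box_a [N, 4] and the
    # single box box_b [4].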
max_xy = np.minimum(box_a[:, 2:], box_b[2:])
min_xy = np.maximum(box_a[:, :2], box_b[:2])
inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf)
return inter[:, 0] * inter[:, 1]
def jaccard_numpy(box_a, box_b):
"""Compute the jaccard overlap of two sets of boxes. The jaccard overlap
is simply the intersection over union of two boxes.
E.g.:
A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
Args:
box_a: Multiple bounding boxes, Shape: [num_boxes,4]
box_b: Single bounding box, Shape: [4]
Return:
        jaccard overlap: Shape: [box_a.shape[0]]
"""
inter = intersect(box_a, box_b)
area_a = ((box_a[:, 2] - box_a[:, 0]) *
(box_a[:, 3] - box_a[:, 1])) # [A,B]
area_b = ((box_b[2] - box_b[0]) *
(box_b[3] - box_b[1])) # [A,B]
union = area_a + area_b - inter
return inter / union # [A,B]
class Bbox():
def __init__(self, xmin, ymin, xmax, ymax):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
def random_brightness(img):
prob = np.random.uniform(0, 1)
if prob < cfg.brightness_prob:
delta = np.random.uniform(-cfg.brightness_delta,
cfg.brightness_delta) + 1
img = ImageEnhance.Brightness(img).enhance(delta)
return img
def random_contrast(img):
prob = np.random.uniform(0, 1)
if prob < cfg.contrast_prob:
delta = np.random.uniform(-cfg.contrast_delta,
cfg.contrast_delta) + 1
img = ImageEnhance.Contrast(img).enhance(delta)
return img
def random_saturation(img):
prob = np.random.uniform(0, 1)
if prob < cfg.saturation_prob:
delta = np.random.uniform(-cfg.saturation_delta,
cfg.saturation_delta) + 1
img = ImageEnhance.Color(img).enhance(delta)
return img
def random_hue(img):
prob = np.random.uniform(0, 1)
if prob < cfg.hue_prob:
delta = np.random.uniform(-cfg.hue_delta, cfg.hue_delta)
img_hsv = np.array(img.convert('HSV'))
img_hsv[:, :, 0] = img_hsv[:, :, 0] + delta
img = Image.fromarray(img_hsv, mode='HSV').convert('RGB')
return img
def distort_image(img):
prob = np.random.uniform(0, 1)
# Apply different distort order
if prob > 0.5:
img = random_brightness(img)
img = random_contrast(img)
img = random_saturation(img)
img = random_hue(img)
else:
img = random_brightness(img)
img = random_saturation(img)
img = random_hue(img)
img = random_contrast(img)
return img
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox.xmax + src_bbox.xmin) / 2
center_y = (src_bbox.ymax + src_bbox.ymin) / 2
if sample_bbox.xmin <= center_x <= sample_bbox.xmax and \
sample_bbox.ymin <= center_y <= sample_bbox.ymax:
return True
return False
def project_bbox(object_bbox, sample_bbox):
if object_bbox.xmin >= sample_bbox.xmax or \
object_bbox.xmax <= sample_bbox.xmin or \
object_bbox.ymin >= sample_bbox.ymax or \
object_bbox.ymax <= sample_bbox.ymin:
return False
proj_bbox = Bbox(0, 0, 0, 0)
sample_width = sample_bbox.xmax - sample_bbox.xmin
sample_height = sample_bbox.ymax - sample_bbox.ymin
proj_bbox.xmin = (object_bbox.xmin - sample_bbox.xmin) / sample_width
proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
proj_bbox = clip_bbox(proj_bbox)
if bbox_area(proj_bbox) > 0:
return proj_bbox
return False
def transform_labels(bbox_labels, sample_bbox):
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
proj_bbox = project_bbox(object_bbox, sample_bbox)
if proj_bbox:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label = sample_label + bbox_labels[i][5:]
sample_labels.append(sample_label)
return sample_labels
def expand_image(img, bbox_labels, img_width, img_height):
prob = np.random.uniform(0, 1)
if prob < cfg.expand_prob:
if cfg.expand_max_ratio - 1 >= 0.01:
expand_ratio = np.random.uniform(1, cfg.expand_max_ratio)
height = int(img_height * expand_ratio)
width = int(img_width * expand_ratio)
h_off = math.floor(np.random.uniform(0, height - img_height))
w_off = math.floor(np.random.uniform(0, width - img_width))
expand_bbox = Bbox(-w_off / img_width, -h_off / img_height,
(width - w_off) / img_width,
(height - h_off) / img_height)
expand_img = np.ones((height, width, 3))
expand_img = np.uint8(expand_img * np.squeeze(cfg.img_mean))
expand_img = Image.fromarray(expand_img)
expand_img.paste(img, (int(w_off), int(h_off)))
bbox_labels = transform_labels(bbox_labels, expand_bbox)
return expand_img, bbox_labels, width, height
return img, bbox_labels, img_width, img_height
def clip_bbox(src_bbox):
src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0)
src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0)
src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0)
src_bbox.ymax = max(min(src_bbox.ymax, 1.0), 0.0)
return src_bbox
def bbox_area(src_bbox):
if src_bbox.xmax < src_bbox.xmin or src_bbox.ymax < src_bbox.ymin:
return 0.
width = src_bbox.xmax - src_bbox.xmin
height = src_bbox.ymax - src_bbox.ymin
return width * height
def intersect_bbox(bbox1, bbox2):
if bbox2.xmin > bbox1.xmax or bbox2.xmax < bbox1.xmin or \
bbox2.ymin > bbox1.ymax or bbox2.ymax < bbox1.ymin:
intersection_box = Bbox(0.0, 0.0, 0.0, 0.0)
else:
intersection_box = Bbox(
max(bbox1.xmin, bbox2.xmin),
max(bbox1.ymin, bbox2.ymin),
min(bbox1.xmax, bbox2.xmax), min(bbox1.ymax, bbox2.ymax))
return intersection_box
def bbox_coverage(bbox1, bbox2):
inter_box = intersect_bbox(bbox1, bbox2)
intersect_size = bbox_area(inter_box)
if intersect_size > 0:
bbox1_size = bbox_area(bbox1)
return intersect_size / bbox1_size
return 0.
def generate_batch_random_samples(batch_sampler, bbox_labels, image_width,
image_height, scale_array, resize_width,
resize_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for _ in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = data_anchor_sampling(
sampler, bbox_labels, image_width, image_height, scale_array,
resize_width, resize_height)
if sample_bbox == 0:
break
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def data_anchor_sampling(sampler, bbox_labels, image_width, image_height,
scale_array, resize_width, resize_height):
num_gt = len(bbox_labels)
# np.random.randint range: [low, high)
rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
if num_gt != 0:
norm_xmin = bbox_labels[rand_idx][1]
norm_ymin = bbox_labels[rand_idx][2]
norm_xmax = bbox_labels[rand_idx][3]
norm_ymax = bbox_labels[rand_idx][4]
xmin = norm_xmin * image_width
ymin = norm_ymin * image_height
wid = image_width * (norm_xmax - norm_xmin)
hei = image_height * (norm_ymax - norm_ymin)
range_size = 0
area = wid * hei
for scale_ind in range(0, len(scale_array) - 1):
if scale_array[scale_ind] ** 2 < area < scale_array[scale_ind + 1] ** 2:
range_size = scale_ind + 1
break
if area > scale_array[len(scale_array) - 2]**2:
range_size = len(scale_array) - 2
scale_choose = 0.0
if range_size == 0:
rand_idx_size = 0
else:
# np.random.randint range: [low, high)
rng_rand_size = np.random.randint(0, range_size + 1)
rand_idx_size = rng_rand_size % (range_size + 1)
if rand_idx_size == range_size:
min_resize_val = scale_array[rand_idx_size] / 2.0
max_resize_val = min(2.0 * scale_array[rand_idx_size],
2 * math.sqrt(wid * hei))
scale_choose = random.uniform(min_resize_val, max_resize_val)
else:
min_resize_val = scale_array[rand_idx_size] / 2.0
max_resize_val = 2.0 * scale_array[rand_idx_size]
scale_choose = random.uniform(min_resize_val, max_resize_val)
sample_bbox_size = wid * resize_width / scale_choose
w_off_orig = 0.0
h_off_orig = 0.0
if sample_bbox_size < max(image_height, image_width):
if wid <= sample_bbox_size:
w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size,
xmin)
else:
w_off_orig = np.random.uniform(xmin,
xmin + wid - sample_bbox_size)
if hei <= sample_bbox_size:
h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size,
ymin)
else:
h_off_orig = np.random.uniform(ymin,
ymin + hei - sample_bbox_size)
else:
w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0)
h_off_orig = np.random.uniform(
image_height - sample_bbox_size, 0.0)
w_off_orig = math.floor(w_off_orig)
h_off_orig = math.floor(h_off_orig)
# Figure out top left coordinates.
w_off = 0.0
h_off = 0.0
w_off = float(w_off_orig / image_width)
h_off = float(h_off_orig / image_height)
sampled_bbox = Bbox(w_off, h_off,
w_off + float(sample_bbox_size / image_width),
h_off + float(sample_bbox_size / image_height))
return sampled_bbox
return 0
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox.xmin >= object_bbox.xmax or \
sample_bbox.xmax <= object_bbox.xmin or \
sample_bbox.ymin >= object_bbox.ymax or \
sample_bbox.ymax <= object_bbox.ymin:
return 0
intersect_xmin = max(sample_bbox.xmin, object_bbox.xmin)
intersect_ymin = max(sample_bbox.ymin, object_bbox.ymin)
intersect_xmax = min(sample_bbox.xmax, object_bbox.xmax)
intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax)
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0:
has_jaccard_overlap = False
else:
has_jaccard_overlap = True
if sampler.min_object_coverage == 0 and sampler.max_object_coverage == 0:
has_object_coverage = False
else:
has_object_coverage = True
if not has_jaccard_overlap and not has_object_coverage:
return True
found = False
for i in range(len(bbox_labels)):
object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if has_jaccard_overlap:
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler.min_jaccard_overlap != 0 and \
overlap < sampler.min_jaccard_overlap:
continue
if sampler.max_jaccard_overlap != 0 and \
overlap > sampler.max_jaccard_overlap:
continue
found = True
if has_object_coverage:
object_coverage = bbox_coverage(object_bbox, sample_bbox)
if sampler.min_object_coverage != 0 and \
object_coverage < sampler.min_object_coverage:
continue
if sampler.max_object_coverage != 0 and \
object_coverage > sampler.max_object_coverage:
continue
found = True
if found:
return True
return found
def crop_image_sampling(img, bbox_labels, sample_bbox, image_width,
image_height, resize_width, resize_height,
min_face_size):
# no clipping here
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
w_off = xmin
h_off = ymin
width = xmax - xmin
height = ymax - ymin
cross_xmin = max(0.0, float(w_off))
cross_ymin = max(0.0, float(h_off))
cross_xmax = min(float(w_off + width - 1.0), float(image_width))
cross_ymax = min(float(h_off + height - 1.0), float(image_height))
cross_width = cross_xmax - cross_xmin
cross_height = cross_ymax - cross_ymin
roi_xmin = 0 if w_off >= 0 else abs(w_off)
roi_ymin = 0 if h_off >= 0 else abs(h_off)
roi_width = cross_width
roi_height = cross_height
roi_y1 = int(roi_ymin)
roi_y2 = int(roi_ymin + roi_height)
roi_x1 = int(roi_xmin)
roi_x2 = int(roi_xmin + roi_width)
cross_y1 = int(cross_ymin)
cross_y2 = int(cross_ymin + cross_height)
cross_x1 = int(cross_xmin)
cross_x2 = int(cross_xmin + cross_width)
sample_img = np.zeros((height, width, 3))
# print(sample_img.shape)
sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \
img[cross_y1: cross_y2, cross_x1: cross_x2]
sample_img = cv2.resize(
sample_img, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
resize_val = resize_width
sample_labels = transform_labels_sampling(bbox_labels, sample_bbox,
resize_val, min_face_size)
return sample_img, sample_labels
def transform_labels_sampling(bbox_labels, sample_bbox, resize_val,
min_face_size):
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = Bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
proj_bbox = project_bbox(object_bbox, sample_bbox)
if proj_bbox:
real_width = float((proj_bbox.xmax - proj_bbox.xmin) * resize_val)
real_height = float((proj_bbox.ymax - proj_bbox.ymin) * resize_val)
if real_width * real_height < float(min_face_size * min_face_size):
continue
else:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label = sample_label + bbox_labels[i][5:]
sample_labels.append(sample_label)
return sample_labels
def generate_sample(sampler, image_width, image_height):
scale = np.random.uniform(sampler.min_scale, sampler.max_scale)
aspect_ratio = np.random.uniform(sampler.min_aspect_ratio,
sampler.max_aspect_ratio)
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
# guarantee a squared image patch after cropping
if sampler.use_square:
if image_height < image_width:
bbox_width = bbox_height * image_height / image_width
else:
bbox_height = bbox_width * image_width / image_height
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = np.random.uniform(0, xmin_bound)
ymin = np.random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = Bbox(xmin, ymin, xmax, ymax)
return sampled_bbox
def generate_batch_samples(batch_sampler, bbox_labels, image_width,
image_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for _ in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = generate_sample(sampler, image_width, image_height)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def crop_image(img, bbox_labels, sample_bbox, image_width, image_height,
resize_width, resize_height, min_face_size):
sample_bbox = clip_bbox(sample_bbox)
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
sample_img = img[ymin:ymax, xmin:xmax]
resize_val = resize_width
sample_labels = transform_labels_sampling(bbox_labels, sample_bbox,
resize_val, min_face_size)
return sample_img, sample_labels
def to_chw_bgr(image):
"""
    Transpose image from HWC to CHW and from RGB to BGR.
    Args:
        image (np.array): an image with HWC and RGB layout.
"""
# HWC to CHW
if len(image.shape) == 3:
image = np.swapaxes(image, 1, 2)
image = np.swapaxes(image, 1, 0)
    # RGB to BGR
image = image[[2, 1, 0], :, :]
return image
def anchor_crop_image_sampling(img,
bbox_labels,
scale_array,
img_width,
img_height):
mean = np.array([104, 117, 123], dtype=np.float32)
maxSize = 12000 # max size
infDistance = 9999999
bbox_labels = np.array(bbox_labels)
scale = np.array([img_width, img_height, img_width, img_height])
boxes = bbox_labels[:, 1:5] * scale
labels = bbox_labels[:, 0]
boxArea = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
rand_idx = np.random.randint(len(boxArea))
rand_Side = boxArea[rand_idx] ** 0.5
distance = infDistance
anchor_idx = 5
for i, anchor in enumerate(scale_array):
if abs(anchor - rand_Side) < distance:
distance = abs(anchor - rand_Side)
anchor_idx = i
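    # Data-anchor-sampling: choose a target anchor scale at most one bucket
    # above the face's nearest anchor, so crops are biased toward smaller scales.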
target_anchor = random.choice(scale_array[0:min(anchor_idx + 1, 5) + 1])
ratio = float(target_anchor) / rand_Side
ratio = ratio * (2**random.uniform(-1, 1))
if int(img_height * ratio * img_width * ratio) > maxSize * maxSize:
ratio = (maxSize * maxSize / (img_height * img_width))**0.5
interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC,
cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
interp_method = random.choice(interp_methods)
image = cv2.resize(img, None, None, fx=ratio,
fy=ratio, interpolation=interp_method)
boxes[:, 0] *= ratio
boxes[:, 1] *= ratio
boxes[:, 2] *= ratio
boxes[:, 3] *= ratio
height, width, _ = image.shape
sample_boxes = []
xmin = boxes[rand_idx, 0]
ymin = boxes[rand_idx, 1]
bw = (boxes[rand_idx, 2] - boxes[rand_idx, 0] + 1)
bh = (boxes[rand_idx, 3] - boxes[rand_idx, 1] + 1)
w = h = 640
for _ in range(50):
if w < max(height, width):
if bw <= w:
w_off = random.uniform(xmin + bw - w, xmin)
else:
w_off = random.uniform(xmin, xmin + bw - w)
if bh <= h:
h_off = random.uniform(ymin + bh - h, ymin)
else:
h_off = random.uniform(ymin, ymin + bh - h)
else:
w_off = random.uniform(width - w, 0)
h_off = random.uniform(height - h, 0)
w_off = math.floor(w_off)
h_off = math.floor(h_off)
# convert to integer rect x1,y1,x2,y2
rect = np.array(
[int(w_off), int(h_off), int(w_off + w), int(h_off + h)])
# keep overlap with gt box IF center in sampled patch
centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
# mask in all gt boxes that above and to the left of centers
m1 = (rect[0] <= boxes[:, 0]) * (rect[1] <= boxes[:, 1])
# mask in all gt boxes that under and to the right of centers
m2 = (rect[2] >= boxes[:, 2]) * (rect[3] >= boxes[:, 3])
# mask in that both m1 and m2 are true
mask = m1 * m2
overlap = jaccard_numpy(boxes, rect)
# have any valid boxes? try again if not
if not mask.any() and not overlap.max() > 0.7:
continue
else:
sample_boxes.append(rect)
sampled_labels = []
if sample_boxes:
choice_idx = np.random.randint(len(sample_boxes))
choice_box = sample_boxes[choice_idx]
centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
m1 = (choice_box[0] < centers[:, 0]) * \
(choice_box[1] < centers[:, 1])
m2 = (choice_box[2] > centers[:, 0]) * \
(choice_box[3] > centers[:, 1])
mask = m1 * m2
current_boxes = boxes[mask, :].copy()
current_labels = labels[mask]
current_boxes[:, :2] -= choice_box[:2]
current_boxes[:, 2:] -= choice_box[:2]
if choice_box[0] < 0 or choice_box[1] < 0:
new_img_width = width if choice_box[
0] >= 0 else width - choice_box[0]
new_img_height = height if choice_box[
1] >= 0 else height - choice_box[1]
image_pad = np.zeros(
(new_img_height, new_img_width, 3), dtype=float)
image_pad[:, :, :] = mean
start_left = 0 if choice_box[0] >= 0 else -choice_box[0]
start_top = 0 if choice_box[1] >= 0 else -choice_box[1]
image_pad[start_top:, start_left:, :] = image
choice_box_w = choice_box[2] - choice_box[0]
choice_box_h = choice_box[3] - choice_box[1]
start_left = choice_box[0] if choice_box[0] >= 0 else 0
start_top = choice_box[1] if choice_box[1] >= 0 else 0
end_right = start_left + choice_box_w
end_bottom = start_top + choice_box_h
current_image = image_pad[
start_top:end_bottom, start_left:end_right, :].copy()
image_height, image_width, _ = current_image.shape
if cfg.filter_min_face:
bbox_w = current_boxes[:, 2] - current_boxes[:, 0]
bbox_h = current_boxes[:, 3] - current_boxes[:, 1]
bbox_area_ = bbox_w * bbox_h
mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size)
current_boxes = current_boxes[mask]
current_labels = current_labels[mask]
for i in range(len(current_boxes)):
sample_label = []
sample_label.append(current_labels[i])
sample_label.append(current_boxes[i][0] / image_width)
sample_label.append(current_boxes[i][1] / image_height)
sample_label.append(current_boxes[i][2] / image_width)
sample_label.append(current_boxes[i][3] / image_height)
sampled_labels += [sample_label]
sampled_labels = np.array(sampled_labels)
else:
current_boxes /= np.array([image_width,
image_height, image_width, image_height])
sampled_labels = np.hstack(
(current_labels[:, np.newaxis], current_boxes))
return current_image, sampled_labels
current_image = image[choice_box[1]:choice_box[
3], choice_box[0]:choice_box[2], :].copy()
image_height, image_width, _ = current_image.shape
if cfg.filter_min_face:
bbox_w = current_boxes[:, 2] - current_boxes[:, 0]
bbox_h = current_boxes[:, 3] - current_boxes[:, 1]
bbox_area_ = bbox_w * bbox_h
mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size)
current_boxes = current_boxes[mask]
current_labels = current_labels[mask]
for i in range(len(current_boxes)):
sample_label = []
sample_label.append(current_labels[i])
sample_label.append(current_boxes[i][0] / image_width)
sample_label.append(current_boxes[i][1] / image_height)
sample_label.append(current_boxes[i][2] / image_width)
sample_label.append(current_boxes[i][3] / image_height)
sampled_labels += [sample_label]
sampled_labels = np.array(sampled_labels)
else:
current_boxes /= np.array([image_width,
image_height, image_width, image_height])
sampled_labels = np.hstack(
(current_labels[:, np.newaxis], current_boxes))
return current_image, sampled_labels
image_height, image_width, _ = image.shape
if cfg.filter_min_face:
bbox_w = boxes[:, 2] - boxes[:, 0]
bbox_h = boxes[:, 3] - boxes[:, 1]
bbox_area_ = bbox_w * bbox_h
mask = bbox_area_ > (cfg.min_face_size * cfg.min_face_size)
boxes = boxes[mask]
labels = labels[mask]
for i in range(len(boxes)):
sample_label = []
sample_label.append(labels[i])
sample_label.append(boxes[i][0] / image_width)
sample_label.append(boxes[i][1] / image_height)
sample_label.append(boxes[i][2] / image_width)
sample_label.append(boxes[i][3] / image_height)
sampled_labels += [sample_label]
sampled_labels = np.array(sampled_labels)
else:
boxes /= np.array([image_width, image_height,
image_width, image_height])
sampled_labels = np.hstack(
(labels[:, np.newaxis], boxes))
return image, sampled_labels
def preprocess(img, bbox_labels, mode):
img_width, img_height = img.size
sampled_labels = bbox_labels
if mode == 'train':
if cfg.apply_distort:
img = distort_image(img)
if cfg.apply_expand:
img, bbox_labels, img_width, img_height = expand_image(
img, bbox_labels, img_width, img_height)
batch_sampler = []
prob = np.random.uniform(0., 1.)
if prob > cfg.data_anchor_sampling_prob and cfg.anchor_sampling:
scale_array = np.array([16, 32, 64, 128, 256, 512])
img = np.array(img)
img, sampled_labels = anchor_crop_image_sampling(
img, bbox_labels, scale_array, img_width, img_height)
img = img.astype('uint8')
img = Image.fromarray(img)
else:
batch_sampler.append(Sampler(1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(Sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
sampled_bbox = generate_batch_samples(
batch_sampler, bbox_labels, img_width, img_height)
img = np.array(img)
if sampled_bbox:
idx = int(np.random.uniform(0, len(sampled_bbox)))
img, sampled_labels = crop_image(
img, bbox_labels, sampled_bbox[idx], img_width, img_height,
cfg.resize_width, cfg.resize_height, cfg.min_face_size)
img = Image.fromarray(img)
interp_mode = [
Image.BILINEAR, Image.HAMMING, Image.NEAREST, Image.BICUBIC,
Image.LANCZOS
]
interp_indx = np.random.randint(0, 5)
img = img.resize((cfg.resize_width, cfg.resize_height),
resample=interp_mode[interp_indx])
img = np.array(img)
if mode == 'train':
mirror = int(np.random.uniform(0, 2))
if mirror == 1:
img = img[:, ::-1, :]
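            # After the horizontal flip, swap xmin/xmax in normalized coords:
            # new_xmin = 1 - old_xmax, new_xmax = 1 - old_xmin.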
for i in six.moves.xrange(len(sampled_labels)):
tmp = sampled_labels[i][1]
sampled_labels[i][1] = 1 - sampled_labels[i][3]
sampled_labels[i][3] = 1 - tmp
img = to_chw_bgr(img)
img = img.astype('float32')
img -= cfg.img_mean
img = img[[2, 1, 0], :, :] # to RGB
return img, sampled_labels
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import numpy as np
def point_form(boxes):
""" Convert prior_boxes to (xmin, ymin, xmax, ymax)
representation for comparison to point form ground truth data.
Args:
boxes: center-size default boxes from priorbox layers.
Return:
boxes: Converted xmin, ymin, xmax, ymax form of boxes.
"""
return np.concatenate((boxes[:, :2] - boxes[:, 2:] / 2,
boxes[:, :2] + boxes[:, 2:] / 2), 1)
def center_size(boxes):
""" Convert prior_boxes to (cx, cy, w, h)
representation for comparison to center-size form ground truth data.
Args:
boxes: point_form boxes
Return:
        boxes: Converted (cx, cy, w, h) form of boxes.
"""
return np.concatenate([(boxes[:, 2:] + boxes[:, :2]) / 2,
boxes[:, 2:] - boxes[:, :2]], 1)
def intersect(box_a, box_b):
""" We resize both tensors to [A,B,2] without new malloc:
[A,2] -> [A,1,2] -> [A,B,2]
[B,2] -> [1,B,2] -> [A,B,2]
Then we compute the area of intersect between box_a and box_b.
Args:
box_a: bounding boxes, Shape: [A,4].
box_b: bounding boxes, Shape: [B,4].
Return:
intersection area, Shape: [A,B].
"""
A = box_a.shape[0]
B = box_b.shape[0]
max_xy = np.minimum(np.broadcast_to(np.expand_dims(box_a[:, 2:], 1), (A, B, 2)),
np.broadcast_to(np.expand_dims(box_b[:, 2:], 0), (A, B, 2)))
min_xy = np.maximum(np.broadcast_to(np.expand_dims(box_a[:, :2], 1), (A, B, 2)),
np.broadcast_to(np.expand_dims(box_b[:, :2], 0), (A, B, 2)))
inter = np.clip((max_xy - min_xy), 0, np.inf)
return inter[:, :, 0] * inter[:, :, 1]
def jaccard(box_a, box_b):
"""Compute the jaccard overlap of two sets of boxes. The jaccard overlap
is simply the intersection over union of two boxes. Here we operate on
ground truth boxes and default boxes.
E.g.:
A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
Args:
box_a: Ground truth bounding boxes, Shape: [num_objects,4]
box_b: Prior boxes from priorbox layers, Shape: [num_priors,4]
Return:
        jaccard overlap: Shape: [box_a.shape[0], box_b.shape[0]]
"""
inter = intersect(box_a, box_b)
area_a = ((box_a[:, 2] - box_a[:, 0]) *
(box_a[:, 3] - box_a[:, 1]))
area_a = np.expand_dims(area_a, 1)
area_a = np.broadcast_to(area_a, inter.shape)
area_b = ((box_b[:, 2] - box_b[:, 0]) *
(box_b[:, 3] - box_b[:, 1]))
area_b = np.expand_dims(area_b, 0)
area_b = np.broadcast_to(area_b, inter.shape)
union = area_a + area_b - inter
return inter / union
def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
"""Match each prior box with the ground truth box of the highest jaccard
overlap, encode the bounding boxes, then return the matched indices
corresponding to both confidence and location preds.
Args:
threshold: (float) The overlap threshold used when matching boxes.
truths: Ground truth boxes, Shape: [num_obj, num_priors].
priors: Prior boxes from priorbox layers, Shape: [n_priors,4].
variances: Variances corresponding to each prior coord,
Shape: [num_priors, 4].
labels: All the class labels for the image, Shape: [num_obj].
loc_t: Tensor to be filled w/ encoded location targets.
conf_t: Tensor to be filled w/ matched indices for conf preds.
idx: (int) current batch index
    Return:
        Nothing is returned; loc_t[idx] and conf_t[idx] are filled in place
        with the encoded location targets and matched confidence labels.
"""
overlaps = jaccard(truths, point_form(priors))
# best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
best_prior_overlap = np.max(overlaps, 1, keepdims=True)
best_prior_idx = np.argmax(overlaps, 1)
best_truth_overlap = np.max(overlaps, 0, keepdims=True)
best_truth_idx = np.argmax(overlaps, 0)
    # np.argmax already returns 1-D index arrays, so only the overlap values
    # (computed with keepdims=True) need to be squeezed.
    best_truth_overlap = np.squeeze(best_truth_overlap, 0)
    best_prior_overlap = np.squeeze(best_prior_overlap, 1)
for i in best_prior_idx:
        best_truth_overlap[i] = 2
for j in range(best_prior_idx.shape[0]):
best_truth_idx[best_prior_idx[j]] = j
_th1, _th2, _th3 = threshold
N = (np.sum(best_prior_overlap >= _th2) +
np.sum(best_prior_overlap >= _th3)) // 2
matches = truths[best_truth_idx]
conf = labels[best_truth_idx]
conf[best_truth_overlap < _th2] = 0
best_truth_overlap_clone = best_truth_overlap.copy()
idx_1 = np.greater(best_truth_overlap_clone, _th1)
idx_2 = np.less(best_truth_overlap_clone, _th2)
add_idx = np.equal(idx_1, idx_2)
    best_truth_overlap_clone[np.logical_not(add_idx)] = 0
    stage2_overlap = np.sort(best_truth_overlap_clone)[::-1]
    stage2_idx = np.argsort(best_truth_overlap_clone)[::-1]
stage2_overlap = np.greater(stage2_overlap, _th1)
if N > 0:
N = np.sum(stage2_overlap[:N]) if np.sum(stage2_overlap[:N]) < N else N
conf[stage2_idx[:N]] += 1
loc = encode(matches, priors, variances)
loc_t[idx] = loc
conf_t[idx] = conf
def match_ssd(threshold, truths, priors, variances, labels):
"""Match each prior box with the ground truth box of the highest jaccard
overlap, encode the bounding boxes, then return the matched indices
corresponding to both confidence and location preds.
Args:
threshold: (float) The overlap threshold used when matching boxes.
truths: (tensor) Ground truth boxes, Shape: [num_obj, num_priors].
priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
variances: (tensor) Variances corresponding to each prior coord,
Shape: [num_priors, 4].
labels: (tensor) All the class labels for the image, Shape: [num_obj].
loc_t: (tensor) Tensor to be filled w/ encoded location targets.
conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
idx: (int) current batch index
Return:
The matched indices corresponding to 1)location and 2)confidence preds.
"""
overlaps = jaccard(truths, point_form(priors))
# best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
best_prior_overlap = np.max(overlaps, 1, keepdims=True)
best_prior_idx = np.argmax(overlaps, 1)
best_truth_overlap = np.max(overlaps, 0, keepdims=True)
best_truth_idx = np.argmax(overlaps, 0)
best_truth_overlap = np.squeeze(best_truth_overlap, 0)
best_prior_overlap = np.squeeze(best_prior_overlap, 1)
for i in best_prior_idx:
best_truth_overlap[i] = 2
for j in range(best_prior_idx.shape[0]):
best_truth_idx[best_prior_idx[j]] = j
matches = truths[best_truth_idx]
conf = labels[best_truth_idx]
conf[best_truth_overlap < threshold] = 0
loc = encode(matches, priors, variances)
return loc, conf
def encode(matched, priors, variances):
"""Encode the variances from the priorbox layers into the ground truth boxes
we have matched (based on jaccard overlap) with the prior boxes.
Args:
matched: Coords of ground truth for each prior in point-form
Shape: [num_priors, 4].
priors: Prior boxes in center-offset form
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
encoded boxes (tensor), Shape: [num_priors, 4]
"""
# dist b/t match center and prior's center
g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
# encode variance
g_cxcy /= (variances[0] * priors[:, 2:])
# match wh / prior wh
g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
g_wh = np.log(g_wh) / variances[1]
# return target for smooth_l1_loss
return np.concatenate([g_cxcy, g_wh], 1)
def decode(loc, priors, variances):
"""Decode locations from predictions using priors to undo
the encoding we did for offset regression at train time.
Args:
loc: location predictions for loc layers,
Shape: [num_priors,4]
priors: Prior boxes in center-offset form.
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
decoded bounding box predictions
"""
if priors.shape[0] == 1:
priors = priors[0, :, :]
boxes = np.concatenate((priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), 1)
boxes[:, :2] -= boxes[:, 2:] / 2
boxes[:, 2:] += boxes[:, :2]
return boxes
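# Illustrative round trip for encode/decode (not part of the original file):
# decoding an encoded ground-truth box with the same priors and variances
# recovers the input, which is the invariant the training targets rely on.
def _demo_encode_decode():
    priors = np.array([[0.5, 0.5, 0.2, 0.2],
                       [0.3, 0.3, 0.1, 0.1]])      # center-offset form
    truths = np.array([[0.4, 0.4, 0.6, 0.6],
                       [0.25, 0.25, 0.35, 0.35]])  # point form
    loc = encode(truths, priors, [0.1, 0.2])
    return decode(loc, priors, [0.1, 0.2])  # ~= truths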
def log_sum_exp(x):
"""Utility function for computing log_sum_exp while determining
This will be used to determine unaveraged confidence loss across
all examples in a batch.
Args:
x (Variable(tensor)): conf_preds from conf layers
"""
x_max = x.max()
return np.log(np.sum(np.exp(x - x_max), 1, keepdim=True)) + x_max
def nms(boxes, scores, overlap=0.5, top_k=200):
"""Apply non-maximum suppression at test time to avoid detecting too many
overlapping bounding boxes for a given object.
Args:
boxes: The location preds for the img, Shape: [num_priors,4].
scores: The class predscores for the img, Shape:[num_priors].
overlap: The overlap thresh for suppressing unnecessary boxes.
top_k: The Maximum number of box preds to consider.
Return:
The indices of the kept boxes with respect to num_priors.
"""
keep = np.zeros_like(scores).astype(np.int32)
if boxes.size == 0:
return keep, 0
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
area = np.multiply(x2 - x1, y2 - y1)
idx = np.argsort(scores, axis=0)
idx = idx[-top_k:]
count = 0
while idx.size > 0:
i = idx[-1]
keep[count] = i
count += 1
if idx.shape[0] == 1:
break
idx = idx[:-1]
xx1 = x1[idx]
yy1 = y1[idx]
xx2 = x2[idx]
yy2 = y2[idx]
xx1 = np.clip(xx1, x1[i], np.inf)
yy1 = np.clip(yy1, y1[i], np.inf)
xx2 = np.clip(xx2, -np.inf, x2[i])
yy2 = np.clip(yy2, -np.inf, y2[i])
w = xx2 - xx1
h = yy2 - yy1
w = np.clip(w, 0, np.inf)
h = np.clip(h, 0, np.inf)
inter = w * h
rem_areas = area[idx]
union = (rem_areas - inter) + area[i]
IoU = inter / union
idx = idx[np.less(IoU, overlap)]
return keep, count
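# Illustrative use of nms (not part of the original file): of two boxes with
# IoU ~ 0.81 the higher-scoring one is kept, while the distant box survives.
def _demo_nms():
    boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                      [1.0, 1.0, 10.0, 10.0],
                      [20.0, 20.0, 30.0, 30.0]])
    scores = np.array([0.9, 0.8, 0.7])
    keep, count = nms(boxes, scores, overlap=0.5, top_k=200)
    return keep[:count]  # -> [0, 2]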
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
import os
from easydict import EasyDict
import numpy as np
_C = EasyDict()
cfg = _C
# data argument config
_C.expand_prob = 0.5
_C.expand_max_ratio = 4
_C.hue_prob = 0.5
_C.hue_delta = 18
_C.contrast_prob = 0.5
_C.contrast_delta = 0.5
_C.saturation_prob = 0.5
_C.saturation_delta = 0.5
_C.brightness_prob = 0.5
_C.brightness_delta = 0.125
_C.data_anchor_sampling_prob = 0.5
_C.min_face_size = 6.0
_C.apply_distort = True
_C.apply_expand = True
_C.img_mean = np.array([104., 117., 123.])[:, np.newaxis, np.newaxis].astype('float32')
_C.resize_width = 640
_C.resize_height = 640
_C.scale = 1 / 127.0
_C.anchor_sampling = True
_C.filter_min_face = True
# train config
_C.LR_STEPS = [80000, 100000, 120000]
_C.DIS_LR_STEPS = [30000, 35000, 40000]
# anchor config
_C.FEATURE_MAPS = [[160, 160], [80, 80], [40, 40], [20, 20], [10, 10], [5, 5]]
_C.INPUT_SIZE = (640, 640)
_C.STEPS = [4, 8, 16, 32, 64, 128]
_C.ANCHOR_SIZES = [16, 32, 64, 128, 256, 512]
_C.CLIP = False
_C.VARIANCE = [0.1, 0.2]
# loss config
_C.NUM_CLASSES = 2
_C.OVERLAP_THRESH = 0.35
_C.NEG_POS_RATIOS = 3
# detection config
_C.NMS_THRESH = 0.3
_C.TOP_K = 5000
_C.KEEP_TOP_K = 750
_C.CONF_THRESH = 0.05
# dataset config
_C.HOME = '/data2/James/dataset/pyramidbox_dataset/'
# face config
_C.FACE = EasyDict()
_C.FACE.FILE_DIR = os.path.dirname(os.path.realpath(__file__)) + '/../data'
_C.FACE.TRAIN_FILE = os.path.join(_C.FACE.FILE_DIR, 'face_train.txt')
_C.FACE.VAL_FILE = os.path.join(_C.FACE.FILE_DIR, 'face_val.txt')
_C.FACE.FDDB_DIR = os.path.join(_C.HOME, 'FDDB')
_C.FACE.WIDER_DIR = os.path.join(_C.HOME, 'WIDERFACE')
_C.FACE.OVERLAP_THRESH = 0.35
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import random
from PIL import Image
import numpy as np
from mindspore import dataset as ds
from src.augmentations import preprocess
from src.prior_box import PriorBox
from src.bbox_utils import match_ssd
from src.config import cfg
class WIDERDataset:
"""docstring for WIDERDetection"""
def __init__(self, list_file, mode='train'):
super(WIDERDataset, self).__init__()
self.mode = mode
self.fnames = []
self.boxes = []
self.labels = []
prior_box = PriorBox(cfg)
self.default_priors = prior_box.forward()
self.num_priors = self.default_priors.shape[0]
self.match = match_ssd
self.threshold = cfg.FACE.OVERLAP_THRESH
self.variance = cfg.VARIANCE
with open(list_file) as f:
lines = f.readlines()
for line in lines:
line = line.strip().split()
num_faces = int(line[1])
box = []
label = []
for i in range(num_faces):
x = float(line[2 + 5 * i])
y = float(line[3 + 5 * i])
w = float(line[4 + 5 * i])
h = float(line[5 + 5 * i])
c = int(line[6 + 5 * i])
if w <= 0 or h <= 0:
continue
box.append([x, y, x + w, y + h])
label.append(c)
if box:
self.fnames.append(line[0])
self.boxes.append(box)
self.labels.append(label)
self.num_samples = len(self.boxes)
def __len__(self):
return self.num_samples
def __getitem__(self, index):
img, face_loc, face_conf, head_loc, head_conf = self.pull_item(index)
return img, face_loc, face_conf, head_loc, head_conf
def pull_item(self, index):
while True:
image_path = self.fnames[index]
img = Image.open(image_path)
if img.mode == 'L':
img = img.convert('RGB')
im_width, im_height = img.size
boxes = self.annotransform(np.array(self.boxes[index]), im_width, im_height)
label = np.array(self.labels[index])
bbox_labels = np.hstack((label[:, np.newaxis], boxes)).tolist()
img, sample_labels = preprocess(img, bbox_labels, self.mode)
sample_labels = np.array(sample_labels)
if sample_labels.size > 0:
face_target = np.hstack(
(sample_labels[:, 1:], sample_labels[:, 0][:, np.newaxis]))
assert (face_target[:, 2] > face_target[:, 0]).any()
assert (face_target[:, 3] > face_target[:, 1]).any()
face_box = face_target[:, :-1]
head_box = self.expand_bboxes(face_box)
head_target = np.hstack((head_box, face_target[
:, -1][:, np.newaxis]))
break
else:
index = random.randrange(0, self.num_samples)
face_truth = face_target[:, :-1]
face_label = face_target[:, -1]
face_loc_t, face_conf_t = self.match(self.threshold, face_truth, self.default_priors,
self.variance, face_label)
head_truth = head_target[:, :-1]
head_label = head_target[:, -1]
head_loc_t, head_conf_t = self.match(self.threshold, head_truth, self.default_priors,
self.variance, head_label)
return img, face_loc_t, face_conf_t, head_loc_t, head_conf_t
def annotransform(self, boxes, im_width, im_height):
boxes[:, 0] /= im_width
boxes[:, 1] /= im_height
boxes[:, 2] /= im_width
boxes[:, 3] /= im_height
return boxes
def expand_bboxes(self,
bboxes,
expand_left=2.,
expand_up=2.,
expand_right=2.,
expand_down=2.):
expand_bboxes = []
for bbox in bboxes:
xmin = bbox[0]
ymin = bbox[1]
xmax = bbox[2]
ymax = bbox[3]
w = xmax - xmin
h = ymax - ymin
ex_xmin = max(xmin - w / expand_left, 0.)
ex_ymin = max(ymin - h / expand_up, 0.)
ex_xmax = max(xmax + w / expand_right, 0.)
ex_ymax = max(ymax + h / expand_down, 0.)
expand_bboxes.append([ex_xmin, ex_ymin, ex_xmax, ex_ymax])
expand_bboxes = np.array(expand_bboxes)
return expand_bboxes
def create_val_dataset(mindrecord_file, batch_size, device_num=1, device_id=0, num_workers=8):
"""
Create mindspore dataset for validation from a MindRecord file
"""
column_names = ['img', 'face_loc', 'face_conf', 'head_loc', 'head_conf']
ds.config.set_num_parallel_workers(num_workers)
ds.config.set_enable_shared_mem(False)
ds.config.set_prefetch_size(batch_size * 2)
train_dataset = ds.MindDataset(mindrecord_file, columns_list=column_names, shuffle=True,
shard_id=device_id, num_shards=device_num)
train_dataset = train_dataset.batch(batch_size=batch_size, drop_remainder=True)
return train_dataset
def create_train_dataset(cfg_, batch_size, device_num=1, device_id=0, num_workers=8):
"""
Create user-defined mindspore dataset for training
"""
column_names = ['img', 'face_loc', 'face_conf', 'head_loc', 'head_conf']
ds.config.set_num_parallel_workers(num_workers)
ds.config.set_enable_shared_mem(False)
ds.config.set_prefetch_size(batch_size * 2)
train_dataset = ds.GeneratorDataset(WIDERDataset(cfg_.FACE.TRAIN_FILE, mode='train'),
column_names=column_names, shuffle=True, num_shards=device_num,
shard_id=device_id)
train_dataset = train_dataset.batch(batch_size=batch_size)
return train_dataset
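# Illustrative usage (not part of the original file): build the training
# pipeline defined above and fetch one batch. Assumes the face_train.txt
# annotation list referenced by cfg.FACE.TRAIN_FILE has been generated.
def _demo_train_dataset():
    train_ds = create_train_dataset(cfg, batch_size=4)
    for img, face_loc, face_conf, head_loc, head_conf in train_ds.create_tuple_iterator():
        print(img.shape, face_loc.shape, face_conf.shape)
        break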
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import numpy as np
from mindspore import Tensor
from src.bbox_utils import decode, nms
class Detect:
"""At test time, Detect is the final layer of SSD. Decode location preds,
apply non-maximum suppression to location predictions based on conf
scores and threshold to a top_k number of output predictions for both
confidence score and locations.
"""
def __init__(self, cfg):
self.num_classes = cfg.NUM_CLASSES
self.top_k = cfg.TOP_K
self.nms_thresh = cfg.NMS_THRESH
self.conf_thresh = cfg.CONF_THRESH
self.variance = cfg.VARIANCE
def detect(self, loc_data, conf_data, prior_data):
"""
Args:
loc_data: (Tensor) Loc preds from loc layers
Shape: [batch, num_priors*4]
conf_data: (Tensor) Shape: Conf preds from conf layers
Shape: [batch*num_priors, num_classes]
prior_data: Prior boxes and variances from priorbox layers
Shape: [1,num_priors,4]
"""
if isinstance(loc_data, Tensor):
loc_data = loc_data.asnumpy()
if isinstance(conf_data, Tensor):
conf_data = conf_data.asnumpy()
num = loc_data.shape[0]
num_priors = prior_data.shape[0]
conf_preds = np.transpose(conf_data.reshape((num, num_priors, self.num_classes)), (0, 2, 1))
batch_priors = prior_data.reshape((-1, num_priors, 4))
batch_priors = np.broadcast_to(batch_priors, (num, num_priors, 4))
decoded_boxes = decode(loc_data.reshape((-1, 4)), batch_priors, self.variance).reshape((num, num_priors, 4))
output = np.zeros((num, self.num_classes, self.top_k, 5))
for i in range(num):
boxes = decoded_boxes[i].copy()
conf_scores = conf_preds[i].copy()
for cl in range(1, self.num_classes):
c_mask = np.greater(conf_scores[cl], self.conf_thresh)
scores = conf_scores[cl][c_mask]
if scores.size == 0:  # no score passed the confidence threshold
continue
l_mask = np.expand_dims(c_mask, 1)
l_mask = np.broadcast_to(l_mask, boxes.shape)
boxes_ = boxes[l_mask].reshape((-1, 4))
ids, count = nms(boxes_, scores, self.nms_thresh, self.top_k)
output[i, cl, :count] = np.concatenate((np.expand_dims(scores[ids[:count]], 1),
boxes_[ids[:count]]), 1)
return output
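# Illustrative sketch (not part of the original file): run Detect on random
# placeholder outputs for ten priors; real inputs come from the network and
# PriorBox. `cfg` here refers to src.config.cfg.
def _demo_detect():
    from src.config import cfg
    det = Detect(cfg)
    num_priors = 10
    loc = np.random.randn(1, num_priors, 4).astype(np.float32) * 0.1
    conf = np.random.rand(num_priors, cfg.NUM_CLASSES).astype(np.float32)
    priors = np.tile(np.array([[0.5, 0.5, 0.2, 0.2]], np.float32), (num_priors, 1))
    return det.detect(loc, conf, priors)  # shape [1, NUM_CLASSES, TOP_K, 5]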
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [RuisongZhou][FDDB_Evaluation]
import os
import argparse
import tqdm
import numpy as np
import cv2
def bbox_overlaps(boxes, query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
N = boxes.shape[0]
K = query_boxes.shape[0]
overlaps = np.zeros((N, K), dtype=np.float32)
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0] + 1) *
(query_boxes[k, 3] - query_boxes[k, 1] + 1)
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]) + 1
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]) + 1
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0] + 1) *
(boxes[n, 3] - boxes[n, 1] + 1) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
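# Illustrative sketch (not part of the original file): note the FDDB
# evaluator uses the +1 pixel convention, so [0, 0, 9, 9] has area 100.
def _demo_bbox_overlaps():
    boxes = np.array([[0.0, 0.0, 9.0, 9.0]])
    query = np.array([[0.0, 0.0, 9.0, 9.0],
                      [5.0, 5.0, 14.0, 14.0]])
    # identical boxes -> 1.0; shifted box -> 25 / (100 + 100 - 25) ~ 0.143
    return bbox_overlaps(boxes, query)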
def get_gt_boxes(gt_dir):
gt_dict = {}
for i in range(1, 11):
filename = os.path.join(gt_dir, 'FDDB-fold-{}-ellipseList.txt'.format('%02d' % i))
assert os.path.exists(filename)
gt_sub_dict = {}
annotationfile = open(filename)
while True:
filename = annotationfile.readline()[:-1].replace('/', '_')
if not filename:
break
line = annotationfile.readline()
if not line:
break
facenum = int(line)
face_loc = []
for _ in range(facenum):
line = annotationfile.readline().strip().split()
major_axis_radius = float(line[0])
minor_axis_radius = float(line[1])
angle = float(line[2])
center_x = float(line[3])
center_y = float(line[4])
_ = float(line[5])
angle = angle / 3.1415926 * 180
mask = np.zeros((1000, 1000), dtype=np.uint8)
cv2.ellipse(mask, ((int)(center_x), (int)(center_y)),
((int)(major_axis_radius), (int)(minor_axis_radius)), angle, 0., 360., (255, 255, 255))
contours, _ = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2:]
r = cv2.boundingRect(contours[0])
x_min = r[0]
y_min = r[1]
x_max = r[0] + r[2]
y_max = r[1] + r[3]
face_loc.append([x_min, y_min, x_max, y_max])
face_loc = np.array(face_loc)
gt_sub_dict[filename] = face_loc
gt_dict[i] = gt_sub_dict
return gt_dict
def read_pred_file(filepath):
with open(filepath, 'r') as f:
lines = f.readlines()
img_file = lines[0].rstrip('\n')
lines = lines[2:]
boxes = []
for line in lines:
line = line.rstrip('\n').split(' ')
if line[0] == '':
continue
boxes.append([float(line[0]), float(line[1]), float(line[2]), float(line[3]), float(line[4])])
boxes = np.array(boxes)
return img_file.split('/')[-1], boxes
def get_preds_box(pred_dir):
events = os.listdir(pred_dir)
boxes = dict()
pbar = tqdm.tqdm(events)
for event in pbar:
pbar.set_description('Reading Predictions Boxes')
event_dir = os.path.join(pred_dir, event)
event_images = os.listdir(event_dir)
current_event = dict()
for imgtxt in event_images:
imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt))
current_event[imgname.rstrip('.jpg')] = _boxes
boxes[event] = current_event
return boxes
def norm_score(pred):
""" norm score
pred {key: [[x1,y1,x2,y2,s]]}
"""
max_score = 0
min_score = 1
for _, k in pred.items():
for _, v in k.items():
if v.size == 0:
continue
_min = np.min(v[:, -1])
_max = np.max(v[:, -1])
max_score = max(_max, max_score)
min_score = min(_min, min_score)
diff = max_score - min_score
for _, k in pred.items():
for _, v in k.items():
if v.size == 0:
continue
v[:, -1] = (v[:, -1] - min_score) / diff
def image_eval(pred, gt, ignore, iou_thresh):
""" single image evaluation
pred: Nx5
gt: Nx4
ignore:
"""
_pred = pred.copy()
_gt = gt.copy()
pred_recall = np.zeros(_pred.shape[0])
recall_list = np.zeros(_gt.shape[0])
proposal_list = np.ones(_pred.shape[0])
_pred[:, 2] = _pred[:, 2] + _pred[:, 0]
_pred[:, 3] = _pred[:, 3] + _pred[:, 1]
overlaps = bbox_overlaps(_pred[:, :4], _gt)
for h in range(_pred.shape[0]):
gt_overlap = overlaps[h]
max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax()
if max_overlap >= iou_thresh:
if ignore[max_idx] == 0:
recall_list[max_idx] = -1
proposal_list[h] = -1
elif recall_list[max_idx] == 0:
recall_list[max_idx] = 1
r_keep_index = np.where(recall_list == 1)[0]
pred_recall[h] = len(r_keep_index)
return pred_recall, proposal_list
def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall):
pr_info = np.zeros((thresh_num, 2)).astype('float')
for t in range(thresh_num):
thresh = 1 - (t + 1) / thresh_num
r_index = np.where(pred_info[:, 4] >= thresh)[0]
if r_index.size == 0:
pr_info[t, 0] = 0
pr_info[t, 1] = 0
else:
r_index = r_index[-1]
p_index = np.where(proposal_list[:r_index + 1] == 1)[0]
pr_info[t, 0] = len(p_index)
pr_info[t, 1] = pred_recall[r_index]
return pr_info
def dataset_pr_info(thresh_num, pr_curve, count_face):
_pr_curve = np.zeros((thresh_num, 2))
for i in range(thresh_num):
_pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0]
_pr_curve[i, 1] = pr_curve[i, 1] / count_face
return _pr_curve
def voc_ap(rec, prec):
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
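# Illustrative sketch (not part of the original file): a detector that is
# perfectly precise up to recall 0.5 and finds nothing beyond it scores
# AP = 0.5 under the interpolation above.
def _demo_voc_ap():
    rec = np.array([0.1, 0.3, 0.5])
    prec = np.array([1.0, 1.0, 1.0])
    return voc_ap(rec, prec)  # -> 0.5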
def evaluation(pred, gt_path, iou_thresh=0.5):
pred = get_preds_box(pred)
norm_score(pred)
gt_box_dict = get_gt_boxes(gt_path)
event = list(pred.keys())
event = [int(e) for e in event]
event.sort()
thresh_num = 1000
aps = []
pbar = tqdm.tqdm(range(len(event)))
for setting_id in pbar:
pbar.set_description('Predicting ... ')
# different setting
count_face = 0
pr_curve = np.zeros((thresh_num, 2)).astype('float')
gt = gt_box_dict[event[setting_id]]
pred_list = pred[str(event[setting_id])]
gt_list = list(gt.keys())
for j in range(len(gt_list)):
gt_boxes = gt[gt_list[j]].astype('float') # from image name get gt boxes
pred_info = pred_list[gt_list[j]]
keep_index = np.array(range(1, len(gt_boxes) + 1))
count_face += len(keep_index)
ignore = np.zeros(gt_boxes.shape[0])
if gt_boxes.size == 0 or pred_info.size == 0:
continue
if keep_index.size != 0:
ignore[keep_index - 1] = 1
pred_recall, proposal_list = image_eval(pred_info, gt_boxes, ignore, iou_thresh)
_img_pr_info = img_pr_info(thresh_num, pred_info, proposal_list, pred_recall)
pr_curve += _img_pr_info
pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face)
propose = pr_curve[:, 0]
recall = pr_curve[:, 1]
ap = voc_ap(recall, propose)
aps.append(ap)
print("==================== Results ====================")
for i in range(len(aps)):
print("FDDB-fold-{} Val AP: {}".format(event[i], aps[i]))
print("FDDB Dataset Average AP: {}".format(sum(aps)/len(aps)))
print("=================================================")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--pred')
parser.add_argument('--gt')
args = parser.parse_args()
evaluation(args.pred, args.gt)
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
from mindspore import nn, Tensor, ops
from mindspore import dtype as mstype
from mindspore import numpy as mnp
from src.config import cfg
class MultiBoxLoss(nn.Cell):
"""SSD Weighted Loss Function
"""
def __init__(self, use_head_loss=False):
super(MultiBoxLoss, self).__init__()
self.use_head_loss = use_head_loss
self.num_classes = cfg.NUM_CLASSES
self.negpos_ratio = cfg.NEG_POS_RATIOS
self.cast = ops.Cast()
self.sum = ops.ReduceSum()
self.loc_loss = nn.SmoothL1Loss()
self.cls_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
self.sort_descending = ops.Sort(descending=True)
self.stack = ops.Stack(axis=1)
self.unsqueeze = ops.ExpandDims()
self.gather = ops.GatherNd()
def construct(self, predictions, targets):
"""Multibox Loss"""
if self.use_head_loss:
_, _, loc_data, conf_data = predictions
else:
loc_data, conf_data, _, _ = predictions
loc_t, conf_t = targets
loc_data = self.cast(loc_data, mstype.float32)
conf_data = self.cast(conf_data, mstype.float32)
loc_t = self.cast(loc_t, mstype.float32)
conf_t = self.cast(conf_t, mstype.int32)
batch_size, box_num, _ = conf_data.shape
mask = self.cast(conf_t > 0, mstype.float32)
pos_num = self.sum(mask, 1)
loc_loss = self.sum(self.loc_loss(loc_data, loc_t), 2)
loc_loss = self.sum(mask * loc_loss)
# Hard Negative Mining
con = self.cls_loss(conf_data.view(-1, self.num_classes), conf_t.view(-1))
con = con.view(batch_size, -1)
con_neg = con * (1 - mask)
value, _ = self.sort_descending(con_neg)
neg_num = self.cast(ops.minimum(self.negpos_ratio * pos_num, box_num), mstype.int32)
batch_iter = Tensor(mnp.arange(batch_size), dtype=mstype.int32)
neg_index = self.stack((batch_iter, neg_num))
min_neg_score = self.unsqueeze(self.gather(value, neg_index), 1)
neg_mask = self.cast(con_neg > min_neg_score, mstype.float32)
all_mask = mask + neg_mask
all_mask = ops.stop_gradient(all_mask)
cls_loss = self.sum(con * all_mask)
N = self.sum(pos_num)
N = ops.maximum(self.cast(N, mstype.float32), 0.25)
loc_loss /= N
cls_loss /= N
return loc_loss, cls_loss
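# Illustrative sketch (not part of the original file): exercise the loss on
# random placeholder predictions with one positive prior per sample, so the
# hard-negative mining keeps NEG_POS_RATIOS negatives per positive.
def _demo_multibox_loss():
    import numpy as np
    batch, priors = 2, 8
    loc = Tensor(np.random.randn(batch, priors, 4).astype(np.float32))
    conf = Tensor(np.random.randn(batch, priors, cfg.NUM_CLASSES).astype(np.float32))
    conf_np = np.zeros((batch, priors), np.int32)
    conf_np[:, 0] = 1  # a single positive prior per sample
    loc_t = Tensor(np.zeros((batch, priors, 4), np.float32))
    conf_t = Tensor(conf_np)
    loss_fn = MultiBoxLoss()
    return loss_fn((loc, conf, loc, conf), (loc_t, conf_t))  # (loc_loss, cls_loss)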
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
from itertools import product
import numpy as np
class PriorBox:
"""Compute priorbox coordinates in center-offset form for each source
feature map.
"""
def __init__(self, cfg, feature_maps=None, input_size=(640, 640), phase='train'):
self.imh = input_size[0]
self.imw = input_size[1]
# PyramidBox places one prior per feature map location
self.variance = cfg.VARIANCE or [0.1]
if phase == 'train':
self.feature_maps = cfg.FEATURE_MAPS
else:
self.feature_maps = feature_maps
self.min_sizes = cfg.ANCHOR_SIZES
self.steps = cfg.STEPS
self.clip = cfg.CLIP
for v in self.variance:
if v <= 0:
raise ValueError('Variances must be greater than 0')
def forward(self):
mean = []
for k in range(len(self.feature_maps)):
feath = self.feature_maps[k][0]
featw = self.feature_maps[k][1]
for i, j in product(range(feath), range(featw)):
f_kw = self.imw / self.steps[k]
f_kh = self.imh / self.steps[k]
cx = (j + 0.5) / f_kw
cy = (i + 0.5) / f_kh
s_kw = self.min_sizes[k] / self.imw
s_kh = self.min_sizes[k] / self.imh
mean += [cx, cy, s_kw, s_kh]
output = np.array(mean).reshape(-1, 4)
if self.clip:
output = np.clip(output, 0, 1)
return output
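# Illustrative sketch (not part of the original file): with the default
# 640x640 training feature maps, one anchor per location gives
# 160^2 + 80^2 + 40^2 + 20^2 + 10^2 + 5^2 = 34125 priors.
def _demo_prior_box():
    from src.config import cfg
    priors = PriorBox(cfg).forward()
    return priors.shape  # -> (34125, 4)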
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# This file was copied from project [ZhaoWeicheng][Pyramidbox.pytorch]
from mindspore import nn, ops, Parameter, Tensor
from mindspore.common import initializer
from mindspore import dtype as mstype
from src.loss import MultiBoxLoss
class L2Norm(nn.Cell):
def __init__(self, n_channles, scale):
super(L2Norm, self).__init__()
self.n_channels = n_channles
self.gamma = scale or None
self.eps = 1e-10
self.weight = Parameter(Tensor(shape=(self.n_channels,), init=initializer.Constant(value=self.gamma),
                               dtype=mstype.float32))
self.pow = ops.Pow()
self.sum = ops.ReduceSum()
self.div = ops.Div()
def construct(self, x):
norm = self.pow(x, 2).sum(axis=1, keepdims=True)
norm = ops.sqrt(norm) + self.eps
x = self.div(x, norm)
out = self.weight[None, :][:, :, None][:, :, :, None].expand_as(x) * x
return out
class ConvBn(nn.Cell):
"""docstring for conv"""
def __init__(self,
in_plane,
out_plane,
kernel_size,
stride,
padding):
super(ConvBn, self).__init__()
self.conv1 = nn.Conv2d(in_plane, out_plane, kernel_size, stride, pad_mode='pad',
padding=padding, has_bias=True, weight_init='xavier_uniform')
self.bn1 = nn.BatchNorm2d(out_plane)
def construct(self, x):
x = self.conv1(x)
return self.bn1(x)
class CPM(nn.Cell):
"""docstring for CPM"""
def __init__(self, in_plane):
super(CPM, self).__init__()
self.branch1 = ConvBn(in_plane, 1024, 1, 1, 0)
self.branch2a = ConvBn(in_plane, 256, 1, 1, 0)
self.branch2b = ConvBn(256, 256, 3, 1, 1)
self.branch2c = ConvBn(256, 1024, 1, 1, 0)
self.relu = nn.ReLU()
self.ssh_1 = nn.Conv2d(1024, 256, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.ssh_dimred = nn.Conv2d(1024, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.ssh_2 = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.ssh_3a = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1, has_bias=True,
weight_init='xavier_uniform')
self.ssh_3b = nn.Conv2d(128, 128, kernel_size=3, stride=1, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
self.cat = ops.Concat(1)
def construct(self, x):
out_residual = self.branch1(x)
x = self.relu(self.branch2a(x))
x = self.relu(self.branch2b(x))
x = self.branch2c(x)
rescomb = self.relu(x + out_residual)
ssh1 = self.ssh_1(rescomb)
ssh_dimred = self.relu(self.ssh_dimred(rescomb))
ssh_2 = self.ssh_2(ssh_dimred)
ssh_3a = self.relu(self.ssh_3a(ssh_dimred))
ssh_3b = self.ssh_3b(ssh_3a)
ssh_out = self.cat((ssh1, ssh_2, ssh_3b))
ssh_out = self.relu(ssh_out)
return ssh_out
class PyramidBox(nn.Cell):
"""docstring for PyramidBox"""
def __init__(self,
phase,
base,
extras,
lfpn_cpm,
head,
num_classes):
super(PyramidBox, self).__init__()
self.vgg = nn.CellList(base)
self.extras = nn.CellList(extras)
self.num_classes = num_classes
self.L2Norm3_3 = L2Norm(256, 10)
self.L2Norm4_3 = L2Norm(512, 8)
self.L2Norm5_3 = L2Norm(512, 5)
self.lfpn_topdown = nn.CellList(lfpn_cpm[0])
self.lfpn_later = nn.CellList(lfpn_cpm[1])
self.cpm = nn.CellList(lfpn_cpm[2])
self.loc_layers = nn.CellList(head[0])
self.conf_layers = nn.CellList(head[1])
self.relu = nn.ReLU()
self.concat = ops.Concat(1)
self.is_infer = False
if phase == 'test':
self.softmax = nn.Softmax(axis=-1)
self.is_infer = True
def _upsample_prod(self, x, y):
_, _, H, W = y.shape
resize_bilinear = nn.ResizeBilinear()
result = resize_bilinear(x, size=(H, W), align_corners=True) * y
return result
def construct(self, x):
# apply vgg up to conv3_3 relu
for k in range(16):
x = self.vgg[k](x)
conv3_3 = x
# apply vgg up to conv4_3
for k in range(16, 23):
x = self.vgg[k](x)
conv4_3 = x
for k in range(23, 30):
x = self.vgg[k](x)
conv5_3 = x
for k in range(30, len(self.vgg)):
x = self.vgg[k](x)
convfc_7 = x
# apply extra layers and cache source layer outputs
for k in range(2):
x = self.relu(self.extras[k](x))
conv6_2 = x
for k in range(2, 4):
x = self.relu(self.extras[k](x))
conv7_2 = x
x = self.relu(self.lfpn_topdown[0](convfc_7))
lfpn2_on_conv5 = self.relu(self._upsample_prod(
x, self.lfpn_later[0](conv5_3)))
x = self.relu(self.lfpn_topdown[1](lfpn2_on_conv5))
lfpn1_on_conv4 = self.relu(self._upsample_prod(
x, self.lfpn_later[1](conv4_3)))
x = self.relu(self.lfpn_topdown[2](lfpn1_on_conv4))
lfpn0_on_conv3 = self.relu(self._upsample_prod(
x, self.lfpn_later[2](conv3_3)))
ssh_conv3_norm = self.cpm[0](self.L2Norm3_3(lfpn0_on_conv3))
ssh_conv4_norm = self.cpm[1](self.L2Norm4_3(lfpn1_on_conv4))
ssh_conv5_norm = self.cpm[2](self.L2Norm5_3(lfpn2_on_conv5))
ssh_convfc7 = self.cpm[3](convfc_7)
ssh_conv6 = self.cpm[4](conv6_2)
ssh_conv7 = self.cpm[5](conv7_2)
face_locs, face_confs = [], []
head_locs, head_confs = [], []
N = ssh_conv3_norm.shape[0]
mbox_loc = self.loc_layers[0](ssh_conv3_norm)
face_loc, head_loc = ops.Split(axis=1, output_num=2)(mbox_loc)
face_loc = ops.Transpose()(face_loc, (0, 2, 3, 1)).view(N, -1, 4)
if not self.is_infer:
head_loc = ops.Transpose()(head_loc, (0, 2, 3, 1)).view(N, -1, 4)
mbox_conf = self.conf_layers[0](ssh_conv3_norm)
face_conf1 = mbox_conf[:, 3:4, :, :]
_, face_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 0:3, :, :])
face_conf = self.concat((face_conf3_maxin, face_conf1))
face_conf = ops.Transpose()(face_conf, (0, 2, 3, 1)).view(N, -1, 2)
head_conf = None
if not self.is_infer:
_, head_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 4:7, :, :])
head_conf1 = mbox_conf[:, 7:, :, :]
head_conf = self.concat((head_conf3_maxin, head_conf1))
head_conf = ops.Transpose()(head_conf, (0, 2, 3, 1)).view(N, -1, 2)
face_locs.append(face_loc)
face_confs.append(face_conf)
if not self.is_infer:
head_locs.append(head_loc)
head_confs.append(head_conf)
inputs = [ssh_conv4_norm, ssh_conv5_norm,
ssh_convfc7, ssh_conv6, ssh_conv7]
feature_maps = []
feat_size = ssh_conv3_norm.shape[2:]
feature_maps.append([feat_size[0], feat_size[1]])
for i, feat in enumerate(inputs):
feat_size = feat.shape[2:]
feature_maps.append([feat_size[0], feat_size[1]])
mbox_loc = self.loc_layers[i + 1](feat)
face_loc, head_loc = ops.Split(axis=1, output_num=2)(mbox_loc)
face_loc = ops.Transpose()(face_loc, (0, 2, 3, 1)).view(N, -1, 4)
if not self.is_infer:
head_loc = ops.Transpose()(head_loc, (0, 2, 3, 1)).view(N, -1, 4)
mbox_conf = self.conf_layers[i + 1](feat)
face_conf1 = mbox_conf[:, 0:1, :, :]
_, face_conf3_maxin = ops.ArgMaxWithValue(axis=1, keep_dims=True)(mbox_conf[:, 1:4, :, :])
face_conf = self.concat((face_conf1, face_conf3_maxin))
face_conf = ops.Transpose()(face_conf, (0, 2, 3, 1)).ravel().view(N, -1, 2)
if not self.is_infer:
head_conf = ops.Transpose()(mbox_conf[:, 4:, :, :], (0, 2, 3, 1)).view(N, -1, 2)
face_locs.append(face_loc)
face_confs.append(face_conf)
if not self.is_infer:
head_locs.append(head_loc)
head_confs.append(head_conf)
face_mbox_loc = self.concat(face_locs)
face_mbox_conf = self.concat(face_confs)
head_mbox_loc, head_mbox_conf = None, None
if not self.is_infer:
head_mbox_loc = self.concat(head_locs)
head_mbox_conf = self.concat(head_confs)
if not self.is_infer:
output = (face_mbox_loc, face_mbox_conf, head_mbox_loc, head_mbox_conf)
else:
output = (face_mbox_loc, self.softmax(face_mbox_conf), feature_maps)
return output
vgg_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M',
512, 512, 512, 'M']
extras_cfg = [256, 'S', 512, 128, 'S', 256]
lfpn_cpm_cfg = [256, 512, 512, 1024, 512, 256]
multibox_cfg = [512, 512, 512, 512, 512, 512]
def vgg_(cfg, i, batch_norm=False):
layers = []
in_channels = i
for v in cfg:
if v == 'M':
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
elif v == 'C':
# 'C' marks ceil-mode pooling in the original PyTorch config; vgg_cfg
# above never uses it, so plain max pooling is kept here
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
else:
conv2d = nn.Conv2d(in_channels, v, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')
if batch_norm:
layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU()]
else:
layers += [conv2d, nn.ReLU()]
in_channels = v
conv6 = nn.Conv2d(512, 1024, kernel_size=3, pad_mode='pad', padding=6,
dilation=6, has_bias=True, weight_init='xavier_uniform')
conv7 = nn.Conv2d(1024, 1024, kernel_size=1, has_bias=True, weight_init='xavier_uniform')
layers += [conv6, nn.ReLU(), conv7, nn.ReLU()]
return layers
def add_extras(cfg, i):
# Extra layers added to VGG for feature scaling
layers = []
in_channels = i
flag = False
for k, v in enumerate(cfg):
if in_channels != 'S':
if v == 'S':
layers += [nn.Conv2d(in_channels, cfg[k + 1], kernel_size=(1, 3)[flag], stride=2,
pad_mode='pad', padding=1, has_bias=True, weight_init='xavier_uniform')]
else:
layers += [nn.Conv2d(in_channels, v, kernel_size=(1, 3)[flag],
has_bias=True, weight_init='xavier_uniform')]
flag = not flag
in_channels = v
return layers
def add_lfpn_cpm(cfg):
lfpn_topdown_layers = []
lfpn_latlayer = []
cpm_layers = []
for k, v in enumerate(cfg):
cpm_layers.append(CPM(v))
fpn_list = cfg[::-1][2:]
for k, v in enumerate(fpn_list[:-1]):
lfpn_latlayer.append(nn.Conv2d(fpn_list[k + 1], fpn_list[k + 1], kernel_size=1,
stride=1, padding=0, has_bias=True, weight_init='xavier_uniform'))
lfpn_topdown_layers.append(nn.Conv2d(v, fpn_list[k + 1], kernel_size=1, stride=1,
padding=0, has_bias=True, weight_init='xavier_uniform'))
return (lfpn_topdown_layers, lfpn_latlayer, cpm_layers)
def multibox(vgg, extra_layers):
loc_layers = []
conf_layers = []
vgg_source = [21, 28, -2]
i = 0
loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
conf_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
i += 1
for _, _ in enumerate(vgg_source):
loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
conf_layers += [nn.Conv2d(multibox_cfg[i], 6, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
i += 1
for _, _ in enumerate(extra_layers[1::2], 2):
loc_layers += [nn.Conv2d(multibox_cfg[i], 8, kernel_size=3, pad_mode='pad',
padding=1, has_bias=True, weight_init='xavier_uniform')]
conf_layers += [nn.Conv2d(multibox_cfg[i], 6, kernel_size=3, pad_mode='pad', padding=1,
has_bias=True, weight_init='xavier_uniform')]
i += 1
return vgg, extra_layers, (loc_layers, conf_layers)
def build_net(phase, num_classes=2):
base_, extras_, head_ = multibox(vgg_(vgg_cfg, 3), add_extras((extras_cfg), 1024))
lfpn_cpm = add_lfpn_cpm(lfpn_cpm_cfg)
return PyramidBox(phase, base_, extras_, lfpn_cpm, head_, num_classes)
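# Illustrative sketch (not part of the original file): build the training
# graph and push one random 640x640 image through it; the face branch
# predicts offsets for all 34125 priors.
def _demo_build_net():
    import numpy as np
    net = build_net('train', num_classes=2)
    x = Tensor(np.random.randn(1, 3, 640, 640).astype(np.float32))
    face_loc, face_conf, head_loc, head_conf = net(x)
    return face_loc.shape  # -> (1, 34125, 4)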
class NetWithLoss(nn.Cell):
def __init__(self, net):
super(NetWithLoss, self).__init__()
self.net = net
self.loss_fn_1 = MultiBoxLoss()
self.loss_fn_2 = MultiBoxLoss(use_head_loss=True)
def construct(self, images, face_loc, face_conf, head_loc, head_conf):
out = self.net(images)
face_loss_l, face_loss_c = self.loss_fn_1(out, (face_loc, face_conf))
head_loss_l, head_loss_c = self.loss_fn_2(out, (head_loc, head_conf))
loss = face_loss_l + face_loss_c + head_loss_l + head_loss_c
return loss
class EvalLoss(nn.Cell):
"""
Calculate the face-branch loss on the validation set during training.
"""
def __init__(self, net):
super(EvalLoss, self).__init__()
self.net = net
self.loss_fn = MultiBoxLoss()
def construct(self, images, face_loc, face_conf):
out = self.net(images)
face_loss_l, face_loss_c = self.loss_fn(out, (face_loc, face_conf))
loss = face_loss_l + face_loss_c
return loss
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
import argparse
import os
import time
from mindspore import context, nn
from mindspore.common import set_seed
from mindspore import save_checkpoint, load_checkpoint, load_param_into_net
from mindspore.communication import management as D
from mindspore.communication.management import get_group_size, get_rank
from src.pyramidbox import build_net, NetWithLoss, EvalLoss
from src.dataset import create_val_dataset, create_train_dataset
from src.config import cfg
MIN_LOSS = 10000
def parse_args():
parser = argparse.ArgumentParser(description='Pyramidbox face Detector Training With MindSpore')
parser.add_argument('--basenet', default='vgg16.ckpt', help='Pretrained base model')
parser.add_argument('--batch_size', default=4, type=int, help='Batch size for training')
parser.add_argument('--num_workers', default=8, type=int, help='Number of workers used in dataloading')
parser.add_argument('--device_target', dest='device_target', help='device for training',
choices=['GPU', 'Ascend'], default='GPU', type=str)
parser.add_argument('--lr', '--learning-rate', default=0.001, type=float, help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, help='Momentum value for optim')
parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD')
parser.add_argument('--distribute', default=False, type=bool, help='Use multi-GPU training')
parser.add_argument('--save_folder', default='checkpoints/', help='Directory for saving checkpoint models')
parser.add_argument('--epoches', default=100, type=int, help="Number of epochs to train the model")
parser.add_argument('--val_mindrecord', default='data/val.mindrecord', type=str, help="Path of val mindrecord file")
args_ = parser.parse_args()
return args_
def train(args):
print("The argument is: ", args)
context.set_context(device_target=args.device_target, mode=context.GRAPH_MODE)
device_id = 0
device_num = 1
ckpt_folder = os.path.join(args.save_folder, 'distribute_0')
if args.distribute:
D.init()
device_id = get_rank()
device_num = get_group_size()
if device_id == 0 and not os.path.exists(ckpt_folder):
os.mkdir(ckpt_folder)
context.reset_auto_parallel_context()
context.set_auto_parallel_context(parallel_mode=context.ParallelMode.DATA_PARALLEL, gradients_mean=True,
device_num=device_num)
else:
context.set_context(device_id=int(os.getenv('DEVICE_ID', '0')))
# Create train dataset
ds_train = create_train_dataset(cfg, args.batch_size, device_num, device_id, args.num_workers)
# Create val dataset
ds_val = create_val_dataset(args.val_mindrecord, args.batch_size, 1, 0, args.num_workers)
steps_per_epoch = ds_train.get_dataset_size()
net = build_net("train", cfg.NUM_CLASSES)
# load pretrained vgg16
vgg_params = load_checkpoint(args.basenet)
load_param_into_net(net.vgg, vgg_params)
network = NetWithLoss(net)
network.set_train(True)
if args.distribute:
milestone = cfg.DIS_LR_STEPS + [args.epoches * steps_per_epoch]
else:
milestone = cfg.LR_STEPS + [args.epoches * steps_per_epoch]
learning_rates = [args.lr, args.lr * 0.1, args.lr * 0.01, args.lr * 0.001]
lr_scheduler = nn.piecewise_constant_lr(milestone, learning_rates)
optimizer = nn.SGD(params=network.trainable_params(), learning_rate=lr_scheduler, momentum=args.momentum,
weight_decay=args.weight_decay)
# train net
train_net = nn.TrainOneStepCell(network, optimizer)
train_net.set_train(True)
eval_net = EvalLoss(net)
print("Start training net")
whole_step = 0
for epoch in range(1, args.epoches+1):
step = 0
time_list = []
for d in ds_train.create_tuple_iterator():
start_time = time.time()
loss = train_net(*d)
step += 1
whole_step += 1
print(f'epoch: {epoch} total step: {whole_step}, step: {step}, loss is {loss}')
per_time = time.time() - start_time
time_list.append(per_time)
net.set_train(False)
if args.distribute and device_id == 0:
print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)")
val(epoch, eval_net, train_net, ds_val, ckpt_folder)
elif not args.distribute:
print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)")
val(epoch, eval_net, train_net, ds_val, args.save_folder)
net.set_train(True)
def val(epoch, eval_net, model, ds_val, ckpt_dir):
face_loss_list = []
global MIN_LOSS
for (images, face_loc, face_conf, _, _) in ds_val.create_tuple_iterator():
face_loss = eval_net(images, face_loc, face_conf)
face_loss_list.append(face_loss)
a_loss = sum(face_loss_list) / len(face_loss_list)
if a_loss < MIN_LOSS:
MIN_LOSS = a_loss
print("Saving best ckpt, epoch is ", epoch)
save_checkpoint(model, os.path.join(ckpt_dir, f'pyramidbox_best_{epoch}.ckpt'))
if __name__ == '__main__':
train_args = parse_args()
set_seed(66)
if not os.path.exists(train_args.save_folder):
os.mkdir(train_args.save_folder)
train(train_args)
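# Illustrative launch commands (assumptions, not the repository's own run
# scripts); flag names follow parse_args() above:
#   python train.py --basenet vgg16.ckpt --batch_size 4 --device_target GPU
#   mpirun -n 8 python train.py --distribute True    # data-parallel multi-GPU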