diff --git a/research/cv/MTCNN/README_CN.md b/research/cv/MTCNN/README_CN.md new file mode 100644 index 0000000000000000000000000000000000000000..fa8d8df5a5e7ccb366b4efc962c6997c2ac6842d --- /dev/null +++ b/research/cv/MTCNN/README_CN.md @@ -0,0 +1,598 @@ +# 目录 + +<!-- TOC --> + +- [目录](#目录) +- [MTCNN描述](#mtcnn描述) +- [模型架构](#模型架构) +- [数据集](#数据集) + - [WIDER Face](#wider-face) + - [Dataset of Deep Convolutional Network Cascade for Facial Point Detection](#dataset-of-deep-convolutional-network-cascade-for-facial-point-detection) + - [FDDB](#fddb) +- [环境要求](#环境要求) +- [快速入门](#快速入门) +- [脚本说明](#脚本说明) + - [脚本及样例代码](#脚本及样例代码) + - [脚本参数](#脚本参数) + - [wider_face_train_bbx_gt.txt预处理](#widerfacetrainbbxgttxt预处理) + - [训练模型](#训练模型) + - [评估模型](#评估模型) + - [配置参数](#配置参数) + - [训练过程](#训练过程) + - [1.训练PNet](#1-训练pnet) + - [单卡训练PNet](#单卡训练pnet) + - [多卡训练PNet](#多卡训练pnet) + - [2.训练RNet](#2-训练rnet) + - [单卡训练RNet](#单卡训练rnet) + - [多卡训练RNet](#多卡训练rnet) + - [3.训练ONet](#3-训练onet) + - [单卡训练ONet](#单卡训练onet) + - [多卡训练ONet](#多卡训练onet) + - [评估过程](#评估过程) +- [模型描述](#模型描述) + - [性能](#性能) + +<!-- /TOC --> + +# MTCNN描述 + +MTCNN(Multi-task Cascaded Convolutional Networks)是一种多任务级联卷积神经网络,用以同时处理人脸检测和人脸关键点定位问题。作者认为人脸检测和人脸关键点检测两个任务之间往往存在着潜在的联系,然而大多数方法都未将两个任务有效的结合起来,MTCNN充分利用两任务之间潜在的联系,将人脸检测和人脸关键点检测同时进行,可以实现人脸检测和5个特征点的标定。 + +[论文](https://kpzhang93.github.io/MTCNN_face_detection_alignment/): Zhang K , Zhang Z , Li Z , et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks[J]. IEEE Signal Processing Letters, 2016, 23(10):1499-1503. + +# 模型架构 + +MTCNN为了解决人脸识别的两阶段问题,提出三个级联的多任务卷积神经网络(Proposal Network (P-Net)、Refine Network (R-Net)、Output Network (O-Net),每个多任务卷积神经网络均有三个学习任务,分别是人脸分类、边框回归和关键点定位。每一级的输出作为下一级的输入。 + +# 数据集 + +使用的数据集一共有三个: + +1. [WIDER Face](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) +1. [Dataset of Deep Convolutional Network Cascade for Facial Point Detection](http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm) +1. 
[FDDB](http://vis-www.cs.umass.edu/fddb/index.html) + +详细地, + +## WIDER Face + +- WIDER Face数据集用于训练模型,下载训练数据WIDER Face Training Images,解压下载的WIDER_train数据集于项目dataset文件夹下。 +- 下载WIDER Face的[标注文件](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip),解压并将wider_face_train_bbx_gt.txt文件保存在dataset文件夹下。 +- 数据集大小:包含32,203张图片,393,703个标注人脸。 + - WIDER_train: 1.4G +- 检查WIDER_train文件夹在dataset文件夹下,并检查dataset文件下含有wider_face_train_bbx_gt.txt文件,包含人脸标注信息。 + +## Dataset of Deep Convolutional Network Cascade for Facial Point Detection + +- 该数据集用于训练模型,下载数据集Training set并解压,将其中的lfw_5590和net_7876文件夹以及trainImageList.txt文件放置在datatset文件夹下。 +- 数据集大小:包含5,590张LFW图片和7,876张其他图片。 + - lfw_5590:58M + - net_7876:100M +- 检查trainImageList.txt文件、lfw_5590文件夹和net_7876文件夹在dataset文件夹下。 + +## FDDB + +- FDDB数据集用来评估模型,下载[originalPics.tar.gz](http://vis-www.cs.umass.edu/fddb/originalPics.tar.gz)压缩包和[FDDB-folds.tgz](http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz)压缩包,originalPics.tar.gz压缩包包含未标注的图片,FDDB-folds.tgz包含标注信息。 + +- 数据集大小:包含2,845张图片和5,171个人脸标注。 + + - originalPics.tar.gz:553M + - FDDB-folds.tgz:1M + +- 在dataset文件夹下新建文件夹FDDB。 + +- 解压originalPics.tar.gz至FDDB,包含两个文件夹2002和2003: + + ````bash + ├── 2002 + │ ├── 07 + │ ├── 08 + │ ├── 09 + │ ├── 10 + │ ├── 11 + │ └── 12 + ├── 2003 + │ ├── 01 + │ ├── 02 + │ ├── 03 + │ ├── 04 + │ ├── 05 + │ ├── 06 + │ ├── 07 + │ ├── 08 + │ └── 09 + ```` + +- 解压FDDB-folds.tgz至FDDB,包含20个txt文件: + + ```bash + FDDB-folds + │ ├── FDDB-fold-01-ellipseList.txt + │ ├── FDDB-fold-01.txt + │ ├── FDDB-fold-02-ellipseList.txt + │ ├── FDDB-fold-02.txt + │ ├── FDDB-fold-03-ellipseList.txt + │ ├── FDDB-fold-03.txt + │ ├── FDDB-fold-04-ellipseList.txt + │ ├── FDDB-fold-04.txt + │ ├── FDDB-fold-05-ellipseList.txt + │ ├── FDDB-fold-05.txt + │ ├── FDDB-fold-06-ellipseList.txt + │ ├── FDDB-fold-06.txt + │ ├── FDDB-fold-07-ellipseList.txt + │ ├── FDDB-fold-07.txt + │ ├── FDDB-fold-08-ellipseList.txt + │ ├── FDDB-fold-08.txt + │ ├── FDDB-fold-09-ellipseList.txt + │ ├── FDDB-fold-09.txt + │ ├── FDDB-fold-10-ellipseList.txt + │ ├── FDDB-fold-10.txt + ``` + +- 检查2002,2003,FDDB-folds三个文件夹在FDDB文件夹下,且FDDB文件夹在dataset文件夹下。 + +--------- +综上,一共有两个训练集,分别是WIDER Face和Dataset of Deep Convolutional Network Cascade for Facial Point Detection;一个测试集FDDB。 + +训练之前,请修改`config.py`文件中的`DATASET_DIR`字段为dataset文件夹路径。 + +总的数据集目录结构如下: + +```bash +dataset +├── FDDB + ├── 2002 + ├── 2003 + └── FDDB-folds +├── lfw_5590 +├── net_7876 +├── trainImageList.txt +├── wider_face_train_bbx_gt.txt +└── WIDER_train + └── images +``` + +# 环境要求 + +- 硬件(Ascend/GPU/CPU) + - 使用GPU搭建硬件环境 +- 框架 + [MindSpore](https://www.mindspore.cn/install/en) +- 如需查看详情,请参见如下资源: + - [MindSpore教程](https://www.mindspore.cn/tutorials/zh-CN/master/index.html) + - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html) + +# 快速入门 + +通过官方网站安装MindSpore后,您可以按照如下步骤进行训练和评估: + +数据集准备完成后,请修改`config.py`文件中的`DATASET_DIR`字段为dataset文件夹路径。 + +在开始训练前,需要对wider_face_train_bbx_gt.txt文件进行预处理以生成wider_face_train.txt文件。 + +````bash +# 预处理wider_face_train_bbx_gt.txt文件 + +# 切换到项目主目录 +python preprocess.py +# 成功执行后dataset文件夹下出现wider_face_train.txt +```` + +因为MTCNN由PNet, RNet, ONet三个子模型组成,因此训练过程总体分为三大步骤: + +1. 训练PNet + +```bash +# 1-1. 生成用于训练PNet模型的mindrecord文件,默认保存在mindrecords文件夹中 +bash scripts/generate_train_mindrecord_pnet.sh + +``` + +``` bash +# 1-2. 
待mindrecord文件生成完毕,开始训练PNet模型
+
+# 单卡训练
+bash scripts/run_standalone_train_gpu.sh pnet DEVICE_ID MINDRECORD_FILE
+# example: bash scripts/run_standalone_train_gpu.sh pnet 0 mindrecords/PNet_train.mindrecord
+
+# 多卡训练
+bash scripts/run_distribute_train_gpu.sh pnet DEVICE_NUM MINDRECORD_FILE
+# example: bash scripts/run_distribute_train_gpu.sh pnet 8 mindrecords/PNet_train.mindrecord
+```
+
+2. 训练RNet
+
+```bash
+# 2-1. 生成用于训练RNet模型的mindrecord文件,默认保存在mindrecords文件夹中
+bash scripts/generate_train_mindrecord_rnet.sh PNET_CKPT
+# example: bash scripts/generate_train_mindrecord_rnet.sh checkpoints/pnet.ckpt
+```
+
+```bash
+# 2-2. 待mindrecord文件生成完毕,开始训练RNet模型
+
+# 单卡训练
+bash scripts/run_standalone_train_gpu.sh rnet DEVICE_ID MINDRECORD_FILE
+# example: bash scripts/run_standalone_train_gpu.sh rnet 0 mindrecords/RNet_train.mindrecord
+
+# 多卡训练
+bash scripts/run_distribute_train_gpu.sh rnet DEVICE_NUM MINDRECORD_FILE
+# example: bash scripts/run_distribute_train_gpu.sh rnet 8 mindrecords/RNet_train.mindrecord
+```
+
+3. 训练ONet
+
+```bash
+# 3-1. 生成用于训练ONet模型的mindrecord文件,默认保存在mindrecords文件夹中
+bash scripts/generate_train_mindrecord_onet.sh PNET_CKPT RNET_CKPT
+# example: bash scripts/generate_train_mindrecord_onet.sh checkpoints/pnet.ckpt checkpoints/rnet.ckpt
+```
+
+```bash
+# 3-2. 待mindrecord文件生成完毕,开始训练ONet模型
+
+# 单卡训练
+bash scripts/run_standalone_train_gpu.sh onet DEVICE_ID MINDRECORD_FILE
+# example: bash scripts/run_standalone_train_gpu.sh onet 0 mindrecords/ONet_train.mindrecord
+
+# 多卡训练
+bash scripts/run_distribute_train_gpu.sh onet DEVICE_NUM MINDRECORD_FILE
+# example: bash scripts/run_distribute_train_gpu.sh onet 8 mindrecords/ONet_train.mindrecord
+```
+
+训练完毕后,开始评估MTCNN模型。
+
+``` bash
+# 评估模型
+bash scripts/run_eval_gpu.sh PNET_CKPT RNET_CKPT ONET_CKPT
+# example: bash scripts/run_eval_gpu.sh checkpoints/pnet.ckpt checkpoints/rnet.ckpt checkpoints/onet.ckpt
+```
+
+# 脚本说明
+
+## 脚本及样例代码
+
+```bash
+MTCNN
+├── dataset                 // 保存原始数据集和标注文件(需要自行创建该文件夹)
+├── eval.py                 // 评估脚本
+├── preprocess.py           // wider_face_train_bbx_gt.txt文件预处理脚本
+├── README_CN.md            // MTCNN中文描述文档
+├── config.py               // 配置文件
+├── scripts
+│   ├── generate_train_mindrecord_pnet.sh    // 生成用于训练PNet的mindrecord文件shell脚本
+│   ├── generate_train_mindrecord_rnet.sh    // 生成用于训练RNet的mindrecord文件shell脚本
+│   ├── generate_train_mindrecord_onet.sh    // 生成用于训练ONet的mindrecord文件shell脚本
+│   ├── run_distribute_train_gpu.sh          // GPU多卡训练shell脚本
+│   ├── run_eval_gpu.sh                      // GPU模型评估shell脚本
+│   └── run_standalone_train_gpu.sh          // GPU单卡训练shell脚本
+├── src
+│   ├── acc_callback.py     // 自定义训练回调函数脚本
+│   ├── dataset.py          // 创建MindSpore数据集脚本
+│   ├── evaluate.py         // 模型评估脚本
+│   ├── loss.py             // 损失函数
+│   ├── utils.py            // 工具函数
+│   ├── models
+│   │   ├── mtcnn.py            // MTCNN模型
+│   │   ├── mtcnn_detector.py   // MTCNN检测器
+│   │   └── predict_nets.py     // 模型推理函数
+│   ├── prepare_data
+│   │   ├── generate_PNet_data.py   // 生成PNet的mindrecord文件
+│   │   ├── generate_RNet_data.py   // 生成RNet的mindrecord文件
+│   │   └── generate_ONet_data.py   // 生成ONet的mindrecord文件
+│   └── train_models
+│       ├── train_p_net.py      // 训练PNet脚本
+│       ├── train_r_net.py      // 训练RNet脚本
+│       └── train_o_net.py      // 训练ONet脚本
+└── train.py                // 训练模型脚本
+
+```
+
+## 脚本参数
+
+### wider_face_train_bbx_gt.txt预处理
+
+```bash
+usage: preprocess.py [-f F]
+
+Preprocess WIDER Face Annotation file
+
+optional arguments:
+  -f F        Original wider face train annotation file
+```
+
+### 训练模型
+
+```bash
+usage: train.py --model {pnet,rnet,onet} --mindrecord_file
+                MINDRECORD_FILE [--ckpt_path CKPT_PATH]
+                [--save_ckpt_steps SAVE_CKPT_STEPS] [--max_ckpt MAX_CKPT]
+                [--end_epoch END_EPOCH] [--lr LR] [--batch_size BATCH_SIZE]
+                [--device_target {GPU,Ascend}] [--distribute]
+                [--num_workers NUM_WORKERS]
+
+Train PNet/RNet/ONet
+
+optional arguments:
+  --model {pnet,rnet,onet}
+                        Choose model to train
+  --mindrecord_file MINDRECORD_FILE
+                        mindrecord file for training
+  --ckpt_path CKPT_PATH
+                        save checkpoint directory
+  --save_ckpt_steps SAVE_CKPT_STEPS
+                        steps to save checkpoint
+  --max_ckpt MAX_CKPT   maximum number of ckpt
+  --end_epoch END_EPOCH
+                        end epoch of training
+  --lr LR               learning rate
+  --batch_size BATCH_SIZE
+                        train batch size
+  --device_target {GPU,Ascend}
+                        device for training
+  --distribute
+  --num_workers NUM_WORKERS
+```
+
+### 评估模型
+
+```bash
+usage: eval.py --pnet_ckpt PNET_CKPT --rnet_ckpt RNET_CKPT --onet_ckpt ONET_CKPT
+
+Evaluate MTCNN on FDDB dataset
+
+optional arguments:
+  --pnet_ckpt PNET_CKPT, -p PNET_CKPT   checkpoint of PNet
+  --rnet_ckpt RNET_CKPT, -r RNET_CKPT   checkpoint of RNet
+  --onet_ckpt ONET_CKPT, -o ONET_CKPT   checkpoint of ONet
+```
+
+### 配置参数
+
+```bash
+config.py:
+    DATASET_DIR: 原始数据集文件夹
+    FDDB_DIR: 验证数据集FDDB文件夹
+    TRAIN_DATA_DIR: 训练数据集文件夹,保存用于生成mindrecord的临时数据文件
+    MINDRECORD_DIR: mindrecord文件夹
+    CKPT_DIR: checkpoint文件夹
+    LOG_DIR: logs文件夹
+
+    RADIO_CLS_LOSS: classification loss比例
+    RADIO_BOX_LOSS: box loss比例
+    RADIO_LANDMARK_LOSS: landmark loss比例
+
+    TRAIN_BATCH_SIZE: 训练batch size大小
+    TRAIN_LR: 默认学习率
+    END_EPOCH: 训练轮数
+    MIN_FACE_SIZE: 脸最小尺寸
+    SCALE_FACTOR: 缩放比例
+    P_THRESH: PNet阈值
+    R_THRESH: RNet阈值
+    O_THRESH: ONet阈值
+```
+
+## 训练过程
+
+在开始训练之前,需要先在主目录下创建dataset文件夹,按照数据集部分的步骤下载并保存原始数据集文件在dataset文件夹下。
+
+dataset文件夹准备完毕后,即可开始数据预处理、训练集生成以及模型训练。
+
+因为MTCNN由PNet, RNet和ONet三个子模型串联而成,因此整个训练过程分为三大步骤:
+
+### 1. 训练PNet
+
+```bash
+# 预处理wider_face_train_bbx_gt.txt文件
+python preprocess.py
+
+# 生成用于训练PNet的mindrecord文件
+bash scripts/generate_train_mindrecord_pnet.sh
+```
+
+运行后,将产生`generate_pnet_mindrecord.log`日志文件,保存于`logs`文件夹下。
+
+运行完成后,生成`PNet_train.mindrecord`文件,默认保存在`mindrecords`文件夹下。
+
+#### 单卡训练PNet
+
+```bash
+bash scripts/run_standalone_train_gpu.sh pnet [DEVICE_ID] [MINDRECORD_FILE]
+# example: bash scripts/run_standalone_train_gpu.sh pnet 0 mindrecords/PNet_train.mindrecord
+```
+
+训练过程会在后台运行,训练模型将保存在`checkpoints`文件夹中,可以通过`logs/training_gpu_pnet.log`文件查看训练输出,输出结果如下所示:
+
+```bash
+epoch: 2 step: 456, loss is 0.3661264
+epoch: 2 step: 457, loss is 0.32284224
+epoch: 2 step: 458, loss is 0.29254544
+epoch: 2 step: 459, loss is 0.32631972
+epoch: 2 step: 460, loss is 0.3065704
+epoch: 2 step: 461, loss is 0.3995605
+epoch: 2 step: 462, loss is 0.2614449
+epoch: 2 step: 463, loss is 0.50305885
+epoch: 2 step: 464, loss is 0.30908597
+···
+```
+
+#### 多卡训练PNet
+
+```bash
+bash scripts/run_distribute_train_gpu.sh pnet [DEVICE_NUM] [MINDRECORD_FILE]
+# example: bash scripts/run_distribute_train_gpu.sh pnet 8 mindrecords/PNet_train.mindrecord
+```
+
+训练过程会在后台运行,只保存第一张卡的训练模型,训练模型将保存在`checkpoints`文件夹中,可以通过`logs/distribute_training_gpu_pnet.log`文件查看训练输出,输出结果如下所示:
+
+```bash
+epoch: 2 step: 456, loss is 0.3661264
+epoch: 2 step: 457, loss is 0.32284224
+epoch: 2 step: 458, loss is 0.29254544
+epoch: 2 step: 459, loss is 0.32631972
+epoch: 2 step: 460, loss is 0.3065704
+epoch: 2 step: 461, loss is 0.3995605
+epoch: 2 step: 462, loss is 0.2614449
+epoch: 2 step: 463, loss is 0.50305885
+epoch: 2 step: 464, loss is 0.30908597
+...
+```
+
+### 2. 
训练RNet
+
+``` bash
+# 生成用于训练RNet的mindrecord文件
+bash scripts/generate_train_mindrecord_rnet.sh [PNET_CKPT]
+# example: bash scripts/generate_train_mindrecord_rnet.sh checkpoints/pnet.ckpt
+```
+
+将产生`generate_rnet_mindrecord.log`日志文件,保存于`logs`文件夹下。
+
+运行完成后,生成`RNet_train.mindrecord`文件,默认保存在`mindrecords`文件夹下。
+
+#### 单卡训练RNet
+
+```bash
+bash scripts/run_standalone_train_gpu.sh rnet [DEVICE_ID] [MINDRECORD_FILE]
+# example: bash scripts/run_standalone_train_gpu.sh rnet 0 mindrecords/RNet_train.mindrecord
+```
+
+训练过程会在后台运行,训练模型将保存在`checkpoints`文件夹中,可以通过`logs/training_gpu_rnet.log`文件查看训练输出,输出结果如下所示:
+
+```bash
+epoch: 1 step: 1189, loss is 0.4912308
+epoch: 1 step: 1190, loss is 0.52638006
+epoch: 1 step: 1191, loss is 0.44296187
+epoch: 1 step: 1192, loss is 0.522378
+epoch: 1 step: 1193, loss is 0.5238542
+epoch: 1 step: 1194, loss is 0.49850246
+epoch: 1 step: 1195, loss is 0.47963354
+epoch: 1 step: 1196, loss is 0.49311465
+epoch: 1 step: 1197, loss is 0.45008135
+···
+```
+
+#### 多卡训练RNet
+
+```bash
+bash scripts/run_distribute_train_gpu.sh rnet [DEVICE_NUM] [MINDRECORD_FILE]
+# example: bash scripts/run_distribute_train_gpu.sh rnet 8 mindrecords/RNet_train.mindrecord
+```
+
+训练过程会在后台运行,只保存第一张卡的训练模型,训练模型将保存在`checkpoints`文件夹中,可以通过`logs/distribute_training_gpu_rnet.log`文件查看训练输出,输出结果如下所示:
+
+```bash
+epoch: 1 step: 1189, loss is 0.4912308
+epoch: 1 step: 1190, loss is 0.52638006
+epoch: 1 step: 1191, loss is 0.44296187
+epoch: 1 step: 1192, loss is 0.522378
+epoch: 1 step: 1193, loss is 0.5238542
+epoch: 1 step: 1194, loss is 0.49850246
+epoch: 1 step: 1195, loss is 0.47963354
+epoch: 1 step: 1196, loss is 0.49311465
+epoch: 1 step: 1197, loss is 0.45008135
+...
+```
+
+### 3. 训练ONet
+
+``` bash
+# 生成用于训练ONet的mindrecord文件
+bash scripts/generate_train_mindrecord_onet.sh [PNET_CKPT] [RNET_CKPT]
+# example: bash scripts/generate_train_mindrecord_onet.sh checkpoints/pnet.ckpt checkpoints/rnet.ckpt
+```
+
+将产生`generate_onet_mindrecord.log`日志文件,保存于`logs`文件夹下。
+
+运行完成后,生成`ONet_train.mindrecord`文件,默认保存在`mindrecords`文件夹下。
+
+#### 单卡训练ONet
+
+```bash
+bash scripts/run_standalone_train_gpu.sh onet [DEVICE_ID] [MINDRECORD_FILE]
+# example: bash scripts/run_standalone_train_gpu.sh onet 0 mindrecords/ONet_train.mindrecord
+```
+
+训练过程会在后台运行,训练模型将保存在`checkpoints`文件夹中,可以通过`logs/training_gpu_onet.log`文件查看训练输出,输出结果如下所示:
+
+```bash
+epoch: 1 step: 561, loss is 0.1627587
+epoch: 1 step: 562, loss is 0.20395292
+epoch: 1 step: 563, loss is 0.24887425
+epoch: 1 step: 564, loss is 0.31067476
+epoch: 1 step: 565, loss is 0.20113933
+epoch: 1 step: 566, loss is 0.2834522
+epoch: 1 step: 567, loss is 0.18775874
+epoch: 1 step: 568, loss is 0.2714229
+epoch: 1 step: 569, loss is 0.22088407
+epoch: 1 step: 570, loss is 0.22690454
+···
+```
+
+#### 多卡训练ONet
+
+```bash
+bash scripts/run_distribute_train_gpu.sh onet [DEVICE_NUM] [MINDRECORD_FILE]
+# example: bash scripts/run_distribute_train_gpu.sh onet 8 mindrecords/ONet_train.mindrecord
+```
+
+训练过程会在后台运行,只保存第一张卡的训练模型,训练模型将保存在`checkpoints`文件夹中,可以通过`logs/distribute_training_gpu_onet.log`文件查看训练输出,输出结果如下所示:
+
+```bash
+epoch: 1 step: 561, loss is 0.1627587
+epoch: 1 step: 562, loss is 0.20395292
+epoch: 1 step: 563, loss is 0.24887425
+epoch: 1 step: 564, loss is 0.31067476
+epoch: 1 step: 565, loss is 0.20113933
+epoch: 1 step: 566, loss is 0.2834522
+epoch: 1 step: 567, loss is 0.18775874
+epoch: 1 step: 568, loss is 0.2714229
+epoch: 1 step: 569, loss is 0.22088407
+epoch: 1 step: 570, loss is 0.22690454
+...
+``` + +## 评估过程 + +```bash +bash scripts/run_eval_gpu.sh [PNET_CKPT] [RNET_CKPT] [ONET_CKPT] +# example: bash scripts/run_eval_gpu.sh checkpoints/pnet.ckpt checkpoints/rnet.ckpt checkpoints/onet.ckpt +``` + +评估过程会在后台进行,评估结果可以通过`logs/eval_gpu.log`文件查看,输出结果如下所示: + +```bash +==================== Results ==================== +FDDB-fold-1 Val AP: 0.846041313059397 +FDDB-fold-2 Val AP: 0.8452332863014286 +FDDB-fold-3 Val AP: 0.854312327697665 +FDDB-fold-4 Val AP: 0.8449615417375469 +FDDB-fold-5 Val AP: 0.868903617729559 +FDDB-fold-6 Val AP: 0.8857753502792894 +FDDB-fold-7 Val AP: 0.8200462708769559 +FDDB-fold-8 Val AP: 0.8390865359172448 +FDDB-fold-9 Val AP: 0.8584513847530266 +FDDB-fold-10 Val AP: 0.8363366158400566 +FDDB Dataset Average AP: 0.8499148244192171 +================================================= +``` + +# 模型描述 + +## 性能 + +| 参数 | MTCNN | +| -------------------- | ------------------------------------------------------- | +| 资源 | GPU(Tesla V100 SXM2),CPU 2.1GHz 24cores,Memory 128G| +| 上传日期 | 2022-08-05 | +| MindSpore版本 | 1.8.0 | +| 数据集 | WIDER Face, Dataset of Deep Convolutional Network Cascade for Facial Point Detection, FDDB | +| 训练参数 | PNet: epoch=30,batch_size=384, lr=0.001; RNet: epoch=22, batch_size=384, lr=0.001; ONet: epoch=22, batch_size=384, lr=0.001 | +| 优化器 | Adam | +| 损失函数 | SoftmaxCrossEntropyWithLogits, MSELoss | +| 输出 | 类别,坐标 | +| 损失 | PNet: 0.20 RNet: 0.15 ONet: 0.04 | +| 速度 | PNet: 6毫秒/步 RNet: 8毫秒/步 ONet: 18毫秒/步 | +| 总时长 | 8时40分(单卡);1时22分(八卡) | +| 微调检查点 | PNet: 1M (.ckpt文件) RNet: 2M (.ckpt文件) ONet: 6M (.ckpt文件) | + diff --git a/research/cv/MTCNN/config.py b/research/cv/MTCNN/config.py new file mode 100644 index 0000000000000000000000000000000000000000..47e666a9538ba486bab8d2e2301ecc19722b2b1c --- /dev/null +++ b/research/cv/MTCNN/config.py @@ -0,0 +1,52 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import os + +# Path of original dataset +DATASET_DIR = "XXXXXXXXXXXXXX/dataset" + +# Path of FDDB dataset +FDDB_DIR = os.path.join(DATASET_DIR, 'FDDB') + +# Store train data +TRAIN_DATA_DIR = os.path.join(DATASET_DIR, "train_data") + +# Path of mindrecords +MINDRECORD_DIR = os.path.dirname(os.path.realpath(__file__)) + "/mindrecords" + +# Path of checkpoints +CKPT_DIR = os.path.dirname(os.path.realpath(__file__)) + "/checkpoints" + +# Configure the ratio of each loss +RADIO_CLS_LOSS = 1.0 +RADIO_BOX_LOSS = 0.5 +RADIO_LANDMARK_LOSS = 0.5 + +# Path to store logs +LOG_DIR = os.path.dirname(os.path.realpath(__file__)) + "/log" + +TRAIN_BATCH_SIZE = 384 + +TRAIN_LR = 0.001 + +END_EPOCH = 30 + +MIN_FACE_SIZE = 20 +SCALE_FACTOR = 0.79 + +P_THRESH = 0.6 +R_THRESH = 0.7 +O_THRESH = 0.7 diff --git a/research/cv/MTCNN/eval.py b/research/cv/MTCNN/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..b33ac209acce07949508a09ac5fdb4ffe2517544 --- /dev/null +++ b/research/cv/MTCNN/eval.py @@ -0,0 +1,86 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import os +import argparse + +from src.evaluate import evaluation +from src.models.mtcnn_detector import MtcnnDetector +from src.models.mtcnn import PNet, RNet, ONet + +from mindspore import load_checkpoint, load_param_into_net, context +import config as cfg + + +def parse_args(): + parser = argparse.ArgumentParser(description="Evaluate MTCNN on FDDB dataset") + parser.add_argument('--pnet_ckpt', '-p', required=True, help="checkpoint of PNet") + parser.add_argument('--rnet_ckpt', '-r', required=True, help="checkpoint of RNet") + parser.add_argument('--onet_ckpt', '-o', required=True, help="checkpoint of ONet") + + args = parser.parse_args() + return args + +def main(args): + context.set_context(device_target='GPU') + + pnet = PNet() + pnet_params = load_checkpoint(args.pnet_ckpt) + load_param_into_net(pnet, pnet_params) + pnet.set_train(False) + + rnet = RNet() + rnet_params = load_checkpoint(args.rnet_ckpt) + load_param_into_net(rnet, rnet_params) + rnet.set_train(False) + + onet = ONet() + onet_params = load_checkpoint(args.onet_ckpt) + load_param_into_net(onet, onet_params) + onet.set_train(False) + + mtcnn_detector = MtcnnDetector(pnet, rnet, onet) + + FDDB_out_dir = os.path.join(cfg.DATASET_DIR, 'FDDB_out') + if not os.path.exists(FDDB_out_dir): + os.mkdir(FDDB_out_dir) + + print("Start detecting FDDB images") + for i in range(1, 11): + if not os.path.exists(os.path.join(FDDB_out_dir, str(i))): + os.mkdir(os.path.join(FDDB_out_dir, str(i))) + file_path = os.path.join(cfg.FDDB_DIR, 'FDDB-folds', 'FDDB-fold-%02d.txt' % i) + with open(file_path, 'r') as f: + lines = f.readlines() + for line in lines: + line = line.strip('\n') + image_path = os.path.join(cfg.FDDB_DIR, line) + '.jpg' + line = line.replace('/', '_') + with 
open(os.path.join(FDDB_out_dir, str(i), line + '.txt'), 'w') as w: + w.write(line) + w.write('\n') + boxes_c, _ = mtcnn_detector.detect_face(image_path) + if boxes_c is not None: + w.write(str(boxes_c.shape[0])) + w.write('\n') + for box in boxes_c: + w.write(f'{int(box[0])} {int(box[1])} {int(box[2]-box[0])} {int(box[3]-box[1])} {box[4]}\n') + print("Detection Done!") + print("Start evluation!") + evaluation(FDDB_out_dir, os.path.join(cfg.FDDB_DIR, 'FDDB-folds')) + + +if __name__ == '__main__': + main(parse_args()) diff --git a/research/cv/MTCNN/preprocess.py b/research/cv/MTCNN/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..07e3a079c1b788239e26425f8d3a31a32989f858 --- /dev/null +++ b/research/cv/MTCNN/preprocess.py @@ -0,0 +1,56 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import argparse +import numpy as np + +parser = argparse.ArgumentParser(description="Preprocess WIDER Face Annotation file") +parser.add_argument('-f', type=str, default='dataset/wider_face_train_bbx_gt.txt', + help="Original wider face train annotation file") +args = parser.parse_args() + +wider_face_train = open('dataset/wider_face_train.txt', 'w') + +with open(args.f, 'r') as f: + lines = f.readlines() + total_num = len(lines) + + i = 0 + while i < total_num: + image_name = lines[i].strip().rstrip('.jpg') + wider_face_train.write(image_name + ' ') + face_num = int(lines[i+1].strip()) + if face_num == 0: + for _ in range(4): + wider_face_train.write(str(0.0) + ' ') + wider_face_train.write('\n') + i = i + 3 + continue + box_list = [] + for j in range(face_num): + box = lines[i+2+j].split(' ') + x = float(box[0]) + y = float(box[1]) + w = float(box[2]) + h = float(box[3]) + box_list.append([x, y, x + w, y + h]) + box_list = np.array(box_list).flatten() + for num in box_list: + wider_face_train.write(str(num) + ' ') + wider_face_train.write('\n') + i = i + face_num + 2 + +wider_face_train.close() +print("wider_face_train.txt has been successfully created in dataset dir!!") diff --git a/research/cv/MTCNN/requirement.txt b/research/cv/MTCNN/requirement.txt new file mode 100644 index 0000000000000000000000000000000000000000..e17a72be41d982c8c26e825cf55a9f5d846d8f22 --- /dev/null +++ b/research/cv/MTCNN/requirement.txt @@ -0,0 +1,3 @@ +opencv-python==4.5.5.64 +tqdm==4.64.0 +mindspore-gpu==1.8.0 \ No newline at end of file diff --git a/research/cv/MTCNN/scripts/generate_train_mindrecord_onet.sh b/research/cv/MTCNN/scripts/generate_train_mindrecord_onet.sh new file mode 100644 index 0000000000000000000000000000000000000000..b4c3c4b8744b2d274b895481ddda520ea815074a --- /dev/null +++ b/research/cv/MTCNN/scripts/generate_train_mindrecord_onet.sh @@ -0,0 +1,41 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash generate_train_mindrecord_onet.sh PNET_CKPT RNET_CKPT" +echo "for example: bash generate_train_mindrecord_onet.sh pnet.ckpt rnet.ckpt" +echo "==============================================================================================================" + +if [ $# -lt 2 ]; +then + echo "---------------------ERROR----------------------" + echo "You must specify the checkpoint of PNet and RNet" + exit +fi + +PNET_CKPT=$1 +RNET_CKPT=$2 + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs + +cd $PROJECT_DIR/../ || exit + +python -m src.prepare_data.generate_ONet_data \ + --pnet_ckpt $PNET_CKPT \ + --rnet_ckpt $RNET_CKPT > $LOG_DIR/generate_onet_mindrecord.log 2>&1 & +echo "The data log is at /logs/generate_onet_mindrecord.log" diff --git a/research/cv/MTCNN/scripts/generate_train_mindrecord_pnet.sh b/research/cv/MTCNN/scripts/generate_train_mindrecord_pnet.sh new file mode 100644 index 0000000000000000000000000000000000000000..a0ce559b1eea2a46839adb44d0755e8d61af7118 --- /dev/null +++ b/research/cv/MTCNN/scripts/generate_train_mindrecord_pnet.sh @@ -0,0 +1,33 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash generate_train_mindrecord_pnet.sh" +echo "==============================================================================================================" + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs + +if [ ! 
-d $LOG_DIR ]; +then + mkdir $LOG_DIR +fi + +cd $PROJECT_DIR/../ || exit + +python -m src.prepare_data.generate_PNet_data > $LOG_DIR/generate_pnet_mindrecord.log 2>&1 & +echo "The data log is at /logs/generate_pnet_mindrecord.log" diff --git a/research/cv/MTCNN/scripts/generate_train_mindrecord_rnet.sh b/research/cv/MTCNN/scripts/generate_train_mindrecord_rnet.sh new file mode 100644 index 0000000000000000000000000000000000000000..426b5b61c9a821959e4412b96ec80af5acfe7c5e --- /dev/null +++ b/research/cv/MTCNN/scripts/generate_train_mindrecord_rnet.sh @@ -0,0 +1,40 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash generate_train_mindrecord_rnet.sh PNET_CKPT" +echo "for example: bash generate_train_mindrecord_rnet.sh pnet.ckpt" +echo "==============================================================================================================" + +if [ $# -lt 1 ]; +then + echo "---------------------ERROR----------------------" + echo "You must specify the checkpoint of PNet" + exit +fi + +PNET_CKPT=$1 + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs + +cd $PROJECT_DIR/../ || exit + +python -m src.prepare_data.generate_RNet_data \ + --pnet_ckpt $PNET_CKPT > $LOG_DIR/generate_rnet_mindrecord.log 2>&1 & + +echo "The data log is at /logs/generate_rnet_mindrecord.log" diff --git a/research/cv/MTCNN/scripts/run_distribute_train_gpu.sh b/research/cv/MTCNN/scripts/run_distribute_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..98edf35b38789ef66b9b4b66f5cd79c4a78c69c8 --- /dev/null +++ b/research/cv/MTCNN/scripts/run_distribute_train_gpu.sh @@ -0,0 +1,62 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash run_distributed_train_gpu.sh MODEL_NAME(pnet|rnet|onet) DEVICE_NUM MINDRECORD_FILE" +echo "for example train PNet: bash run_distributed_train_gpu.sh pnet 8 pnet_train.mindrecord" +echo "==============================================================================================================" + +MODEL=$1 +DEVICE_NUM=$2 +MINDRECORD_FILE=$3 + +if [ $# -lt 3 ]; +then + echo "---------------------ERROR----------------------" + echo "You must specify model name and number of gpu devices and mindrecord file for training" + exit +fi + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs +if [ ! -d $LOG_DIR ] +then + mkdir $LOG_DIR +fi + +if [ $MODEL == "pnet" ]; +then + END_EPOCH=32 +elif [ $MODEL == 'rnet' ]; +then + END_EPOCH=24 +else + END_EPOCH=24 +fi + +export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +mpirun -n $DEVICE_NUM --allow-run-as-root python $PROJECT_DIR/../train.py \ + --distribute \ + --device_target GPU \ + --end_epoch $END_EPOCH \ + --model $MODEL \ + --mindrecord_file $MINDRECORD_FILE \ + --num_workers 8 \ + --save_ckpt_steps 100 > $LOG_DIR/distribute_training_gpu_$MODEL.log 2>&1 & + +echo "The distributed train log is at /logs/distribute_training_gpu_$MODEL.log" diff --git a/research/cv/MTCNN/scripts/run_eval_gpu.sh b/research/cv/MTCNN/scripts/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..34da5cb96d218952c925fd1b64243f99fc48ea45 --- /dev/null +++ b/research/cv/MTCNN/scripts/run_eval_gpu.sh @@ -0,0 +1,43 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash run_eval_gpu.sh PNET_CKPT RNET_CKPT ONET_CKPT" +echo "for example: bash run_eval_gpu.sh pnet.ckpt rnet.ckpt onet.ckpt" +echo "==============================================================================================================" + +if [ $# -lt 3 ]; +then + echo "---------------------ERROR----------------------" + echo "You must specify PNet checkpoint, RNet checkpoint and ONet checkpoint" + exit +fi + +PNET=$1 +RNET=$2 +ONET=$3 + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs +if [ ! 
-d $LOG_DIR ] +then + mkdir $LOG_DIR +fi + +export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +python $PROJECT_DIR/../eval.py -p=$PNET -r=$RNET -o=$ONET > $LOG_DIR/eval_gpu.log 2>&1 & diff --git a/research/cv/MTCNN/scripts/run_standalone_train_gpu.sh b/research/cv/MTCNN/scripts/run_standalone_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..a4009ca21c961c9aea296d9ee39e60e26370792f --- /dev/null +++ b/research/cv/MTCNN/scripts/run_standalone_train_gpu.sh @@ -0,0 +1,61 @@ +#!/bin/bash +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "bash run_standalone_train_gpu.sh MODEL_NAME(pnet|rnet|onet) DEVICE_ID MINDRECORD_FILE" +echo "for example train PNet: bash run_standalone_train_gpu.sh pnet 0 pnet_train.mindrecord" +echo "==============================================================================================================" + +MODEL=$1 +DEVICE_ID=$2 +MINDRECORD_FILE=$3 + +if [ $# -lt 3 ]; +then + echo "---------------------ERROR----------------------" + echo "You must specify model name and gpu device and mindrecord file for training" + exit +fi + +PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) +LOG_DIR=$PROJECT_DIR/../logs +if [ ! -d $LOG_DIR ] +then + mkdir $LOG_DIR +fi + +export DEVICE_ID=$DEVICE_ID +export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python + +if [ $MODEL == "pnet" ]; +then + END_EPOCH=30 +elif [ $MODEL == 'rnet' ]; +then + END_EPOCH=22 +else + END_EPOCH=22 +fi + +python $PROJECT_DIR/../train.py \ + --device_target GPU \ + --end_epoch $END_EPOCH \ + --model $MODEL \ + --mindrecord_file $MINDRECORD_FILE \ + --num_workers 8 > $LOG_DIR/training_gpu_$MODEL.log 2>&1 & + +echo "The standalone train log is at /logs/training_gpu_$MODEL.log" diff --git a/research/cv/MTCNN/src/dataset.py b/research/cv/MTCNN/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d3f82f0366af6b65c1883a016806da54e12c6837 --- /dev/null +++ b/research/cv/MTCNN/src/dataset.py @@ -0,0 +1,39 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +from mindspore.dataset import MindDataset +try: + from mindspore.dataset.vision import Decode, Normalize, HWC2CHW +except ImportError as error: + from mindspore.dataset.vision.c_transforms import Decode, Normalize, HWC2CHW + + +def create_train_dataset(mindrecord_file, batch_size=128, device_num=1, rank_id=0, num_workers=8, do_shuffle=True): + """Create MTCNN dataset with mindrecord for training""" + ds = MindDataset( + mindrecord_file, + num_shards=device_num, + columns_list=['image', 'label', 'box_target', 'landmark_target'], + shard_id=rank_id, + num_parallel_workers=num_workers, + shuffle=do_shuffle + ) + + op_list = [Decode(), lambda x: x[:, :, (2, 1, 0)], + Normalize(mean=[127.5, 127.5, 127.5], std=[128.0, 128.0, 128.0]), HWC2CHW()] + ds = ds.map(operations=op_list, input_columns=['image']) + ds = ds.batch(batch_size, drop_remainder=True) + + return ds diff --git a/research/cv/MTCNN/src/evaluate.py b/research/cv/MTCNN/src/evaluate.py new file mode 100644 index 0000000000000000000000000000000000000000..60d0b9b77ce451022a245f14a2ecfbf112724ef3 --- /dev/null +++ b/research/cv/MTCNN/src/evaluate.py @@ -0,0 +1,284 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +import os +import argparse +import tqdm +import numpy as np +import cv2 + + +def bbox_overlaps(boxes, query_boxes): + """ + Parameters + ---------- + boxes: (N, 4) ndarray of float + query_boxes: (K, 4) ndarray of float + Returns + ------- + overlaps: (N, K) ndarray of overlap between boxes and query_boxes + """ + N = boxes.shape[0] + K = query_boxes.shape[0] + overlaps = np.zeros((N, K), dtype=np.float32) + for k in range(K): + box_area = ( + (query_boxes[k, 2] - query_boxes[k, 0] + 1) * + (query_boxes[k, 3] - query_boxes[k, 1] + 1) + ) + for n in range(N): + iw = ( + min(boxes[n, 2], query_boxes[k, 2]) - + max(boxes[n, 0], query_boxes[k, 0]) + 1 + ) + if iw > 0: + ih = ( + min(boxes[n, 3], query_boxes[k, 3]) - + max(boxes[n, 1], query_boxes[k, 1]) + 1 + ) + if ih > 0: + ua = float( + (boxes[n, 2] - boxes[n, 0] + 1) * + (boxes[n, 3] - boxes[n, 1] + 1) + + box_area - iw * ih + ) + overlaps[n, k] = iw * ih / ua + return overlaps + + +def get_gt_boxes(gt_dir): + gt_dict = {} + for i in range(1, 11): + filename = os.path.join(gt_dir, 'FDDB-fold-{}-ellipseList.txt'.format('%02d' % i)) + assert os.path.exists(filename) + gt_sub_dict = {} + annotationfile = open(filename) + while True: + filename = annotationfile.readline()[:-1].replace('/', '_') + if not filename: + break + line = annotationfile.readline() + if not line: + break + facenum = int(line) + face_loc = [] + for _ in range(facenum): + line = annotationfile.readline().strip().split() + major_axis_radius = float(line[0]) + minor_axis_radius = float(line[1]) + angle = float(line[2]) + center_x = float(line[3]) + center_y = float(line[4]) + _ = float(line[5]) + angle = angle / 3.1415926 * 180 + mask = np.zeros((1000, 1000), dtype=np.uint8) + cv2.ellipse(mask, ((int)(center_x), (int)(center_y)), + ((int)(major_axis_radius), (int)(minor_axis_radius)), angle, 0., 360., (255, 255, 255)) + contours, _ = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2:] + r = cv2.boundingRect(contours[0]) + x_min = r[0] + y_min = r[1] + x_max = r[0] + r[2] + y_max = r[1] + r[3] + face_loc.append([x_min, y_min, x_max, y_max]) + face_loc = np.array(face_loc) + + gt_sub_dict[filename] = face_loc + gt_dict[i] = gt_sub_dict + return gt_dict + +def read_pred_file(filepath): + with open(filepath, 'r') as f: + lines = f.readlines() + img_file = lines[0].rstrip('\n') + lines = lines[2:] + boxes = [] + for line in lines: + line = line.rstrip('\n').split(' ') + if line[0] == '': + continue + boxes.append([float(line[0]), float(line[1]), float(line[2]), float(line[3]), float(line[4])]) + boxes = np.array(boxes) + return img_file.split('/')[-1], boxes + +def get_preds_box(pred_dir): + events = os.listdir(pred_dir) + boxes = dict() + pbar = tqdm.tqdm(events) + for event in pbar: + pbar.set_description('Reading Predictions Boxes') + event_dir = os.path.join(pred_dir, event) + event_images = os.listdir(event_dir) + current_event = dict() + for imgtxt in event_images: + imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt)) + current_event[imgname.rstrip('.jpg')] = _boxes + boxes[event] = current_event + return boxes + +def norm_score(pred): + """ norm score + pred {key: [[x1,y1,x2,y2,s]]} + """ + + max_score = 0 + min_score = 1 + + for _, k in pred.items(): + for _, v in k.items(): + if v.size == 0: + continue + _min = np.min(v[:, -1]) + _max = np.max(v[:, -1]) + max_score = max(_max, max_score) + min_score = min(_min, min_score) + + diff = max_score - min_score + for 
_, k in pred.items(): + for _, v in k.items(): + if v.size == 0: + continue + v[:, -1] = (v[:, -1] - min_score) / diff + +def image_eval(pred, gt, ignore, iou_thresh): + """ single image evaluation + pred: Nx5 + gt: Nx4 + ignore: + """ + + _pred = pred.copy() + _gt = gt.copy() + pred_recall = np.zeros(_pred.shape[0]) + recall_list = np.zeros(_gt.shape[0]) + proposal_list = np.ones(_pred.shape[0]) + + _pred[:, 2] = _pred[:, 2] + _pred[:, 0] + _pred[:, 3] = _pred[:, 3] + _pred[:, 1] + + overlaps = bbox_overlaps(_pred[:, :4], _gt) + + for h in range(_pred.shape[0]): + gt_overlap = overlaps[h] + max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax() + if max_overlap >= iou_thresh: + if ignore[max_idx] == 0: + recall_list[max_idx] = -1 + proposal_list[h] = -1 + elif recall_list[max_idx] == 0: + recall_list[max_idx] = 1 + + r_keep_index = np.where(recall_list == 1)[0] + pred_recall[h] = len(r_keep_index) + return pred_recall, proposal_list + + +def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall): + pr_info = np.zeros((thresh_num, 2)).astype('float') + for t in range(thresh_num): + + thresh = 1 - (t + 1) / thresh_num + r_index = np.where(pred_info[:, 4] >= thresh)[0] + if r_index.size == 0: + pr_info[t, 0] = 0 + pr_info[t, 1] = 0 + else: + r_index = r_index[-1] + p_index = np.where(proposal_list[:r_index + 1] == 1)[0] + pr_info[t, 0] = len(p_index) + pr_info[t, 1] = pred_recall[r_index] + return pr_info + +def dataset_pr_info(thresh_num, pr_curve, count_face): + _pr_curve = np.zeros((thresh_num, 2)) + + for i in range(thresh_num): + _pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0] + _pr_curve[i, 1] = pr_curve[i, 1] / count_face + return _pr_curve + + +def voc_ap(rec, prec): + # correct AP calculation + # first append sentinel values at the end + mrec = np.concatenate(([0.], rec, [1.])) + mpre = np.concatenate(([0.], prec, [0.])) + + # compute the precision envelope + for i in range(mpre.size - 1, 0, -1): + mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) + + # to calculate area under PR curve, look for points + # where X axis (recall) changes value + i = np.where(mrec[1:] != mrec[:-1])[0] + + # and sum (\Delta recall) * prec + ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) + return ap + +def evaluation(pred, gt_path, iou_thresh=0.5): + pred = get_preds_box(pred) + norm_score(pred) + gt_box_dict = get_gt_boxes(gt_path) + event = list(pred.keys()) + event = [int(e) for e in event] + event.sort() + thresh_num = 1000 + aps = [] + + pbar = tqdm.tqdm(range(len(event))) + for setting_id in pbar: + pbar.set_description('Predicting ... 
') + # different setting + count_face = 0 + pr_curve = np.zeros((thresh_num, 2)).astype('float') + gt = gt_box_dict[event[setting_id]] + pred_list = pred[str(event[setting_id])] + gt_list = list(gt.keys()) + for j in range(len(gt_list)): + gt_boxes = gt[gt_list[j]].astype('float') # from image name get gt boxes + pred_info = pred_list[gt_list[j]] + keep_index = np.array(range(1, len(gt_boxes) + 1)) + count_face += len(keep_index) + ignore = np.zeros(gt_boxes.shape[0]) + if gt_boxes.size == 0 or pred_info.size == 0: + continue + if keep_index.size != 0: + ignore[keep_index - 1] = 1 + pred_recall, proposal_list = image_eval(pred_info, gt_boxes, ignore, iou_thresh) + + _img_pr_info = img_pr_info(thresh_num, pred_info, proposal_list, pred_recall) + + pr_curve += _img_pr_info + pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face) + + propose = pr_curve[:, 0] + recall = pr_curve[:, 1] + + ap = voc_ap(recall, propose) + aps.append(ap) + + print("==================== Results ====================") + for i in range(len(aps)): + print("FDDB-fold-{} Val AP: {}".format(event[i], aps[i])) + print("FDDB Dataset Average AP: {}".format(sum(aps)/len(aps))) + print("=================================================") + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument('--pred') + parser.add_argument('--gt') + args = parser.parse_args() + evaluation(args.pred, args.gt) diff --git a/research/cv/MTCNN/src/loss.py b/research/cv/MTCNN/src/loss.py new file mode 100644 index 0000000000000000000000000000000000000000..7cc4416dceb44eac7f7c974c71fb32fdad8e58d3 --- /dev/null +++ b/research/cv/MTCNN/src/loss.py @@ -0,0 +1,99 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +from mindspore import nn, ops +import mindspore +import numpy + +class ClassLoss(nn.Cell): + def __init__(self, valid_class_num=2): + super(ClassLoss, self).__init__() + self.one_hot = nn.OneHot(depth=valid_class_num) + self.keep_ratio = 0.7 + self.sort_descending = ops.Sort(descending=True) + self.reduce_sum = ops.ReduceSum() + self.stack = ops.Stack() + self.gather = ops.Gather() + + def construct(self, gt_label, class_out): + """ + gt_label: shape=(B) + class_out: shape=(B, 2) + """ + # Keep neg 0 and pos 1 data, ignore part -1, landmark -2 + valid_label = ops.select(gt_label >= 0, 1, ops.zeros_like(gt_label)) + num_valid = valid_label.sum() + valid_class_out = class_out * valid_label.expand_dims(-1) + keep_num = (num_valid * self.keep_ratio).astype(mindspore.int32) + one_hot_label = self.one_hot(gt_label * valid_label) + loss = ops.SoftmaxCrossEntropyWithLogits()(valid_class_out, one_hot_label)[0] * \ + valid_label.astype(valid_class_out.dtype) + + value, _ = self.sort_descending(loss) + min_score = value[keep_num] + mask = self.cast(loss > min_score, mindspore.float32) + mask = ops.stop_gradient(mask) + + return self.reduce_sum(loss * mask) / keep_num + + +class BoxLoss(nn.Cell): + def __init__(self): + super(BoxLoss, self).__init__() + self.loss_box = nn.MSELoss(reduction='none') + self.abs = ops.Abs() + self.reduce_sum = ops.ReduceSum() + + def construct(self, gt_label, gt_offset, pred_offset): + # Keep pos 1 and part -1 + valid_label = ops.select(self.abs(gt_label) == 1, 1, ops.zeros_like(gt_label)) + + keep_num = valid_label.sum() + loss = self.loss_box(pred_offset, gt_offset) + loss = loss.sum(axis=1) + loss = loss * valid_label + # top k + return self.reduce_sum(loss) / keep_num + +class LandmarkLoss(nn.Cell): + def __init__(self): + super(LandmarkLoss, self).__init__() + self.loss_landmark = nn.MSELoss(reduction='none') + self.reduce_sum = ops.ReduceSum() + + def construct(self, gt_label, gt_landmark, pred_landmark): + # Keep landmark -2 + valid_label = ops.select(gt_label == -2, 1, ops.zeros_like(gt_label)) + + keep_num = valid_label.sum() + loss = self.loss_landmark(pred_landmark, gt_landmark) + loss = loss.sum(axis=1) + loss = loss * valid_label + + return self.reduce_sum(loss) / keep_num + +# Calculate accuracy while training +def accuracy(pred_label, gt_label): + pred_label = pred_label.asnumpy() + pred_label = pred_label.argmax(axis=1) + gt_label = gt_label.asnumpy() + zeros = numpy.zeros(gt_label.shape) + cond = numpy.greater_equal(gt_label, zeros) + picked = numpy.where(cond) + valid_gt_label = gt_label[picked] + valid_pred_label = pred_label[picked] + + acc = numpy.sum(valid_pred_label == valid_gt_label, dtype=numpy.float32) / valid_gt_label.shape[0] + return acc diff --git a/research/cv/MTCNN/src/models/mtcnn.py b/research/cv/MTCNN/src/models/mtcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..2c4348d2a6358c646518f78bbfabf2aa4c2808b3 --- /dev/null +++ b/research/cv/MTCNN/src/models/mtcnn.py @@ -0,0 +1,265 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +from src.loss import BoxLoss, LandmarkLoss, ClassLoss +import config as cfg + +from mindspore import nn, ops +from mindspore.common.initializer import initializer, HeNormal, Uniform + + +class PNet(nn.Cell): + """fast Proposal Network(P-Net)""" + def __init__(self): + super(PNet, self).__init__() + + self.conv1 = nn.Conv2d(3, 10, 3, 1, has_bias=True, pad_mode='valid') + self.prelu1 = nn.PReLU() + self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same') + + self.conv2 = nn.Conv2d(10, 16, 3, 1, has_bias=True, pad_mode='valid') + self.prelu2 = nn.PReLU() + + self.conv3 = nn.Conv2d(16, 32, 3, 1, has_bias=True, pad_mode='valid') + self.prelu3 = nn.PReLU() + + # detection + self.conv4_1 = nn.Conv2d(32, 2, 1, 1, has_bias=True, pad_mode='valid') + # bounding box regression + self.conv4_2 = nn.Conv2d(32, 4, 1, 1, has_bias=True, pad_mode='valid') + # landmark regression + self.conv4_3 = nn.Conv2d(32, 10, 1, 1, has_bias=True, pad_mode='valid') + + for cell in self.cells(): + if isinstance(cell, nn.Conv2d): + cell.weight.set_data(initializer(HeNormal(mode='fan_out', nonlinearity='relu'), cell.weight.shape)) + cell.bias.set_data(initializer(Uniform(), cell.bias.shape)) + self.squeeze = ops.Squeeze() + + def construct(self, x): + x = self.prelu1(self.conv1(x)) + x = self.pool1(x) + x = self.prelu2(self.conv2(x)) + x = self.prelu3(self.conv3(x)) + + # Output classification result + label = self.conv4_1(x) + # box regression + offset = self.conv4_2(x) + # landmark regression + landmark = self.conv4_3(x) + return self.squeeze(label), self.squeeze(offset), self.squeeze(landmark) + +class PNetWithLoss(nn.Cell): + """PNet with loss cell""" + def __init__(self): + super(PNetWithLoss, self).__init__() + self.net = PNet() + self.cls_loss = ClassLoss() + self.box_loss = BoxLoss() + self.landmark_loss = LandmarkLoss() + + def construct(self, x, gt_label, gt_box, gt_landmark): + pred_label, pred_box, pred_landmark = self.net(x) + assert pred_label.ndim == 2 and pred_box.ndim == 2 and pred_landmark.ndim == 2, "Need to squeeze" + cls_loss_value = self.cls_loss(gt_label, pred_label) + box_loss_value = self.box_loss(gt_label, gt_box, pred_box) + landmark_loss_value = self.landmark_loss(gt_label, gt_landmark, pred_landmark) + total_loss = cfg.RADIO_CLS_LOSS * cls_loss_value + \ + cfg.RADIO_BOX_LOSS * box_loss_value + cfg.RADIO_LANDMARK_LOSS * landmark_loss_value + return total_loss + + +class PNetTrainOneStepCell(nn.Cell): + """PNet Train One Step Cell""" + def __init__(self, network, optimizer): + super(PNetTrainOneStepCell, self).__init__() + self.network = network + self.network.set_grad() + self.optimizer = optimizer + self.weights = optimizer.parameters + self.grad = ops.GradOperation(get_by_list=True) + + def construct(self, *inputs): + loss = self.network(*inputs) + grads = self.grad(self.network, self.weights)(*inputs) + self.optimizer(grads) + return loss + +class RNet(nn.Cell): + """Refinement Network(R-Net)""" + def __init__(self): + super(RNet, self).__init__() + + self.conv1 = nn.Conv2d(3, 28, 3, 1, 
has_bias=True, pad_mode='valid') + self.prelu1 = nn.PReLU() + self.pool1 = nn.MaxPool2d(3, 2, pad_mode='same') + + self.conv2 = nn.Conv2d(28, 48, 3, 1, has_bias=True, pad_mode='valid') + self.prelu2 = nn.PReLU() + self.pool2 = nn.MaxPool2d(3, 2, pad_mode='valid') + + self.conv3 = nn.Conv2d(48, 64, 2, 1, has_bias=True, pad_mode='valid') + self.prelu3 = nn.PReLU() + self.flatten = nn.Flatten() + + self.fc = nn.Dense(576, 128) + + # detection + self.class_fc = nn.Dense(128, 2) + # bounding box regression + self.bbox_fc = nn.Dense(128, 4) + # landmark localization + self.landmark_fc = nn.Dense(128, 10) + + for cell in self.cells(): + if isinstance(cell, (nn.Conv2d, nn.Dense)): + cell.weight.set_data(initializer(HeNormal(mode='fan_out', nonlinearity='relu'), cell.weight.shape)) + cell.bias.set_data(initializer(Uniform(), cell.bias.shape)) + + def construct(self, x): + x = self.prelu1(self.conv1(x)) + x = self.pool1(x) + x = self.prelu2(self.conv2(x)) + x = self.pool2(x) + x = self.prelu3(self.conv3(x)) + + x = self.flatten(x) + x = self.fc(x) + + # detection + det = self.class_fc(x) + box = self.bbox_fc(x) + landmark = self.landmark_fc(x) + + return det, box, landmark + +class RNetWithLoss(nn.Cell): + def __init__(self): + super(RNetWithLoss, self).__init__() + self.net = RNet() + self.cls_loss = ClassLoss() + self.box_loss = BoxLoss() + self.landmark_loss = LandmarkLoss() + + def construct(self, x, gt_label, gt_box, gt_landmark): + pred_label, pred_box, pred_landmark = self.net(x) + cls_loss_value = self.cls_loss(gt_label, pred_label) + box_loss_value = self.box_loss(gt_label, gt_box, pred_box) + landmark_loss_value = self.landmark_loss(gt_label, gt_landmark, pred_landmark) + total_loss = cfg.RADIO_CLS_LOSS * cls_loss_value + \ + cfg.RADIO_BOX_LOSS * box_loss_value + cfg.RADIO_LANDMARK_LOSS * landmark_loss_value + return total_loss + +class RNetTrainOneStepCell(nn.Cell): + def __init__(self, network, optimizer): + super(RNetTrainOneStepCell, self).__init__() + self.network = network + self.network.set_grad() + self.optimizer = optimizer + self.weights = optimizer.parameters + self.grad = ops.GradOperation(get_by_list=True) + + def construct(self, *inputs): + loss = self.network(*inputs) + grads = self.grad(self.network, self.weights)(*inputs) + self.optimizer(grads) + return loss + +class ONet(nn.Cell): + """Output Network(O-Net)""" + def __init__(self): + super(ONet, self).__init__() + + self.conv1 = nn.Conv2d(3, 32, 3, 1, has_bias=True, pad_mode='valid') + self.prelu1 = nn.PReLU() + self.pool1 = nn.MaxPool2d(3, 2, pad_mode='same') + + self.conv2 = nn.Conv2d(32, 64, 3, 1, has_bias=True, pad_mode='valid') + self.prelu2 = nn.PReLU() + self.pool2 = nn.MaxPool2d(3, 2, pad_mode='valid') + + self.conv3 = nn.Conv2d(64, 64, 3, 1, has_bias=True, pad_mode='valid') + self.prelu3 = nn.PReLU() + self.pool3 = nn.MaxPool2d(2, 2, pad_mode='valid') + + self.conv4 = nn.Conv2d(64, 128, 2, 1, has_bias=True, pad_mode='valid') + self.prelu4 = nn.PReLU() + + self.fc = nn.Dense(1152, 256) + + self.flatten = nn.Flatten() + + # detection + self.class_fc = nn.Dense(256, 2) + # bounding box regression + self.bbox_fc = nn.Dense(256, 4) + # landmark localization + self.landmark_fc = nn.Dense(256, 10) + + for cell in self.cells(): + if isinstance(cell, nn.Conv2d): + cell.weight.set_data(initializer(HeNormal(mode='fan_out', nonlinearity='relu'), cell.weight.shape)) + cell.bias.set_data(initializer(Uniform(), cell.bias.shape)) + + def construct(self, x): + x = self.prelu1(self.conv1(x)) + x = self.pool1(x) + x = 
self.prelu2(self.conv2(x)) + x = self.pool2(x) + x = self.prelu3(self.conv3(x)) + x = self.pool3(x) + x = self.prelu4(self.conv4(x)) + x = self.flatten(x) + x = self.fc(x) + # detection + det = self.class_fc(x) + # box regression + box = self.bbox_fc(x) + # landmark regression + landmark = self.landmark_fc(x) + + return det, box, landmark + +class ONetWithLoss(nn.Cell): + def __init__(self): + super(ONetWithLoss, self).__init__() + self.net = ONet() + self.cls_loss = ClassLoss() + self.box_loss = BoxLoss() + self.landmark_loss = LandmarkLoss() + + def construct(self, x, gt_label, gt_box, gt_landmark): + pred_label, pred_box, pred_landmark = self.net(x) + cls_loss_value = self.cls_loss(gt_label, pred_label) + box_loss_value = self.box_loss(gt_label, gt_box, pred_box) + landmark_loss_value = self.landmark_loss(gt_label, gt_landmark, pred_landmark) + total_loss = 1.0 * cls_loss_value + 0.5 * box_loss_value + 1.0 * landmark_loss_value + return total_loss + +class ONetTrainOneStepCell(nn.Cell): + def __init__(self, network, optimizer): + super(ONetTrainOneStepCell, self).__init__() + self.network = network + self.network.set_grad() + self.optimizer = optimizer + self.weights = optimizer.parameters + self.grad = ops.GradOperation(get_by_list=True) + + def construct(self, *inputs): + loss = self.network(*inputs) + grads = self.grad(self.network, self.weights)(*inputs) + self.optimizer(grads) + return loss diff --git a/research/cv/MTCNN/src/models/mtcnn_detector.py b/research/cv/MTCNN/src/models/mtcnn_detector.py new file mode 100644 index 0000000000000000000000000000000000000000..05a06bb13718c0285aef027acb25eea0a623b204 --- /dev/null +++ b/research/cv/MTCNN/src/models/mtcnn_detector.py @@ -0,0 +1,208 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +from src.utils import generate_box, nms, convert_to_square, pad, calibrate_box, process_image +from src.models.predict_nets import predict_pnet, predict_rnet, predict_onet +import config as cfg + +import numpy as np +import cv2 + + +class MtcnnDetector: + """Detect Image By MTCNN Model""" + def __init__(self, pnet, rnet, onet, min_face_size=20, scale_factor=0.79): + self.pnet = pnet + self.rnet = rnet + self.onet = onet + self.min_face_size = min_face_size + self.scale_factor = scale_factor + + def detect_pnet(self, im, thresh=cfg.P_THRESH): + """Get face candidates through pnet + Parameters: + ---------- + im: numpy array + input image array + Returns: + ------- + boxes_c: numpy array + boxes after calibration + """ + net_size = 12 + current_scale = float(net_size) / self.min_face_size + im_resized = process_image(im, current_scale) + _, current_height, current_width = im_resized.shape + all_boxes = list() + while min(current_height, current_width) > net_size: + cls_map, reg = predict_pnet(im_resized, self.pnet) + boxes = generate_box(cls_map[1, :, :], reg, current_scale, thresh) + + current_scale *= self.scale_factor + im_resized = process_image(im, current_scale) + _, current_height, current_width = im_resized.shape + + if boxes.size == 0: + continue + keep = nms(boxes[:, :5], 0.5, 'Union') + boxes = boxes[keep] + all_boxes.append(boxes) + + if not all_boxes: + return None + + all_boxes = np.vstack(all_boxes) + + keep = nms(all_boxes[:, 0:5], 0.7) + all_boxes = all_boxes[keep] + + bbw = all_boxes[:, 2] - all_boxes[:, 0] + 1 + bbh = all_boxes[:, 3] - all_boxes[:, 1] + 1 + + boxes_c = np.vstack([all_boxes[:, 0] + all_boxes[:, 5] * bbw, + all_boxes[:, 1] + all_boxes[:, 6] * bbh, + all_boxes[:, 2] + all_boxes[:, 7] * bbw, + all_boxes[:, 3] + all_boxes[:, 8] * bbh, + all_boxes[:, 4]]) + boxes_c = boxes_c.T + return boxes_c + + def detect_rnet(self, im, dets, thresh=cfg.R_THRESH): + """Get face candidates using rnet + Parameters: + ---------- + im: numpy array + input image array + dets: numpy array + detection results of pnet + Returns: + ------- + boxes_c: numpy array + boxes after calibration + """ + h, w, _ = im.shape + dets = convert_to_square(dets) + dets[:, 0:4] = np.round(dets[:, 0:4]) + + [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = pad(dets, w, h) + delete_size = np.ones_like(tmpw) * 20 + ones = np.ones_like(tmpw) + zeros = np.zeros_like(tmpw) + num_boxes = np.sum(np.where((np.minimum(tmpw, tmph) >= delete_size), ones, zeros)) + cropped_ims = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32) + for i in range(int(num_boxes)): + if tmph[i] < 20 or tmpw[i] < 20: + continue + tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8) + try: + tmp[dy[i]:edy[i] + 1, dx[i]:edx[i] + 1, :] = im[y[i]:ey[i] + 1, x[i]:ex[i] + 1, :] + img = cv2.resize(tmp, (24, 24), interpolation=cv2.INTER_LINEAR) + img = img.transpose((2, 0, 1)) + img = (img - 127.5) / 128 + cropped_ims[i, :, :, :] = img + except ValueError: + continue + cls_scores, reg = predict_rnet(cropped_ims, self.rnet) + if cls_scores.ndim < 2: + cls_scores = cls_scores[None, :] + cls_scores = cls_scores[:, 1] + keep_inds = np.where(cls_scores > thresh)[0] + if keep_inds.size != 0: + boxes = dets[keep_inds] + boxes[:, 4] = cls_scores[keep_inds] + reg = reg[keep_inds] + else: + return None + keep = nms(boxes, 0.4) + boxes = boxes[keep] + + boxes_c = calibrate_box(boxes, reg[keep]) + return boxes_c + + def detect_onet(self, im, dets, thresh=cfg.O_THRESH): + 
"""Get face candidates using onet + Parameters: + ---------- + im: numpy array + input image array + dets: numpy array + detection results of rnet + Returns: + ------- + boxes_c: numpy array + boxes after calibration + landmark: numpy array + """ + h, w, _ = im.shape + dets = convert_to_square(dets) + dets[:, 0:4] = np.round(dets[:, 0:4]) + [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = pad(dets, w, h) + num_boxes = dets.shape[0] + cropped_ims = np.zeros((num_boxes, 3, 48, 48), dtype=np.float32) + for i in range(num_boxes): + tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8) + tmp[dy[i]:edy[i] + 1, dx[i]:edx[i] + 1, :] = im[y[i]:ey[i] + 1, x[i]:ex[i] + 1, :] + img = cv2.resize(tmp, (48, 48), interpolation=cv2.INTER_LINEAR) + img = img.transpose((2, 0, 1)) + img = (img - 127.5) / 128 + cropped_ims[i, :, :, :] = img + + cls_scores, reg, landmark = predict_onet(cropped_ims, self.onet) + if cls_scores.ndim < 2: + cls_scores = cls_scores[None, :] + if reg.ndim < 2: + reg = reg[None, :] + if landmark.ndim < 2: + landmark = landmark[None, :] + + cls_scores = cls_scores[:, 1] + keep_inds = np.where(cls_scores > thresh)[0] + if keep_inds.size != 0: + boxes = dets[keep_inds] + boxes[:, 4] = cls_scores[keep_inds] + reg = reg[keep_inds] + landmark = landmark[keep_inds] + else: + return None, None + + w = boxes[:, 2] - boxes[:, 0] + 1 + + h = boxes[:, 3] - boxes[:, 1] + 1 + + landmark[:, 0::2] = (np.tile(w, (5, 1)) * landmark[:, 0::2].T + np.tile(boxes[:, 0], (5, 1)) - 1).T + landmark[:, 1::2] = (np.tile(h, (5, 1)) * landmark[:, 1::2].T + np.tile(boxes[:, 1], (5, 1)) - 1).T + boxes_c = calibrate_box(boxes, reg) + + keep = nms(boxes_c, 0.6, mode='Minimum') + boxes_c = boxes_c[keep] + landmark = landmark[keep] + return boxes_c, landmark + + def detect_face(self, image_path): + im = cv2.imread(image_path) + + boxes_c = self.detect_pnet(im, 0.9) + if boxes_c is None: + return None, None + + boxes_c = self.detect_rnet(im, boxes_c, 0.6) + if boxes_c is None: + return None, None + + boxes_c, landmark = self.detect_onet(im, boxes_c, 0.7) + if boxes_c is None: + return None, None + + return boxes_c, landmark diff --git a/research/cv/MTCNN/src/models/predict_nets.py b/research/cv/MTCNN/src/models/predict_nets.py new file mode 100644 index 0000000000000000000000000000000000000000..f28fc3764021f5328c0ec08390c9cca46adc168e --- /dev/null +++ b/research/cv/MTCNN/src/models/predict_nets.py @@ -0,0 +1,42 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +from mindspore import Tensor, ops +import mindspore as ms + + +def predict_pnet(data, net): + """Predict data by PNet""" + data = Tensor(data, dtype=ms.float32)[None, :] + cls_prob, box_pred, _ = net(data) + softmax = ops.Softmax(axis=0) + cls_prob = softmax(cls_prob) + + return cls_prob.asnumpy(), box_pred.asnumpy() + +def predict_rnet(data, net): + """Predict data by RNet""" + data = Tensor(data, dtype=ms.float32) + cls_prob, box_pred, _ = net(data) + softmax = ops.Softmax() + cls_prob = softmax(cls_prob) + return cls_prob.asnumpy(), box_pred.asnumpy() + +def predict_onet(data, net): + """Predict data by ONet""" + data = Tensor(data, dtype=ms.float32) + cls_prob, box_pred, landmark_pred = net(data) + softmax = ops.Softmax() + cls_prob = softmax(cls_prob) + return cls_prob.asnumpy(), box_pred.asnumpy(), landmark_pred.asnumpy() diff --git a/research/cv/MTCNN/src/prepare_data/generate_ONet_data.py b/research/cv/MTCNN/src/prepare_data/generate_ONet_data.py new file mode 100644 index 0000000000000000000000000000000000000000..6bcc63f3bd3f68a31e4cd7e4e6289fb0c0802dbc --- /dev/null +++ b/research/cv/MTCNN/src/prepare_data/generate_ONet_data.py @@ -0,0 +1,166 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================
+
+import os
+import argparse
+
+import pickle
+import cv2
+import numpy as np
+from tqdm import tqdm
+from mindspore import load_checkpoint, load_param_into_net
+
+from src.utils import convert_to_square, delete_old_img, nms, combine_data_list, data_to_mindrecord, crop_landmark_image
+from src.utils import read_annotation, save_hard_example, get_landmark_from_lfw_neg, pad, calibrate_box
+from src.models.mtcnn import PNet, RNet
+from src.prepare_data.generate_RNet_data import detect_pnet
+from src.models.predict_nets import predict_rnet
+import config as cfg
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="Generate ONet data")
+    parser.add_argument('--pnet_ckpt', type=str, required=True, help="Path of PNet checkpoint to detect")
+    parser.add_argument('--rnet_ckpt', type=str, required=True, help="Path of RNet checkpoint to detect")
+
+    return parser.parse_args()
+
+def detect_rnet(im, dets, thresh, net):
+    """Filter box and landmark by RNet"""
+    h, w, _ = im.shape
+    dets = convert_to_square(dets)
+    dets[:, 0:4] = np.round(dets[:, 0:4])
+
+    [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = pad(dets, w, h)
+    delete_size = np.ones_like(tmpw) * 20
+    ones = np.ones_like(tmpw)
+    zeros = np.zeros_like(tmpw)
+    num_boxes = np.sum(np.where((np.minimum(tmpw, tmph) >= delete_size), ones, zeros))
+    cropped_ims = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32)
+    if int(num_boxes) == 0:
+        print('Detection result of PNet is null!')
+        return None, None
+
+    for i in range(int(num_boxes)):
+        if tmph[i] < 20 or tmpw[i] < 20:
+            continue
+        tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8)
+        try:
+            tmp[dy[i]:edy[i] + 1, dx[i]:edx[i] + 1, :] = im[y[i]:ey[i] + 1, x[i]:ex[i] + 1, :]
+            img = cv2.resize(tmp, (24, 24))
+            img = img.transpose((2, 0, 1))
+            img = (img - 127.5) / 128
+            cropped_ims[i, :, :, :] = img
+        except ValueError:
+            continue
+
+    cls_scores, reg = predict_rnet(cropped_ims, net)
+    if cls_scores.ndim < 2:
+        cls_scores = cls_scores[None, :]
+    cls_scores = cls_scores[:, 1]
+    keep_inds = np.where(cls_scores > thresh)[0]
+
+    if keep_inds.size != 0:
+        boxes = dets[keep_inds]
+        boxes[:, 4] = cls_scores[keep_inds]
+        reg = reg[keep_inds]
+    else:
+        return None, None
+
+    keep = nms(boxes, 0.6, mode='Union')
+    boxes = boxes[keep]
+
+    boxes_c = calibrate_box(boxes, reg[keep])
+    return boxes, boxes_c
+
+def crop_48_size_images(min_face_size, scale_factor, p_thresh, r_thresh, pnet, rnet):
+    """Collect positive, negative, part images and resize to 48*48 as the input of ONet"""
+    dataset_path = cfg.DATASET_DIR
+    train_data_dir = cfg.TRAIN_DATA_DIR
+    anno_file = os.path.join(dataset_path, 'wider_face_train.txt')
+
+    # save positive, part, negative images
+    pos_save_dir = os.path.join(train_data_dir, '48/positive')
+    part_save_dir = os.path.join(train_data_dir, '48/part')
+    neg_save_dir = os.path.join(train_data_dir, '48/negative')
+
+    # save ONet train data
+    save_dir = os.path.join(train_data_dir, '48')
+
+    save_dir_list = [save_dir, pos_save_dir, part_save_dir, neg_save_dir]
+    for dir_ in save_dir_list:
+        if not os.path.exists(dir_):
+            os.mkdir(dir_)
+
+    # Read annotation data
+    data = read_annotation(dataset_path, anno_file)
+    all_boxes = []
+    landmarks = []
+    empty_array = np.array([])
+
+    for image_path in tqdm(data['images']):
+        assert os.path.exists(image_path), 'image not exists'
+        im = cv2.imread(image_path)
+        boxes_c = detect_pnet(im, min_face_size, scale_factor, p_thresh, pnet, 0.5)
+        if boxes_c is None:
+            
all_boxes.append(empty_array) + landmarks.append(empty_array) + continue + + _, boxes_c = detect_rnet(im, boxes_c, r_thresh, rnet) + if boxes_c is None: + all_boxes.append(empty_array) + landmarks.append(empty_array) + continue + + all_boxes.append(boxes_c) + + # Save result to pickle file + save_file = os.path.join(save_dir, 'detections.pkl') + with open(save_file, 'wb') as f: + pickle.dump(all_boxes, f, 1) + + save_hard_example(dataset_path, 48) + +if __name__ == '__main__': + args = parse_args() + pnet_params = load_checkpoint(args.pnet_ckpt) + rnet_params = load_checkpoint(args.rnet_ckpt) + pnet_ = PNet() + rnet_ = RNet() + load_param_into_net(pnet_, pnet_params) + load_param_into_net(rnet_, rnet_params) + pnet_.set_train(False) + rnet_.set_train(False) + + min_face_size_ = cfg.MIN_FACE_SIZE + scale_factor_ = cfg.SCALE_FACTOR + p_thresh_ = cfg.P_THRESH + r_thresh_ = cfg.R_THRESH + + print("Start generating Box images") + if not os.path.exists(cfg.TRAIN_DATA_DIR): + os.mkdir(cfg.TRAIN_DATA_DIR) + crop_48_size_images(min_face_size_, scale_factor_, p_thresh_, r_thresh_, pnet_, rnet_) + + print("Start generating landmark image") + data_list = get_landmark_from_lfw_neg(cfg.DATASET_DIR) + + crop_landmark_image(cfg.TRAIN_DATA_DIR, data_list, 48, argument=True) + + print("Start combine data list") + combine_data_list(os.path.join(cfg.TRAIN_DATA_DIR, '48')) + + data_to_mindrecord(os.path.join(cfg.TRAIN_DATA_DIR, '48'), cfg.MINDRECORD_DIR, 'ONet_train.mindrecord') + delete_old_img(cfg.TRAIN_DATA_DIR, 48) diff --git a/research/cv/MTCNN/src/prepare_data/generate_PNet_data.py b/research/cv/MTCNN/src/prepare_data/generate_PNet_data.py new file mode 100644 index 0000000000000000000000000000000000000000..a5a45d3fb0d7a0b0796b8509333fc20e5b5f2dd3 --- /dev/null +++ b/research/cv/MTCNN/src/prepare_data/generate_PNet_data.py @@ -0,0 +1,181 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================
+
+import os
+from tqdm import tqdm
+import cv2
+import numpy as np
+from numpy import random as npr
+
+from src.utils import IoU, crop_landmark_image, combine_data_list, data_to_mindrecord, get_landmark_from_lfw_neg
+from src.utils import delete_old_img, check_dir
+import config as cfg
+
+def write_neg_data(save_dir, anno_file, idx, resized_image):
+    """Save negative data"""
+    save_file = os.path.join(save_dir, "%s.jpg" % idx)
+    anno_file.write(save_dir + '/%s.jpg' % idx + ' 0\n')
+    cv2.imwrite(save_file, resized_image)
+    idx += 1
+    return idx
+
+def write_pos_data(save_dir, anno_file, idx, resized_image, x1, y1, x2, y2):
+    """Save positive data"""
+    save_file = os.path.join(save_dir, '%s.jpg' % idx)
+    anno_file.write(save_dir + '/%s.jpg' % idx + ' 1 %.2f %.2f %.2f %.2f\n' % (x1, y1, x2, y2))
+    cv2.imwrite(save_file, resized_image)
+    idx += 1
+    return idx
+
+def write_part_data(save_dir, anno_file, idx, resized_image, x1, y1, x2, y2):
+    """Save part data"""
+    save_file = os.path.join(save_dir, '%s.jpg' % idx)
+    anno_file.write(save_dir + '/%s.jpg' % idx + ' -1 %.2f %.2f %.2f %.2f\n' % (x1, y1, x2, y2))
+    cv2.imwrite(save_file, resized_image)
+    idx += 1
+    return idx
+
+def crop_12_size_images():
+    """Collect positive, negative, part images and resize to 12*12 as the input of PNet"""
+    dataset_path = cfg.DATASET_DIR
+    train_data_dir = cfg.TRAIN_DATA_DIR
+    # annotation file of WIDER dataset
+    anno_file = os.path.join(dataset_path, 'wider_face_train.txt')
+    # path of WIDER images
+    img_dir = os.path.join(dataset_path, 'WIDER_train/images')
+
+    # save positive, part, negative images
+    pos_save_dir = os.path.join(train_data_dir, '12/positive')
+    part_save_dir = os.path.join(train_data_dir, '12/part')
+    neg_save_dir = os.path.join(train_data_dir, '12/negative')
+
+    # save PNet train data
+    save_dir = os.path.join(train_data_dir, '12')
+    save_dir_list = [save_dir, pos_save_dir, part_save_dir, neg_save_dir]
+    check_dir(save_dir_list)
+
+    # Generate annotation files of positive, part, negative images
+    pos_anno = open(os.path.join(save_dir, 'positive.txt'), 'w')
+    part_anno = open(os.path.join(save_dir, 'part.txt'), 'w')
+    neg_anno = open(os.path.join(save_dir, 'negative.txt'), 'w')
+
+    # original dataset
+    with open(anno_file, 'r') as f:
+        annotations = f.readlines()
+    total_num = len(annotations)
+    print(f"Total number of images: {total_num}")
+    # record number of positive, negative and part images
+    p_idx, n_idx, d_idx = 0, 0, 0
+    # record number of processed images
+    idx = 0
+    for annotation in tqdm(annotations):
+        annotation = annotation.strip().split(' ')
+        im_path = annotation[0]
+        # box data
+        box = list(map(float, annotation[1:]))
+        # split box data
+        boxes = np.array(box, dtype=np.float32).reshape(-1, 4)
+        # load image
+        img = cv2.imread(os.path.join(img_dir, im_path + '.jpg'))
+        idx += 1
+        height, width, _ = img.shape
+        neg_num = 0
+        while neg_num < 50:
+            # randomly select an image size to crop
+            size = npr.randint(12, min(width, height) / 2)
+            nx = npr.randint(0, width - size)
+            ny = npr.randint(0, height - size)
+            # crop box
+            crop_box = np.array([nx, ny, nx + size, ny + size])
+            iou = IoU(crop_box, boxes)
+            cropped_im = img[ny:ny + size, nx:nx + size, :]
+            resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
+            # negative image if IoU < 0.3
+            if np.max(iou) < 0.3:
+                n_idx = write_neg_data(neg_save_dir, neg_anno, n_idx, resized_im)
+                neg_num += 1
+
+        for box in boxes:
+            x1, y1, x2, y2 = box
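+            # ground-truth box width/height; boxes that are too small or lie outside the image are skipped below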
+ w = x2 - x1 + 1 + h = y2 - y1 + 1 + if max(w, h) < 20 or x1 < 0 or y1 < 0: + continue + for _ in range(5): + size = npr.randint(12, min(width, height) / 2) + delta_x = npr.randint(max(-size, -x1), w) + delta_y = npr.randint(max(-size, -y1), h) + nx1 = int(max(0, x1 + delta_x)) + ny1 = int(max(0, y1 + delta_y)) + # exclude image which is too large + if nx1 + size > width or ny1 + size > height: + continue + # get crop box + crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size]) + # calculate iou + iou = IoU(crop_box, boxes) + cropped_im = img[ny1:ny1 + size, nx1:nx1 + size, :] + resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) + # negative image if iou < 0.3 + if np.max(iou) < 0.3: + n_idx = write_neg_data(neg_save_dir, neg_anno, n_idx, resized_im) + + for _ in range(20): + size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) + if w < 5: + continue + delta_x = npr.randint(-w * 0.2, w * 0.2) + delta_y = npr.randint(-h * 0.2, h * 0.2) + nx1 = int(max(x1 + w / 2 + delta_x - size / 2, 0)) + ny1 = int(max(y1 + h / 2 + delta_y - size / 2, 0)) + nx2 = nx1 + size + ny2 = ny1 + size + if nx2 > width or ny2 > height: + continue + crop_box = np.array([nx1, ny1, nx2, ny2]) + offset_x1 = (x1 - nx1) / float(size) + offset_y1 = (y1 - ny1) / float(size) + offset_x2 = (x2 - nx2) / float(size) + offset_y2 = (y2 - ny2) / float(size) + cropped_im = img[ny1:ny2, nx1:nx2, :] + resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) + box_ = box.reshape(1, -1) + iou = IoU(crop_box, box_) + # positive image if iou > 0.65 + if iou > 0.65: + p_idx = write_pos_data(pos_save_dir, pos_anno, p_idx, resized_im, offset_x1, offset_y1, + offset_x2, offset_y2) + # part image if iou > 0.4 and iou < 0.65 + elif iou >= 0.4: + d_idx = write_part_data(part_save_dir, part_anno, d_idx, resized_im, offset_x1, offset_y1, + offset_x2, offset_y2) + + print(f"{idx} images processed, pos: {p_idx} part: {d_idx} neg: {n_idx}") + pos_anno.close() + part_anno.close() + neg_anno.close() + +if __name__ == '__main__': + print("Start generating Box images") + if not os.path.exists(cfg.TRAIN_DATA_DIR): + os.mkdir(cfg.TRAIN_DATA_DIR) + crop_12_size_images() + print("Start generating landmark image") + data_list = get_landmark_from_lfw_neg(cfg.DATASET_DIR) + crop_landmark_image(cfg.TRAIN_DATA_DIR, data_list, 12, argument=True) + print("Start combine data list") + combine_data_list(os.path.join(cfg.TRAIN_DATA_DIR, '12')) + data_to_mindrecord(os.path.join(cfg.TRAIN_DATA_DIR, '12'), cfg.MINDRECORD_DIR, 'PNet_train.mindrecord') + delete_old_img(cfg.TRAIN_DATA_DIR, 12) diff --git a/research/cv/MTCNN/src/prepare_data/generate_RNet_data.py b/research/cv/MTCNN/src/prepare_data/generate_RNet_data.py new file mode 100644 index 0000000000000000000000000000000000000000..73af0dd0dbbad5a10f0c6cbea47219af0c6019b3 --- /dev/null +++ b/research/cv/MTCNN/src/prepare_data/generate_RNet_data.py @@ -0,0 +1,150 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================
+
+import os
+import argparse
+
+import pickle
+import cv2
+import numpy as np
+from tqdm import tqdm
+from mindspore import load_checkpoint, load_param_into_net
+
+from src.utils import delete_old_img, nms, combine_data_list, data_to_mindrecord
+from src.utils import crop_landmark_image, process_image, generate_box
+from src.utils import read_annotation, save_hard_example, get_landmark_from_lfw_neg
+from src.models.mtcnn import PNet
+from src.models.predict_nets import predict_pnet
+import config as cfg
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="Generate RNet data")
+    parser.add_argument('--pnet_ckpt', type=str, required=True, help="Path of PNet checkpoint to detect")
+
+    return parser.parse_args()
+
+def detect_pnet(im, min_face_size, scale_factor, thresh, net, nms_thresh=0.7):
+    """Filter box and landmark by PNet"""
+    net_size = 12
+    # Ratio of face and input image
+    current_scale = float(net_size) / min_face_size
+    im_resized = process_image(im, current_scale)
+    _, current_height, current_width = im_resized.shape
+    all_boxes = list()
+
+    # Image pyramid
+    while min(current_height, current_width) > net_size:
+        cls, reg = predict_pnet(im_resized, net)
+        boxes = generate_box(cls[1, :, :], reg, current_scale, thresh)
+        current_scale *= scale_factor
+        im_resized = process_image(im, current_scale)
+        _, current_height, current_width = im_resized.shape
+
+        if boxes.size == 0:
+            continue
+
+        keep = nms(boxes[:, :5], nms_thresh, mode='Union')
+        boxes = boxes[keep]
+        all_boxes.append(boxes)
+
+    if not all_boxes:
+        return None
+
+    all_boxes = np.vstack(all_boxes)
+
+    keep = nms(all_boxes[:, 0:5], 0.7)
+    all_boxes = all_boxes[keep]
+
+    bbw = all_boxes[:, 2] - all_boxes[:, 0] + 1
+    bbh = all_boxes[:, 3] - all_boxes[:, 1] + 1
+
+    boxes_c = np.vstack([all_boxes[:, 0] + all_boxes[:, 5] * bbw,
+                         all_boxes[:, 1] + all_boxes[:, 6] * bbh,
+                         all_boxes[:, 2] + all_boxes[:, 7] * bbw,
+                         all_boxes[:, 3] + all_boxes[:, 8] * bbh,
+                         all_boxes[:, 4]])
+
+    return boxes_c.T
+
+def crop_24_size_images(min_face_size, scale_factor, thresh, net):
+    """Collect positive, negative, part images and resize to 24*24 as the input of RNet"""
+    dataset_path = cfg.DATASET_DIR
+    train_data_dir = cfg.TRAIN_DATA_DIR
+    anno_file = os.path.join(dataset_path, 'wider_face_train.txt')
+
+    # save positive, part, negative images
+    pos_save_dir = os.path.join(train_data_dir, '24/positive')
+    part_save_dir = os.path.join(train_data_dir, '24/part')
+    neg_save_dir = os.path.join(train_data_dir, '24/negative')
+
+    # save RNet train data
+    save_dir = os.path.join(train_data_dir, '24')
+
+    save_dir_list = [save_dir, pos_save_dir, part_save_dir, neg_save_dir]
+    for dir_ in save_dir_list:
+        if not os.path.exists(dir_):
+            os.mkdir(dir_)
+
+    # Read annotation data
+    data = read_annotation(dataset_path, anno_file)
+    all_boxes, landmarks = [], []
+    empty_array = np.array([])
+
+    # Detect images with PNet
+    for image_path in tqdm(data['images']):
+        assert os.path.exists(image_path), 'image not exists'
+        im = cv2.imread(image_path)
+        boxes_c = detect_pnet(im, min_face_size, scale_factor, thresh, net)
+        if boxes_c is None:
+            all_boxes.append(empty_array)
+            landmarks.append(empty_array)
+            continue
+        all_boxes.append(boxes_c)
+
+    # Save result to pickle file
+    save_file = os.path.join(save_dir, 'detections.pkl')
+    with open(save_file, 'wb') as f:
+        pickle.dump(all_boxes, f, 1)
+
+    save_hard_example(dataset_path, 24)
+
+if __name__ == '__main__':
+    args = 
parse_args() + params = load_checkpoint(args.pnet_ckpt) + pnet_ = PNet() + load_param_into_net(pnet_, params) + pnet_.set_train(False) + + min_face_size_ = cfg.MIN_FACE_SIZE + scale_factor_ = cfg.SCALE_FACTOR + p_thresh_ = cfg.P_THRESH + + print("Start generating Box images") + if not os.path.exists(cfg.TRAIN_DATA_DIR): + os.mkdir(cfg.TRAIN_DATA_DIR) + crop_24_size_images(min_face_size_, scale_factor_, p_thresh_, pnet_) + + print("Start generating landmark image") + data_list = get_landmark_from_lfw_neg(cfg.DATASET_DIR) + + crop_landmark_image(cfg.TRAIN_DATA_DIR, data_list, 24, argument=True) + + print("Start combine data list") + combine_data_list(os.path.join(cfg.TRAIN_DATA_DIR, '24')) + + data_to_mindrecord(os.path.join(cfg.TRAIN_DATA_DIR, '24'), cfg.MINDRECORD_DIR, 'RNet_train.mindrecord') + delete_old_img(cfg.TRAIN_DATA_DIR, 24) diff --git a/research/cv/MTCNN/src/train_models/train_o_net.py b/research/cv/MTCNN/src/train_models/train_o_net.py new file mode 100644 index 0000000000000000000000000000000000000000..631cf0fceb5a5e50dc1b61a7ed05d7eb6078784d --- /dev/null +++ b/research/cv/MTCNN/src/train_models/train_o_net.py @@ -0,0 +1,82 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import os +import time + +from mindspore import context, nn +from mindspore import save_checkpoint +from mindspore.communication import management as D +from mindspore.communication.management import get_group_size, get_rank +from src.models.mtcnn import ONetWithLoss, ONetTrainOneStepCell +from src.dataset import create_train_dataset +from src.utils import MultiEpochsDecayLR + +def train_onet(args): + print("The argument is: ", args) + context.set_context(device_target=args.device_target, mode=context.GRAPH_MODE) + device_id = 0 + device_num = 1 + if args.distribute: + D.init() + device_id = get_rank() + device_num = get_group_size() + + context.reset_auto_parallel_context() + context.set_auto_parallel_context(parallel_mode=context.ParallelMode.DATA_PARALLEL, gradients_mean=True, + device_num=device_num) + + else: + context.set_context(device_id=int(os.getenv('DEVICE_ID', '0'))) + + # Create train dataset + ds_train = create_train_dataset(args.mindrecord_file, args.batch_size, + device_num, device_id, num_workers=args.num_workers) + steps_per_epoch = ds_train.get_dataset_size() + network = ONetWithLoss() + + network.set_train(True) + + # decay lr + if args.distribute: + lr_scheduler = MultiEpochsDecayLR(args.lr, [12, 17, 20], steps_per_epoch) + else: + lr_scheduler = MultiEpochsDecayLR(args.lr, [6, 14, 20], steps_per_epoch) + + # optimizer + optimizer = nn.Adam(params=network.trainable_params(), learning_rate=lr_scheduler, weight_decay=1e-4) + + # train net + train_net = ONetTrainOneStepCell(network, optimizer) + train_net.set_train(True) + + print("Start training ONet") + + for epoch in range(1, args.end_epoch+1): + step = 0 + time_list = [] + for d in 
ds_train.create_tuple_iterator(): + start_time = time.time() + loss = train_net(*d) + step += 1 + print(f'epoch: {epoch} step: {step}, loss is {loss}') + per_time = time.time() - start_time + time_list.append(per_time) + print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)") + + if args.distribute and device_id == 0: + save_checkpoint(network, os.path.join(args.ckpt_path, 'onet_distribute.ckpt')) + elif not args.distribute: + save_checkpoint(network, os.path.join(args.ckpt_path, 'onet_standalone.ckpt')) diff --git a/research/cv/MTCNN/src/train_models/train_p_net.py b/research/cv/MTCNN/src/train_models/train_p_net.py new file mode 100644 index 0000000000000000000000000000000000000000..badd4c391b327c434f9a83f19ab83c4ddfa1b1bc --- /dev/null +++ b/research/cv/MTCNN/src/train_models/train_p_net.py @@ -0,0 +1,81 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import os +import time + +from mindspore import context, nn +from mindspore import save_checkpoint +from mindspore.communication import management as D +from mindspore.communication.management import get_group_size, get_rank +from src.models.mtcnn import PNetTrainOneStepCell, PNetWithLoss +from src.dataset import create_train_dataset +from src.utils import MultiEpochsDecayLR + +def train_pnet(args): + print("The argument is: ", args) + context.set_context(device_target=args.device_target, mode=context.GRAPH_MODE) + device_id = 0 + device_num = 1 + if args.distribute: + D.init() + device_id = get_rank() + device_num = get_group_size() + + context.reset_auto_parallel_context() + context.set_auto_parallel_context(parallel_mode=context.ParallelMode.DATA_PARALLEL, gradients_mean=True, + device_num=device_num) + + else: + context.set_context(device_id=int(os.getenv('DEVICE_ID', '0'))) + + # Create train dataset + ds_train = create_train_dataset(args.mindrecord_file, args.batch_size, + device_num, device_id, num_workers=args.num_workers) + steps_per_epoch = ds_train.get_dataset_size() + network = PNetWithLoss() + + network.set_train(True) + + # decay lr + if args.distribute: + lr_scheduler = MultiEpochsDecayLR(args.lr, [7, 15, 20], steps_per_epoch) + else: + lr_scheduler = MultiEpochsDecayLR(args.lr, [6, 14, 20], steps_per_epoch) + + # optimizer + optimizer = nn.Adam(params=network.trainable_params(), learning_rate=lr_scheduler, weight_decay=1e-4) + + # train net + train_net = PNetTrainOneStepCell(network, optimizer) + train_net.set_train(True) + + print("Start training PNet") + for epoch in range(1, args.end_epoch+1): + step = 0 + time_list = [] + for d in ds_train.create_tuple_iterator(): + start_time = time.time() + loss = train_net(*d) + step += 1 + print(f'epoch: {epoch} step: {step}, loss is {loss}') + per_time = time.time() - start_time + time_list.append(per_time) + print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)") + + if args.distribute and 
device_id == 0: + save_checkpoint(network, os.path.join(args.ckpt_path, 'pnet_distribute.ckpt')) + elif not args.distribute: + save_checkpoint(network, os.path.join(args.ckpt_path, 'pnet_standalone.ckpt')) diff --git a/research/cv/MTCNN/src/train_models/train_r_net.py b/research/cv/MTCNN/src/train_models/train_r_net.py new file mode 100644 index 0000000000000000000000000000000000000000..ddb570d5700267c140da81354cf27e5b83529de3 --- /dev/null +++ b/research/cv/MTCNN/src/train_models/train_r_net.py @@ -0,0 +1,81 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import os +import time + +from mindspore import context, nn +from mindspore import save_checkpoint +from mindspore.communication import management as D +from mindspore.communication.management import get_group_size, get_rank +from src.models.mtcnn import RNetWithLoss, RNetTrainOneStepCell +from src.dataset import create_train_dataset +from src.utils import MultiEpochsDecayLR + +def train_rnet(args): + print("The argument is: ", args) + context.set_context(device_target=args.device_target, mode=context.GRAPH_MODE) + device_id = 0 + device_num = 1 + if args.distribute: + D.init() + device_id = get_rank() + device_num = get_group_size() + + context.reset_auto_parallel_context() + context.set_auto_parallel_context(parallel_mode=context.ParallelMode.DATA_PARALLEL, gradients_mean=True, + device_num=device_num) + + else: + context.set_context(device_id=int(os.getenv('DEVICE_ID', '0'))) + + # Create train dataset + ds_train = create_train_dataset(args.mindrecord_file, args.batch_size, + device_num, device_id, num_workers=args.num_workers) + steps_per_epoch = ds_train.get_dataset_size() + network = RNetWithLoss() + + network.set_train(True) + + # decay lr + if args.distribute: + lr_scheduler = MultiEpochsDecayLR(args.lr, [7, 15, 20], steps_per_epoch) + else: + lr_scheduler = MultiEpochsDecayLR(args.lr, [6, 14, 20], steps_per_epoch) + + # optimizer + optimizer = nn.Adam(params=network.trainable_params(), learning_rate=lr_scheduler, weight_decay=1e-4) + + # train net + train_net = RNetTrainOneStepCell(network, optimizer) + train_net.set_train(True) + + print("Start training RNet") + for epoch in range(1, args.end_epoch+1): + step = 0 + time_list = [] + for d in ds_train.create_tuple_iterator(): + start_time = time.time() + loss = train_net(*d) + step += 1 + print(f'epoch: {epoch} step: {step}, loss is {loss}') + per_time = time.time() - start_time + time_list.append(per_time) + print('per step time: ', '%.2f' % (sum(time_list) / len(time_list) * 1000), "(ms/step)") + + if args.distribute and device_id == 0: + save_checkpoint(network, os.path.join(args.ckpt_path, 'rnet_distribute.ckpt')) + elif not args.distribute: + save_checkpoint(network, os.path.join(args.ckpt_path, 'rnet_standalone.ckpt')) diff --git a/research/cv/MTCNN/src/utils.py b/research/cv/MTCNN/src/utils.py new file mode 100644 index 
0000000000000000000000000000000000000000..f01e90d58c7674e664599cb9bb75442261d0dfce --- /dev/null +++ b/research/cv/MTCNN/src/utils.py @@ -0,0 +1,662 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import shutil +import pickle +import os +import numpy as np +from numpy import random +import cv2 +from tqdm import tqdm +from mindspore.mindrecord import FileWriter +from mindspore.nn.learning_rate_schedule import LearningRateSchedule +from mindspore import ops, Tensor +from mindspore import dtype as mstype + +def IoU(box, boxes): + """Compute IoU between detect box and gt boxes + Parameters: + ---------- + box: numpy array , shape (5, ): x1, y1, x2, y2, score + input box + boxes: numpy array, shape (n, 4): x1, y1, x2, y2 + input ground truth boxes + Returns: + ------- + ovr: numpy.array, shape (n, ) + IoU + """ + box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1) + area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1) + xx1 = np.maximum(box[0], boxes[:, 0]) + yy1 = np.maximum(box[1], boxes[:, 1]) + xx2 = np.minimum(box[2], boxes[:, 2]) + yy2 = np.minimum(box[3], boxes[:, 3]) + + # compute the width and height of the bounding box + w = np.maximum(0, xx2 - xx1 + 1) + h = np.maximum(0, yy2 - yy1 + 1) + + inter = w * h + return inter / (box_area + area - inter + 1e-10) + +def convert_to_square(bbox): + """Convert bbox to square + Parameters: + ---------- + bbox: numpy array , shape n x 5 + input bbox + Returns: + ------- + square bbox + """ + square_bbox = bbox.copy() + + h = bbox[:, 3] - bbox[:, 1] + 1 + w = bbox[:, 2] - bbox[:, 0] + 1 + max_side = np.maximum(h, w) + square_bbox[:, 0] = bbox[:, 0] + w*0.5 - max_side*0.5 + square_bbox[:, 1] = bbox[:, 1] + h*0.5 - max_side*0.5 + square_bbox[:, 2] = square_bbox[:, 0] + max_side - 1 + square_bbox[:, 3] = square_bbox[:, 1] + max_side - 1 + return square_bbox + + +def nms(dets, thresh, mode="Union"): + """ + greedily select boxes with high confidence + keep boxes overlap <= thresh + rule out overlap > thresh + :param dets: [[x1, y1, x2, y2 score]] + :param thresh: retain overlap <= thresh + :return: indexes to keep + """ + x1 = dets[:, 0] + y1 = dets[:, 1] + x2 = dets[:, 2] + y2 = dets[:, 3] + scores = dets[:, 4] + + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + order = scores.argsort()[::-1] + + keep = [] + while order.size > 0: + i = order[0] + keep.append(i) + xx1 = np.maximum(x1[i], x1[order[1:]]) + yy1 = np.maximum(y1[i], y1[order[1:]]) + xx2 = np.minimum(x2[i], x2[order[1:]]) + yy2 = np.minimum(y2[i], y2[order[1:]]) + + w = np.maximum(0.0, xx2 - xx1 + 1) + h = np.maximum(0.0, yy2 - yy1 + 1) + inter = w * h + if mode == "Union": + ovr = inter / (areas[i] + areas[order[1:]] - inter) + elif mode == "Minimum": + ovr = inter / np.minimum(areas[i], areas[order[1:]]) + + inds = np.where(ovr <= thresh)[0] + order = order[inds + 1] + + return keep + +def pad(bboxes, w, h): + tmpw, tmph = bboxes[:, 2] - 
bboxes[:, 0] + 1, bboxes[:, 3] - bboxes[:, 1] + 1 + num_box = bboxes.shape[0] + + dx, dy = np.zeros((num_box,)), np.zeros((num_box,)) + edx, edy = tmpw.copy() - 1, tmph.copy() - 1 + + x, y, ex, ey = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3] + + tmp_index = np.where(ex > w - 1) + edx[tmp_index] = tmpw[tmp_index] + w - 2 - ex[tmp_index] + ex[tmp_index] = w - 1 + + tmp_index = np.where(ey > h - 1) + edy[tmp_index] = tmph[tmp_index] + h - 2 - ey[tmp_index] + ey[tmp_index] = h - 1 + + tmp_index = np.where(x < 0) + dx[tmp_index] = 0 - x[tmp_index] + x[tmp_index] = 0 + + tmp_index = np.where(y < 0) + dy[tmp_index] = 0 - y[tmp_index] + y[tmp_index] = 0 + + return_list = [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] + return_list = [item.astype(np.int32) for item in return_list] + + return return_list + +def calibrate_box(bbox, reg): + bbox_c = bbox.copy() + w = bbox[:, 2] - bbox[:, 0] + 1 + w = np.expand_dims(w, 1) + h = bbox[:, 3] - bbox[:, 1] + 1 + h = np.expand_dims(h, 1) + reg_m = np.hstack([w, h, w, h]) + aug = reg_m * reg + bbox_c[:, 0:4] = bbox_c[:, 0:4] + aug + return bbox_c + +def process_image(img, scale): + """Preprocess image""" + height, width, _ = img.shape + new_height = int(height * scale) + new_width = int(width * scale) + new_dim = (new_width, new_height) + img_resized = cv2.resize(img, new_dim, interpolation=cv2.INTER_LINEAR) + + image = np.array(img_resized).astype(np.float32) + # HWC2CHW + image = image.transpose((2, 0, 1)) + # Normalize + image = (image - 127.5) / 128 + return image + +def generate_box(cls_map, reg, scale, threshold): + """get box""" + stride = 2 + cellsize = 12 + + t_index = np.where(cls_map > threshold) + + # Zero face + if t_index[0].size == 0: + return np.array([]) + + # Offset + dx1, dy1, dx2, dy2 = [reg[i, t_index[0], t_index[1]] for i in range(4)] + reg = np.array([dx1, dy1, dx2, dy2]) + score = cls_map[t_index[0], t_index[1]] + # Box, score, offset + boundingbox = np.vstack([np.round((stride * t_index[1]) / scale), + np.round((stride * t_index[0]) / scale), + np.round((stride * t_index[1] + cellsize) / scale), + np.round((stride * t_index[0] + cellsize) / scale), + score, + reg]) + # shape = [n, 9] + return boundingbox.T + +def read_annotation(data_path, label_path): + """Load image path and box from original dataset""" + data = dict() + images = [] + boxes = [] + with open(label_path, 'r') as f: + lines = f.readlines() + for line in lines: + labels = line.strip().split(' ') + # image path + imagepath = labels[0] + # if has empty line, then break + if not imagepath: + break + # absolute image path + imagepath = os.path.join(data_path, 'WIDER_train/images', imagepath + '.jpg') + images.append(imagepath) + + one_image_boxes = [] + for i in range(0, len(labels) - 1, 4): + xmin = float(labels[1 + i]) + ymin = float(labels[2 + i]) + xmax = float(labels[3 + i]) + ymax = float(labels[4 + i]) + + one_image_boxes.append([xmin, ymin, xmax, ymax]) + + boxes.append(one_image_boxes) + + data['images'] = images + data['boxes'] = boxes + return data + +def save_hard_example(data_path, save_size): + """Save data according to the predicted result""" + filename = os.path.join(data_path, 'wider_face_train.txt') + data = read_annotation(data_path, filename) + + im_idx_list = data['images'] + gt_boxes_list = data['boxes'] + + pos_save_dir = os.path.join(data_path, 'train_data/%d/positive' % save_size) + part_save_dir = os.path.join(data_path, 'train_data/%d/part' % save_size) + neg_save_dir = os.path.join(data_path, 'train_data/%d/negative' % save_size) + + if 
not os.path.exists(data_path): + os.makedirs(data_path) + if not os.path.exists(pos_save_dir): + os.mkdir(pos_save_dir) + if not os.path.exists(part_save_dir): + os.mkdir(part_save_dir) + if not os.path.exists(neg_save_dir): + os.mkdir(neg_save_dir) + + neg_file = open(os.path.join(data_path, 'train_data/%d/negative.txt' % save_size), 'w') + pos_file = open(os.path.join(data_path, 'train_data/%d/positive.txt' % save_size), 'w') + part_file = open(os.path.join(data_path, 'train_data/%d/part.txt' % save_size), 'w') + + det_boxes = pickle.load(open(os.path.join(data_path, 'train_data/%d/detections.pkl' % save_size), 'rb')) + + assert len(det_boxes) == len(im_idx_list), "Predicted result are not consistent with local data" + + n_idx, p_idx, d_idx = 0, 0, 0 + + pbar = tqdm(total=len(im_idx_list)) + for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list): + pbar.update(1) + gts = np.array(gts, dtype=np.float32).reshape(-1, 4) + + if dets.shape[0] == 0: + continue + img = cv2.imread(im_idx) + + dets = convert_to_square(dets) + dets[:, 0:4] = np.round(dets[:, 0:4]) + + neg_num = 0 + for box in dets: + x_left, y_top, x_right, y_bottom, _ = box.astype(int) + width = x_right - x_left + 1 + height = y_bottom - y_top + 1 + + # delete small object + if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: + continue + + # cal iou + iou = IoU(box, gts) + + cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] + resized_im = cv2.resize(cropped_im, (save_size, save_size), interpolation=cv2.INTER_LINEAR) + + if np.max(iou) < 0.3 and neg_num < 60: + save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx) + + neg_file.write(save_file + ' 0\n') + cv2.imwrite(save_file, resized_im) + n_idx += 1 + neg_num += 1 + else: + idx = np.argmax(iou) + assigned_gt = gts[idx] + x1, y1, x2, y2 = assigned_gt + + # Offset + offset_x1 = (x1 - x_left) / float(width) + offset_y1 = (y1 - y_top) / float(height) + offset_x2 = (x2 - x_right) / float(width) + offset_y2 = (y2 - y_bottom) / float(height) + + # pos and part + if np.max(iou) >= 0.65: + save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx) + # label=1 + pos_file.write( + save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2)) + cv2.imwrite(save_file, resized_im) + p_idx += 1 + + elif np.max(iou) >= 0.4: + save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx) + # label=-1 + part_file.write( + save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2)) + cv2.imwrite(save_file, resized_im) + d_idx += 1 + + pbar.close() + neg_file.close() + part_file.close() + pos_file.close() + +class BBox: + def __init__(self, box): + self.left = box[0] + self.top = box[1] + self.right = box[2] + self.bottom = box[3] + + self.x = box[0] + self.y = box[1] + self.w = box[2] - box[0] + self.h = box[3] - box[1] + + def project(self, point): + x = (point[0] - self.x) / self.w + y = (point[1] - self.y) / self.h + return np.asarray([x, y]) + + def reproject(self, point): + x = self.x + self.w * point[0] + y = self.y + self.h * point[1] + return np.asarray([x, y]) + + def reprojectLandmark(self, landmark): + p = np.zeros((len(landmark), 2)) + for i in range(len(landmark)): + p[i] = self.reproject(landmark[i]) + return p + + def projectLandmark(self, landmark): + p = np.zeros((len(landmark), 2)) + for i in range(len(landmark)): + p[i] = self.project(landmark[i]) + return p + +# flip image +def flip(face, landmark): + face_flipped_by_x = cv2.flip(face, 1) + landmark_ = 
np.asarray([(1 - x, y) for (x, y) in landmark]) + landmark_[[0, 1]] = landmark_[[1, 0]] + landmark_[[3, 4]] = landmark_[[4, 3]] + return face_flipped_by_x, landmark_ + +# rotate image +def rotate(img, box, landmark, alpha): + center = ((box.left + box.right) / 2, (box.top + box.bottom) / 2) + rot_mat = cv2.getRotationMatrix2D(center, alpha, 1) + img_rotated_by_alpha = cv2.warpAffine(img, rot_mat, (img.shape[1], img.shape[0])) + landmark_ = np.asarray([(rot_mat[0][0] * x + rot_mat[0][1] * y + rot_mat[0][2], + rot_mat[1][0] * x + rot_mat[1][1] * y + rot_mat[1][2]) for (x, y) in landmark]) + face = img_rotated_by_alpha[box.top:box.bottom + 1, box.left:box.right + 1] + return face, landmark_ + +def check_dir(dirs): + """Check directory""" + if isinstance(dirs, list): + for d in dirs: + if not os.path.exists(d): + os.mkdir(d) + else: + if not os.path.exists(dirs): + os.mkdir(dirs) + +def do_argument(image, resized_image, landmark_, box_, size_, F_imgs_, F_landmarks_): + """Flip, rotate image""" + if random.choice([0, 1]) > 0: + face_flipped, landmark_flipped = flip(resized_image, landmark_) + face_flipped = cv2.resize(face_flipped, (size_, size_)) + F_imgs_.append(face_flipped) + F_landmarks_.append(landmark_flipped.reshape(10)) + + if random.choice([0, 1]) > 0: + face_rotated_by_alpha, landmark_rorated = rotate(image, box_, box_.reprojectLandmark(landmark_), 5) + landmark_rorated = box_.projectLandmark(landmark_rorated) + face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (size_, size_)) + F_imgs_.append(face_rotated_by_alpha) + F_landmarks_.append(landmark_rorated.reshape(10)) + face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rorated) + face_flipped = cv2.resize(face_flipped, (size_, size_)) + F_imgs_.append(face_flipped) + F_landmarks_.append(landmark_flipped.reshape(10)) + + if random.choice([0, 1]) > 0: + face_rotated_by_alpha, landmark_rorated = rotate(image, box_, box_.reprojectLandmark(landmark_), -5) + landmark_rorated = box_.projectLandmark(landmark_rorated) + face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (size_, size_)) + F_imgs_.append(face_rotated_by_alpha) + F_landmarks_.append(landmark_rorated.reshape(10)) + face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rorated) + face_flipped = cv2.resize(face_flipped, (size_, size_)) + F_imgs_.append(face_flipped) + F_landmarks_.append(landmark_flipped.reshape(10)) + +def crop_landmark_image(data_dir, data_list, size, argument=True): + """crop and save landmark image""" + npr = np.random + image_id = 0 + output = os.path.join(data_dir, str(size)) + + check_dir(output) + dstdir = os.path.join(output, 'landmark') + + check_dir(dstdir) + f = open(os.path.join(output, 'landmark.txt'), 'w') + + idx = 0 + for (imgPath, box, landmarkGt) in tqdm(data_list): + F_imgs = [] + F_landmarks = [] + img = cv2.imread(imgPath) + img_h, img_w, _ = img.shape + gt_box = np.array([box.left, box.top, box.right, box.bottom]) + f_face = img[box.top:box.bottom + 1, box.left:box.right + 1] + try: + f_face = cv2.resize(f_face, (size, size)) + except ValueError as e: + print(e) + continue + landmark = np.zeros((5, 2)) + for index, one in enumerate(landmarkGt): + rv = ((one[0] - gt_box[0]) / (gt_box[2] - gt_box[0]), (one[1] - gt_box[1]) / (gt_box[3] - gt_box[1])) + landmark[index] = rv + F_imgs.append(f_face) + F_landmarks.append(landmark.reshape(10)) + if argument: + landmark = np.zeros((5, 2)) + idx = idx + 1 + x1, y1, x2, y2 = gt_box + gt_w = x2 - x1 + 1 + gt_h = y2 - y1 + 1 + if max(gt_w, gt_h) < 40 or x1 
< 0 or y1 < 0: + continue + for i in range(10): + box_size = npr.randint(int(min(gt_w, gt_h) * 0.8), np.ceil(1.25 * max(gt_w, gt_h))) + try: + delta_x = npr.randint(-gt_w * 0.2, gt_w * 0.2) + delta_y = npr.randint(-gt_h * 0.2, gt_h * 0.2) + except ValueError as e: + print(e) + continue + nx1 = int(max(x1 + gt_w / 2 - box_size / 2 + delta_x, 0)) + ny1 = int(max(y1 + gt_h / 2 - box_size / 2 + delta_y, 0)) + nx2 = nx1 + box_size + ny2 = ny1 + box_size + if nx2 > img_w or ny2 > img_h: + continue + crop_box = np.array([nx1, ny1, nx2, ny2]) + cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] + resized_im = cv2.resize(cropped_im, (size, size)) + iou = IoU(crop_box, np.expand_dims(gt_box, 0)) + if iou > 0.65: + F_imgs.append(resized_im) + for index, one in enumerate(landmarkGt): + rv = ((one[0] - nx1) / box_size, (one[1] - ny1) / box_size) + landmark[index] = rv + F_landmarks.append(landmark.reshape(10)) + landmark = np.zeros((5, 2)) + landmark_ = F_landmarks[-1].reshape(-1, 2) + box = BBox([nx1, ny1, nx2, ny2]) + do_argument(img, resized_im, landmark_, box, size, F_imgs, F_landmarks) + F_imgs, F_landmarks = np.asarray(F_imgs), np.asarray(F_landmarks) + for i in range(len(F_imgs)): + if np.sum(np.where(F_landmarks[i] <= 0, 1, 0)) > 0: + continue + if np.sum(np.where(F_landmarks[i] >= 1, 1, 0)) > 0: + continue + cv2.imwrite(os.path.join(dstdir, '%d.jpg' % (image_id)), F_imgs[i]) + landmarks = list(map(str, list(F_landmarks[i]))) + f.write(os.path.join(dstdir, '%d.jpg' % (image_id)) + ' -2 ' + ' '.join(landmarks) + '\n') + image_id += 1 + f.close() + + +def combine_data_list(data_dir): + """Combine all data list""" + with open(os.path.join(data_dir, 'positive.txt'), 'r') as f: + pos = f.readlines() + with open(os.path.join(data_dir, 'negative.txt'), 'r') as f: + neg = f.readlines() + with open(os.path.join(data_dir, 'part.txt'), 'r') as f: + part = f.readlines() + with open(os.path.join(data_dir, 'landmark.txt'), 'r') as f: + landmark = f.readlines() + + with open(os.path.join(data_dir, 'all_data_list.txt'), 'w') as f: + base_num = len(pos) // 1000 * 1000 + + print(f"Original: neg {len(neg)} pos {len(pos)} part {len(part)} landmark {len(landmark)} base {base_num}") + + neg_keep = random.choice(len(neg), size=base_num * 3, replace=base_num * 3 > len(neg)) + part_keep = random.choice(len(part), size=base_num, replace=base_num > len(part)) + pos_keep = random.choice(len(pos), size=base_num, replace=base_num > len(pos)) + landmark_keep = random.choice(len(landmark), size=base_num * 2, replace=base_num * 2 > len(landmark)) + + print(f"After sampling: neg {len(neg_keep)} pos {len(pos_keep)} part {len(part_keep)} \ + landmark {len(landmark_keep)}") + + for i in pos_keep: + f.write(pos[i].replace('\\', '/')) + for i in neg_keep: + f.write(neg[i].replace('\\', '/')) + for i in part_keep: + f.write(part[i].replace('\\', '/')) + for i in landmark_keep: + f.write(landmark[i].replace('\\', '/')) + + +def data_to_mindrecord(data_folder, mindrecord_prefix, mindrecord_name): + # Load all data list + data_list_path = os.path.join(data_folder, 'all_data_list.txt') + with open(data_list_path, 'r') as f: + train_list = f.readlines() + + if not os.path.exists(mindrecord_prefix): + os.mkdir(mindrecord_prefix) + mindrecord_path = os.path.join(mindrecord_prefix, mindrecord_name) + writer = FileWriter(mindrecord_path, 1, overwrite=True) + + mtcnn_json = { + "image": {"type": "bytes"}, + "label": {"type": "int32"}, + "box_target": {"type": "float32", "shape": [4]}, + "landmark_target": {"type": "float32", "shape": [10]} + } + + 
writer.add_schema(mtcnn_json, "mtcnn_json")
+
+    count = 0
+    for item in tqdm(train_list):
+        sample = item.split(' ')
+        image = sample[0]
+        label = int(sample[1])
+        box = [0] * 4
+        landmark = [0] * 10
+
+        # Only has box
+        if len(sample) == 6:
+            box = sample[2:]
+
+        # Only has landmark
+        if len(sample) == 12:
+            landmark = sample[2:]
+        box = np.array(box).astype(np.float32)
+        landmark = np.array(landmark).astype(np.float32)
+        img = cv2.imread(image)
+        _, encoded_img = cv2.imencode('.jpg', img)
+
+        row = {
+            "image": encoded_img.tobytes(),
+            "label": label,
+            "box_target": box,
+            "landmark_target": landmark
+        }
+        writer.write_raw_data([row])
+        count += 1
+    writer.commit()
+    print("Total train data: ", count)
+    print("Create mindrecord done!")
+
+
+def get_landmark_from_lfw_neg(dataset_path, with_landmark=True):
+    """Get landmark data"""
+
+    anno_file = os.path.join(dataset_path, 'trainImageList.txt')
+    with open(anno_file, 'r') as f:
+        annotations = f.readlines()
+    result = []
+    for annotation in annotations:
+        annotation = annotation.strip().split(' ')
+        img_path = os.path.join(dataset_path, annotation[0]).replace('\\', '/')
+
+        # box
+        box = (annotation[1], annotation[3], annotation[2], annotation[4])
+        box = [float(_) for _ in box]
+        box = list(map(int, box))
+
+        if not with_landmark:
+            result.append((img_path, BBox(box)))
+            continue
+
+        # 5 landmark points
+        landmark = np.zeros((5, 2))
+        for index in range(5):
+            rv = (float(annotation[5 + 2 * index]), float(annotation[5 + 2 * index + 1]))
+            landmark[index] = rv
+        result.append((img_path, BBox(box), landmark))
+
+    return result
+
+def delete_old_img(old_image_folder, image_size):
+    """Delete original data"""
+    shutil.rmtree(os.path.join(old_image_folder, str(image_size), 'positive'), ignore_errors=True)
+    shutil.rmtree(os.path.join(old_image_folder, str(image_size), 'negative'), ignore_errors=True)
+    shutil.rmtree(os.path.join(old_image_folder, str(image_size), 'part'), ignore_errors=True)
+    shutil.rmtree(os.path.join(old_image_folder, str(image_size), 'landmark'), ignore_errors=True)
+
+    os.remove(os.path.join(old_image_folder, str(image_size), 'positive.txt'))
+    os.remove(os.path.join(old_image_folder, str(image_size), 'negative.txt'))
+    os.remove(os.path.join(old_image_folder, str(image_size), 'part.txt'))
+    os.remove(os.path.join(old_image_folder, str(image_size), 'landmark.txt'))
+
+class MultiEpochsDecayLR(LearningRateSchedule):
+    """
+    Calculate learning rate based on a multi-epoch decay schedule.
+
+    Args:
+        learning_rate(float): Initial learning rate.
+        multi_epochs(list): Epochs at which the learning rate is decayed by `factor`.
+        steps_per_epoch(int): How many steps for each epoch.
+        factor(int): Learning rate decay factor. Default: 10.
+
+    Returns:
+        Tensor, learning rate.
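+
+    Example (mirrors how the train_*_net scripts build the scheduler; the step count is illustrative):
+        >>> lr = MultiEpochsDecayLR(0.001, [7, 15, 20], steps_per_epoch=1000)
+        >>> optimizer = nn.Adam(net.trainable_params(), learning_rate=lr, weight_decay=1e-4)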
+ """ + def __init__(self, learning_rate, multi_epochs, steps_per_epoch, factor=10): + super(MultiEpochsDecayLR, self).__init__() + if not isinstance(multi_epochs, (list, tuple)): + raise TypeError("multi_epochs must be list or tuple.") + self.multi_epochs = Tensor(np.array(multi_epochs, dtype=np.float32) * steps_per_epoch) + self.num = len(multi_epochs) + self.start_learning_rate = learning_rate + self.steps_per_epoch = steps_per_epoch + self.factor = factor + self.pow = ops.Pow() + self.cast = ops.Cast() + self.less_equal = ops.LessEqual() + self.reduce_sum = ops.ReduceSum() + + def construct(self, global_step): + cur_step = self.cast(global_step, mstype.float32) + multi_epochs = self.cast(self.multi_epochs, mstype.float32) + epochs = self.cast(self.less_equal(multi_epochs, cur_step), mstype.float32) + lr = self.start_learning_rate / self.pow(self.factor, self.reduce_sum(epochs, ())) + return lr diff --git a/research/cv/MTCNN/train.py b/research/cv/MTCNN/train.py new file mode 100644 index 0000000000000000000000000000000000000000..159d03b0d22fa233be3a83787ace3db681ffb91a --- /dev/null +++ b/research/cv/MTCNN/train.py @@ -0,0 +1,53 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +import argparse +from mindspore.common import set_seed +from src.train_models import train_p_net, train_r_net, train_o_net +import config as cfg + + +def parse_args(): + parser = argparse.ArgumentParser(description='Train PNet/RNet/ONet') + parser.add_argument('--model', required=True, type=str, choices=['pnet', 'rnet', 'onet'], + help="Choose model to train") + parser.add_argument('--mindrecord_file', dest='mindrecord_file', + required=True, help='mindrecord file for training', type=str) + parser.add_argument('--ckpt_path', dest='ckpt_path', default=cfg.CKPT_DIR, + help='save checkpoint directory', type=str) + parser.add_argument('--save_ckpt_steps', default=1000, type=int, help='steps to save checkpoint') + parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', + default=cfg.END_EPOCH, type=int) + parser.add_argument('--lr', dest='lr', help='learning rate', + default=cfg.TRAIN_LR, type=float) + parser.add_argument('--batch_size', dest='batch_size', help='train batch size', + default=cfg.TRAIN_BATCH_SIZE, type=int) + parser.add_argument('--device_target', dest='device_target', help='device for training', choices=['GPU', 'Ascend'], + default='GPU', type=str) + parser.add_argument('--distribute', dest='distribute', default=False, action='store_true') + parser.add_argument('--num_workers', type=int, default=8) + + args_ = parser.parse_args() + return args_ + +if __name__ == '__main__': + args = parse_args() + set_seed(66) + if args.model == 'pnet': + train_p_net.train_pnet(args) + if args.model == 'rnet': + train_r_net.train_rnet(args) + if args.model == 'onet': + train_o_net.train_onet(args)