diff --git a/research/cv/RefineDet/README_CN.md b/research/cv/RefineDet/README_CN.md new file mode 100644 index 0000000000000000000000000000000000000000..07bd5813ee7ac53b6461aced419b384247c3b439 --- /dev/null +++ b/research/cv/RefineDet/README_CN.md @@ -0,0 +1,445 @@ +# 目录 + +<!-- TOC --> + +- [目录](#目录) +- [模型说明](#RefineDet说明) +- [模型架构](#模型架构) +- [数据集](#数据集) +- [环境要求](#环境要求) +- [快速入门](#快速入门) +- [脚本说明](#脚本说明) + - [脚本及样例代码](#脚本及样例代码) + - [脚本参数](#脚本参数) + - [训练过程](#训练过程) + - [Ascend上训练](#ascend上训练) + <!-- - [GPU训练](#gpu训练)--> + - [评估过程](#评估过程) + - [Ascend处理器环境评估](#ascend处理器环境评估) + +<!-- + + - [GPU处理器环境评估](#gpu处理器环境评估) + - [推理过程](#推理过程) + - [导出MindIR](#导出mindir) + - [在Ascend310执行推理](#在ascend310执行推理) + - [结果](#结果) +- [模型描述](#模型描述) + - [性能](#性能) + - [评估性能](#评估性能) + - [推理性能](#推理性能) +- [随机情况说明](#随机情况说明) + +--> +<!-- /TOC --> + +# RefineDet说明 + +RefineDet是CVPR 2018中提出的一种目标检测模型。它融合one-stage方法和two-stage方法的优点(前者更快,后者更准)并克服了它们的缺点。通过使用ARM对随机生成的检测框进行先一步的回归,再使用TCB模块融合多尺度的特征,最后使用类似SSD的回归和分类结构大大提高了目标检测的速度和精度。 + +[论文](https://arxiv.org/pdf/1711.06897.pdf): S. Zhang, L. Wen, X. Bian, Z. Lei and S. Z. Li, "Single-shot refinement neural network for object detection", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4203-4212, Jun. 2018. + +# 模型架构 + +RefineDet的结构由三部分组成————负责预回归检测框的ARM,检测目标的ODM和连接两者的TCB + + +RefineDet的结构,图片来自原论文 + +特征提取部分使用VGG-16作为backbone + +# 数据集 + +使用的数据集: [COCO2017](<http://images.cocodataset.org/>) + +- 数据集大小:19 GB + - 训练集:18 GB,118000张图像 + - 验证集:1 GB,5000张图像 + - 标注:241 MB,实例,字幕,person_keypoints等 +- 数据格式:图像和json文件 + - 注意:数据在dataset.py中处理 + +# 环境要求 + +- 安装[MindSpore](https://www.mindspore.cn/install)。 + +- 下载数据集COCO2017。 + +- 本示例默认使用COCO2017作为训练数据集,您也可以使用自己的数据集。 + + 1. 如果使用coco数据集。**执行脚本时选择数据集coco。** + 安装Cython、pycocotool和opencv进行数据处理。 + + ```python + pip install Cython + + pip install pycocotools + + pip install opencv-python + ``` + + 并在`config.py`中更改COCO_ROOT和其他您需要的设置。目录结构如下: + + ```text + . + └─cocodataset + ├─annotations + ├─instance_train2017.json + └─instance_val2017.json + ├─val2017 + └─train2017 + + ``` + + 2. 如果使用自己的数据集。**执行脚本时选择数据集为other。** + 将数据集信息整理成TXT文件,每行如下: + + ```text + train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2 + + ``` + + 每行是按空间分割的图像标注,第一列是图像的相对路径,其余为[xmin,ymin,xmax,ymax,class]格式的框和类信息。我们从`IMAGE_DIR`(数据集目录)和`ANNO_PATH`(TXT文件路径)的相对路径连接起来的图像路径中读取图像。在`config.py`中设置`IMAGE_DIR`和`ANNO_PATH`。 + +# 快速入门 + +通过官方网站安装MindSpore后,您可以按照如下步骤进行训练和评估: + +- Ascend处理器环境运行 + +```shell script +# Ascend单卡直接训练示例 +python train.py --device_id=0 --epoch_size=500 --dataset=coco +# 或者 +bash run_standardalone_train.sh [DEVICE_ID] [EPOCH_SIZE] [LR] [DATASET] +# 示例 +bash run_standardalone_train.sh 0 500 0.05 coco +``` + +```shell script +# Ascend分布式训练 +bash run_distribute_train.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [RANK_TABLE_FILE] +# 示例 +bash run_distribute_train.sh 8 500 0.05 coco ./hccl_rank_tabel_8p.json +``` + +在modelarts上训练请运行train_modelarts.py,参数设置除data_url与train_url外与直接运行单卡的参数相同 + +```shell script +# Ascend处理器环境运行eval +bash run_eval.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] +# 示例 +bash run_eval.sh coco ./ckpt/refinedet.ckpt 0 +# 或直接运行eval.py,示例如下 +python eval.py --dataset=coco --device_id=0 --checkpoint_path=./ckpt/refinedet.ckpt +``` + +<!--- + +- GPU处理器环境运行 + +```shell script +# GPU分布式训练 +sh run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] +``` + +```shell script +# GPU处理器环境运行eval +sh run_eval_gpu.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] +``` + +--> + +# 脚本说明 + +## 脚本及样例代码 + +```text +. 
+└─ cv + └─ RefineDet + ├─ README.md ## SSD相关说明 + ├─ scripts + ├─ run_distribute_train.sh ## Ascend分布式shell脚本 + ├─ run_distribute_train_gpu.sh ## GPU分布式shell脚本 + ├─ run_eval.sh ## Ascend评估shell脚本 + └─ run_eval_gpu.sh ## GPU评估shell脚本 + ├─ src + ├─ anchor_generator.py ## 生成初始的随机检测框的脚本 + ├─ box_utils.py ## bbox处理脚本 + ├─ config.py ## 总的config文件 + ├─ dataset.py ## 处理并生成数据集的脚本 + ├─ eval_utils.py ## 评估函数的脚本 + ├─ init_params.py ## 初始化网络参数的脚本 + ├─ __init__.py + ├─ l2norm.py ## 实现L2 Normalization的脚本 + ├─ lr_schedule.py ## 实现动态学习率的脚本 + ├─ multibox.py ## 实现多检测框回归的脚本 + ├─ refinedet_loss_cell.py ## 实现loss函数的脚本 + ├─ refinedet.py ## 定义了整个网络框架的脚本 + ├─ resnet101_for_refinedet.py ## 实现了resnet101作为backbone + └─ vgg16_for_refinedet.py ## 实现了vgg16作为backbone + ├─ eval.py ## 评估脚本 + ├─ train.py ## 训练脚本 + └─ train_modelarts.py ## 用于在modelarts云环境上训练的脚本 +``` + +## 脚本参数 + + ```text + train.py和config.py中主要参数如下: + + "device_num": 1 # 使用设备数量 + "lr": 0.05 # 学习率初始值 + "dataset": coco # 数据集名称 + "epoch_size": 500 # 轮次大小 + "batch_size": 32 # 输入张量的批次大小 + "pre_trained": None # 预训练检查点文件路径 + "pre_trained_epoch_size": 0 # 预训练轮次大小 + "save_checkpoint_epochs": 10 # 两个检查点之间的轮次间隔。默认情况下,每10个轮次都会保存检查点。 + "loss_scale": 1024 # 损失放大 + + "class_num": 81 # 数据集类数 + "image_shape": [320, 320] # 作为模型输入的图像高和宽 + "mindrecord_dir": "/data/MindRecord" # MindRecord路径 + "coco_root": "/data/coco2017" # COCO2017数据集路径 + "voc_root": "" # VOC原始数据集路径 + "image_dir": "" # 其他数据集图片路径,如果使用coco或voc,此参数无效。 + "anno_path": "" # 其他数据集标注路径,如果使用coco或voc,此参数无效。 + + ``` + +## 训练过程 + +运行`train.py`训练模型。如果`mindrecord_dir`为空,则会通过`coco_root`(coco数据集)或`image_dir`和`anno_path`(自己的数据集)生成[MindRecord](https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/convert_dataset.html)文件。**注意,如果mindrecord_dir不为空,将使用mindrecord_dir代替原始图像。** + +### Ascend上训练 + +- 分布式 + +```shell script + bash run_distribute_train.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [RANK_TABLE_FILE] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional) +``` + +此脚本需要五或七个参数。 + +- `DEVICE_NUM`:分布式训练的设备数。 +- `EPOCH_NUM`:分布式训练的轮次数。 +- `LR`:分布式训练的学习率初始值。 +- `DATASET`:分布式训练的数据集模式。 +- `RANK_TABLE_FILE`:[rank_table.json](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools)的路径。最好使用绝对路径。 +- `PRE_TRAINED`:预训练检查点文件的路径。最好使用绝对路径。 +- `PRE_TRAINED_EPOCH_SIZE`:预训练的轮次数。 + + 训练结果保存在当前路径中,文件夹名称以"LOG"开头。 您可在此文件夹中找到检查点文件以及结果,如下所示。 + +```text +epoch: 1 step: 458, loss is 3.1681802 +epoch time: 228752.4654865265, per step time: 499.4595316299705 +epoch: 2 step: 458, loss is 2.8847265 +epoch time: 38912.93382644653, per step time: 84.96273761232868 +epoch: 3 step: 458, loss is 2.8398118 +epoch time: 38769.184827804565, per step time: 84.64887516987896 +... 
+ +epoch: 498 step: 458, loss is 0.70908034 +epoch time: 38771.079778671265, per step time: 84.65301261718616 +epoch: 499 step: 458, loss is 0.7974688 +epoch time: 38787.413120269775, per step time: 84.68867493508685 +epoch: 500 step: 458, loss is 0.5548882 +epoch time: 39064.8467540741, per step time: 85.29442522723602 +``` + +<!--- +### GPU训练 + +- 分布式 + +```shell script + sh run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional) +``` + +此脚本需要五或七个参数。 + +- `DEVICE_NUM`:分布式训练的设备数。 +- `EPOCH_NUM`:分布式训练的轮次数。 +- `LR`:分布式训练的学习率初始值。 +- `DATASET`:分布式训练的数据集模式。 +- `PRE_TRAINED`:预训练检查点文件的路径。最好使用绝对路径。 +- `PRE_TRAINED_EPOCH_SIZE`:预训练的轮次数。 + + 训练结果保存在当前路径中,文件夹名称以"LOG"开头。 您可在此文件夹中找到检查点文件以及结果,如下所示。 + +```text +epoch: 1 step: 1, loss is 420.11783 +epoch: 1 step: 2, loss is 434.11032 +epoch: 1 step: 3, loss is 476.802 +... +epoch: 1 step: 458, loss is 3.1283689 +epoch time: 150753.701, per step time: 329.157 +... + +``` + +--> + +## 评估过程 + +### Ascend处理器环境评估 + +```shell script +bash run_eval.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] +``` + +此脚本需要两个参数。 + +- `DATASET`:评估数据集的模式。 +- `CHECKPOINT_PATH`:检查点文件的绝对路径。 +- `DEVICE_ID`: 评估的设备ID。 + +> 在训练过程中可以生成检查点。 + +推理结果保存在示例路径中,文件夹名称以“eval”开头。您可以在日志中找到类似以下的结果。 + +```text +Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.238 +Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.400 +Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.240 +Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.039 +Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198 +Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.438 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.250 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.389 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.424 +Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.122 +Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.434 +Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.697 + +======================================== + +mAP: 0.23808886505483504 +``` + +<!-- +### GPU处理器环境评估 + +```shell script +sh run_eval_gpu.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] +``` + +此脚本需要两个参数。 + +- `DATASET`:评估数据集的模式。 +- `CHECKPOINT_PATH`:检查点文件的绝对路径。 +- `DEVICE_ID`: 评估的设备ID。 + +> 在训练过程中可以生成检查点。 + +推理结果保存在示例路径中,文件夹名称以“eval”开头。您可以在日志中找到类似以下的结果。 + +```text +Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.224 +Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.375 +Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.228 +Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.034 +Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.189 +Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.407 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.243 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.382 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.417 +Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.120 +Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.425 +Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.686 + +======================================== + +mAP: 0.2244936111705981 +``` + +## 推理过程 + 
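+推理的整体流程是:先用`export.py`将训练好的检查点导出为MindIR模型,再用`run_infer_310.sh`在Ascend 310上执行推理并统计精度,具体参数见下文两个小节。下面给出把两步串联起来的一个示例(检查点与数据集路径仅为示意,请替换为实际路径):
+
+```shell
+# 先导出MindIR,再在Ascend310上执行推理
+python export.py --ckpt_file ./ckpt/refinedet.ckpt --file_name refinedet --file_format MINDIR
+bash run_infer_310.sh ./refinedet.mindir ./data/COCO2017 CPU 0
+```
+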
+### [导出MindIR](#contents) + +```shell +python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT] +``` + +参数ckpt_file为必填项, +`EXPORT_FORMAT` 必须在 ["AIR", "MINDIR"]中选择。 + +### 在Ascend310执行推理 + +在执行推理前,mindir文件必须通过`export.py`脚本导出。以下展示了使用minir模型执行推理的示例。 +目前仅支持batch_Size为1的推理。精度计算过程需要70G+的内存,否则进程将会因为超出内存被系统终止。 + +```shell +# Ascend310 inference +bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DVPP] [DEVICE_ID] +``` + +- `DVPP` 为必填项,需要在["DVPP", "CPU"]选择,大小写均可。需要注意的是ssd_vgg16执行推理的图片尺寸为[300, 300],由于DVPP硬件限制宽为16整除,高为2整除,因此,这个网络需要通过CPU算子对图像进行前处理。 +- `DEVICE_ID` 可选,默认值为0。 + +### 结果 + +推理结果保存在脚本执行的当前路径,你可以在acc.log中看到以下精度计算结果。 + +```bash +Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.339 +Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.521 +Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.370 +Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168 +Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.386 +Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.461 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.310 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.481 +Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.515 +Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.293 +Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.659 +mAP: 0.33880018942412393 +``` + +# 模型描述 + +## 性能 + +### 评估性能 + +| 参数 | Ascend | GPU | +| -------------------------- | -------------------------------------------------------------| -------------------------------------------------------------| +| 模型版本 | SSD V1 | SSD V1 | +| 资源 | Ascend 910;CPU: 2.60GHz,192核;内存:755 GB | NV SMX2 V100-16G | +| 上传日期 | 2021-06-01 | 2021-09-24 | +| MindSpore版本 | 0.3.0-alpha | 1.0.0 | +| 数据集 | COCO2017 | COCO2017 | +| 训练参数 | epoch = 500, batch_size = 32 | epoch = 800, batch_size = 32 | +| 优化器 | Momentum | Momentum | +| 损失函数 | Sigmoid交叉熵,SmoothL1Loss | Sigmoid交叉熵,SmoothL1Loss | +| 速度 | 8卡:90毫秒/步 | 8卡:121毫秒/步 | +| 总时长 | 8卡:4.81小时 | 8卡:12.31小时 | +| 参数(M) | 34 | 34 | +|脚本 | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd | + +### 推理性能 + +| 参数 | Ascend | GPU | +| ------------------- | ----------------------------| ----------------------------| +| 模型版本 | SSD V1 | SSD V1 | +| 资源 | Ascend 910 | GPU | +| 上传日期 | 2021-06-01 | 2021-09-24 | +| MindSpore版本 | 0.3.0-alpha | 1.0.0 | +| 数据集 | COCO2017 | COCO2017 | +| batch_size | 1 | 1 | +| 输出 | mAP | mAP | +| 准确率 | IoU=0.50: 23.8% | IoU=0.50: 22.4% | +| 推理模型 | 34M(.ckpt文件) | 34M(.ckpt文件) | + +# 随机情况说明 + +dataset.py中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 + +# ModelZoo主页 + + 请浏览官网[主页](https://gitee.com/mindspore/mindspore/tree/master/model_zoo)。 + + --> \ No newline at end of file diff --git a/research/cv/RefineDet/eval.py b/research/cv/RefineDet/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..b0af49138b7bef74f202768a8d1065b1fa31b66b --- /dev/null +++ b/research/cv/RefineDet/eval.py @@ -0,0 +1,118 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Evaluation for RefineDet""" + +import os +import argparse +import time +import numpy as np +from mindspore import context, Tensor +from mindspore.train.serialization import load_checkpoint, load_param_into_net +from src.eval_utils import coco_metrics +from src.eval_utils import voc_metrics +from src.box_utils import box_init +from src.config import get_config +from src.dataset import create_refinedet_dataset, create_mindrecord +from src.refinedet import refinedet_vgg16, refinedet_resnet101, RefineDetInferWithDecoder + +def refinedet_eval(net_config, dataset_path, ckpt_path, anno_json, net_metrics): + """RefineDet evaluation.""" + batch_size = 1 + ds = create_refinedet_dataset(net_config, dataset_path, batch_size=batch_size, repeat_num=1, + is_training=False, use_multiprocessing=False) + if net_config.model == "refinedet_vgg16": + net = refinedet_vgg16(net_config, is_training=False) + elif net_config.model == "refinedet_resnet101": + net = refinedet_resnet101(net_config, is_training=False) + else: + raise ValueError(f'config.model: {net_config.model} is not supported') + default_boxes = box_init(net_config) + net = RefineDetInferWithDecoder(net, Tensor(default_boxes), net_config) + + print("Load Checkpoint!") + param_dict = load_checkpoint(ckpt_path) + net.init_parameters_data() + load_param_into_net(net, param_dict) + + net.set_train(False) + i = batch_size + total = ds.get_dataset_size() * batch_size + start = time.time() + pred_data = [] + print("\n========================================\n") + print("total images num: ", total) + print("Processing, please wait a moment.") + for data in ds.create_dict_iterator(output_numpy=True, num_epochs=1): + img_id = data['img_id'] + img_np = data['image'] + image_shape = data['image_shape'] + + output = net(Tensor(img_np)) + for batch_idx in range(img_np.shape[0]): + pred_data.append({"boxes": output[0].asnumpy()[batch_idx], + "box_scores": output[1].asnumpy()[batch_idx], + "img_id": int(np.squeeze(img_id[batch_idx])), + "image_shape": image_shape[batch_idx]}) + percent = round(i / total * 100., 2) + + print(f' {str(percent)} [{i}/{total}]', end='\r') + i += batch_size + cost_time = int((time.time() - start) * 1000) + print(f' 100% [{total}/{total}] cost {cost_time} ms') + mAP = net_metrics(pred_data, anno_json, net_config) + print("\n========================================\n") + print(f"mAP: {mAP}") + +def get_eval_args(): + """Get args for eval""" + parser = argparse.ArgumentParser(description='RefineDet evaluation') + parser.add_argument("--using_mode", type=str, default="refinedet_vgg16_320", + choices=("refinedet_vgg16_320", "refinedet_vgg16_512", + "refinedet_resnet101_320", "refinedet_resnet101_512"), + help="using mode, same as training.") + parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.") + parser.add_argument("--dataset", type=str, default="coco", help="Dataset, default is coco.") + parser.add_argument("--checkpoint_path", type=str, required=True, help="Checkpoint file path.") + 
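+    # Note: the checkpoint passed via --checkpoint_path must come from a model trained
+    # with the same --using_mode and --dataset, otherwise the parameter shapes in the
+    # checkpoint will not match the network built from the selected config.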
parser.add_argument("--run_platform", type=str, default="Ascend", choices=("Ascend", "GPU", "CPU"), + help="run platform, support Ascend ,GPU and CPU.") + parser.add_argument('--debug', type=str, default="0", choices=["0", "1"], + help="Active the debug mode. Under debug mode, the network would be run as PyNative mode.") + return parser.parse_args() + +if __name__ == '__main__': + args_opt = get_eval_args() + config = get_config(args_opt.using_mode, args_opt.dataset) + box_init(config) + if args_opt.dataset == "coco": + json_path = os.path.join(config.coco_root, config.instances_set.format(config.val_data_type)) + elif args_opt.dataset[:3] == "voc": + json_path = os.path.join(config.voc_root, config.voc_json) + else: + json_path = config.instances_set + + if args_opt.debug == "1": + network_mode = context.PYNATIVE_MODE + else: + network_mode = context.GRAPH_MODE + + context.set_context(mode=network_mode, device_target=args_opt.run_platform, device_id=args_opt.device_id) + + mindrecord_file = create_mindrecord(config, args_opt.dataset, + "refinedet_eval.mindrecord", False, + file_num=1) + + print("Start Eval!") + metrics = coco_metrics if args_opt.dataset == 'coco' else voc_metrics + refinedet_eval(config, mindrecord_file, args_opt.checkpoint_path, json_path, metrics) diff --git a/research/cv/RefineDet/export.py b/research/cv/RefineDet/export.py new file mode 100644 index 0000000000000000000000000000000000000000..1ee443be25ff8f76c7b385911632ce48733067da --- /dev/null +++ b/research/cv/RefineDet/export.py @@ -0,0 +1,63 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Export mindir or air model for refinedet""" +import argparse +import numpy as np + +import mindspore +from mindspore import context, Tensor +from mindspore.train.serialization import load_checkpoint, load_param_into_net, export +from src.refinedet import refinedet_vgg16, refinedet_resnet101, RefineDetInferWithDecoder +from src.config import get_config +from src.box_utils import box_init + +parser = argparse.ArgumentParser(description='RefineDet export') +parser.add_argument("--device_id", type=int, default=0, help="Device id") +parser.add_argument("--batch_size", type=int, default=1, help="batch size") +parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file path.") +parser.add_argument("--dataset", type=str, default="coco", help="Dataset, default is coco.") +parser.add_argument("--using_mode", type=str, default="refinedet_vgg16_320", + choices=("refinedet_vgg16_320", "refinedet_vgg16_512", + "refinedet_resnet101_320", "refinedet_resnet101_512"), + help="using mode, same as training.") +parser.add_argument("--file_name", type=str, default="refinedet", help="output file name.") +parser.add_argument('--file_format', type=str, choices=["AIR", "MINDIR", "ONNX"], default='MINDIR', help='file format') +parser.add_argument("--device_target", type=str, choices=["Ascend", "GPU", "CPU"], default="Ascend", + help="device target") +args = parser.parse_args() + +context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target) +if args.device_target == "Ascend": + context.set_context(device_id=args.device_id) + +if __name__ == '__main__': + config = get_config(args.using_mode, args.dataset) + default_boxes = box_init(config) + if config.model == "refinedet_vgg16": + net = refinedet_vgg16(config=config) + elif config.model == "refinedet_resnet101": + net = refinedet_resnet101(config=config) + else: + raise ValueError(f'config.model: {config.model} is not supported') + net = RefineDetInferWithDecoder(net, Tensor(default_boxes), config) + + param_dict = load_checkpoint(args.ckpt_file) + net.init_parameters_data() + load_param_into_net(net, param_dict) + net.set_train(False) + + input_shp = [args.batch_size, 3] + config.img_shape + input_array = Tensor(np.random.uniform(-1.0, 1.0, size=input_shp), mindspore.float32) + export(net, input_array, file_name=args.file_name, file_format=args.file_format) diff --git a/research/cv/RefineDet/scripts/run_distribute_train.sh b/research/cv/RefineDet/scripts/run_distribute_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..3da21418c37cd248318979de2aed7355b360f528 --- /dev/null +++ b/research/cv/RefineDet/scripts/run_distribute_train.sh @@ -0,0 +1,83 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "sh run_distribute_train.sh DEVICE_NUM EPOCH_SIZE LR DATASET RANK_TABLE_FILE PRE_TRAINED PRE_TRAINED_EPOCH_SIZE" +echo "for example: sh run_distribute_train.sh 8 500 0.2 coco /data/hccl.json /opt/ssd-300.ckpt(optional) 200(optional)" +echo "It is better to use absolute path." +echo "=================================================================================================================" + +if [ $# != 5 ] && [ $# != 7 ] +then + echo "Usage: sh run_distribute_train.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] \ +[RANK_TABLE_FILE] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional)" + exit 1 +fi + +# Before start distribute train, first create mindrecord files. +BASE_PATH=$(cd "`dirname $0`" || exit; pwd) +cd $BASE_PATH/../ || exit +python train.py --only_create_dataset=True --dataset=$4 + +echo "After running the script, the network runs in the background. The log will be generated in LOGx/log.txt" + +export RANK_SIZE=$1 +EPOCH_SIZE=$2 +LR=$3 +DATASET=$4 +PRE_TRAINED=$6 +PRE_TRAINED_EPOCH_SIZE=$7 +export RANK_TABLE_FILE=$5 + +for((i=0;i<RANK_SIZE;i++)) +do + export DEVICE_ID=$i + rm -rf LOG$i + mkdir -p ./LOG$i/data + cp ./*.py ./LOG$i + cp -r ./src ./LOG$i + ln -s $BASE_PATH/../data/MindRecord/ ./LOG$i/data/MindRecord + cd ./LOG$i || exit + export RANK_ID=$i + echo "start training for rank $i, device $DEVICE_ID" + env > env.log + if [ $# == 5 ] + then + python train.py \ + --distribute=True \ + --lr=$LR \ + --dataset=$DATASET \ + --device_num=$RANK_SIZE \ + --device_id=$DEVICE_ID \ + --epoch_size=$EPOCH_SIZE > log.txt 2>&1 & + fi + + if [ $# == 7 ] + then + python train.py \ + --distribute=True \ + --lr=$LR \ + --dataset=$DATASET \ + --device_num=$RANK_SIZE \ + --device_id=$DEVICE_ID \ + --pre_trained=$PRE_TRAINED \ + --pre_trained_epoch_size=$PRE_TRAINED_EPOCH_SIZE \ + --epoch_size=$EPOCH_SIZE > log.txt 2>&1 & + fi + + cd ../ +done diff --git a/research/cv/RefineDet/scripts/run_eval.sh b/research/cv/RefineDet/scripts/run_eval.sh new file mode 100644 index 0000000000000000000000000000000000000000..77054ad87f642ad9de43b22634450784ee691a32 --- /dev/null +++ b/research/cv/RefineDet/scripts/run_eval.sh @@ -0,0 +1,65 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +if [ $# != 3 ] +then + echo "Usage: sh run_eval.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]" +exit 1 +fi + +get_real_path(){ + if [ "${1:0:1}" == "/" ]; then + echo "$1" + else + echo "$(realpath -m $PWD/$1)" + fi +} + +DATASET=$1 +CHECKPOINT_PATH=$(get_real_path $2) +echo $DATASET +echo $CHECKPOINT_PATH + +if [ ! 
-f $CHECKPOINT_PATH ] +then + echo "error: CHECKPOINT_PATH=$PATH2 is not a file" +exit 1 +fi + +export DEVICE_NUM=1 +export DEVICE_ID=$3 +export RANK_SIZE=$DEVICE_NUM +export RANK_ID=0 + +BASE_PATH=$(cd "`dirname $0`" || exit; pwd) +cd $BASE_PATH/../ || exit + +if [ -d "eval$3" ]; +then + rm -rf ./eval$3 +fi + +mkdir ./eval$3 +cp ./*.py ./eval$3 +cp -r ./src ./eval$3 +cd ./eval$3 || exit +env > env.log +echo "start inferring for device $DEVICE_ID" +python eval.py \ + --dataset=$DATASET \ + --checkpoint_path=$CHECKPOINT_PATH \ + --device_id=$3 > log.txt 2>&1 & +cd .. diff --git a/research/cv/RefineDet/scripts/run_standardalone_train.sh b/research/cv/RefineDet/scripts/run_standardalone_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..1219066672bf02beec2aeff1eb09c6a43a71768f --- /dev/null +++ b/research/cv/RefineDet/scripts/run_standardalone_train.sh @@ -0,0 +1,73 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the script as: " +echo "sh run_distribute_train.sh DEVICE_ID EPOCH_SIZE LR DATASET PRE_TRAINED PRE_TRAINED_EPOCH_SIZE" +echo "for example: sh run_distribute_train.sh 0 500 0.2 coco /opt/ssd-300.ckpt(optional) 200(optional)" +echo "It is better to use absolute path." +echo "=================================================================================================================" + +if [ $# != 4 ] && [ $# != 6 ] +then + echo "Usage: sh run_distribute_train.sh [DEVICE_ID] [EPOCH_SIZE] [LR] [DATASET] \ + [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional)" + exit 1 +fi + +# Before start distribute train, first create mindrecord files. +BASE_PATH=$(cd "`dirname $0`" || exit; pwd) +cd $BASE_PATH/../ || exit +python train.py --only_create_dataset=True --dataset=$4 + +echo "After running the script, the network runs in the background. 
The log will be generated in LOGx/log.txt" +DEVICE_ID=$1 +EPOCH_SIZE=$2 +LR=$3 +DATASET=$4 +PRE_TRAINED=$5 +PRE_TRAINED_EPOCH_SIZE=$6 + +export DEVICE_ID=$DEVICE_ID +rm -rf LOG$DEVICE_ID +mkdir ./LOG$DEVICE_ID +cp ./*.py ./LOG$DEVICE_ID +cp -r ./src ./LOG$DEVICE_ID +cd ./LOG$DEVICE_ID || exit + +echo "start training with device $DEVICE_ID" +env > env.log +if [ $# == 4 ] +then + python train.py \ + --lr=$LR \ + --dataset=$DATASET \ + --device_id=$DEVICE_ID \ + --epoch_size=$EPOCH_SIZE > log.txt 2>&1 & +fi + +if [ $# == 6 ] +then + python train.py \ + --lr=$LR \ + --dataset=$DATASET \ + --device_id=$DEVICE_ID \ + --pre_trained=$PRE_TRAINED \ + --pre_trained_epoch_size=$PRE_TRAINED_EPOCH_SIZE \ + --epoch_size=$EPOCH_SIZE > log.txt 2>&1 & +fi + +cd ../ diff --git a/research/cv/RefineDet/src/__init__.py b/research/cv/RefineDet/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/research/cv/RefineDet/src/anchor_generator.py b/research/cv/RefineDet/src/anchor_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..63e0e402f29e711b2a0d3875368b8d0f887c312e --- /dev/null +++ b/research/cv/RefineDet/src/anchor_generator.py @@ -0,0 +1,93 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Anchor Generator""" + +import numpy as np + + +class GridAnchorGenerator: + """ + Anchor Generator + """ + def __init__(self, image_shape, scale, scales_per_octave, aspect_ratios): + super(GridAnchorGenerator, self).__init__() + self.scale = scale + self.scales_per_octave = scales_per_octave + self.aspect_ratios = aspect_ratios + self.image_shape = image_shape + + + def generate(self, step): + """generate default anchors""" + scales = np.array([2 ** (float(scale) / self.scales_per_octave) + for scale in range(self.scales_per_octave)]).astype(np.float32) + aspects = np.array(list(self.aspect_ratios)).astype(np.float32) + + scales_grid, aspect_ratios_grid = np.meshgrid(scales, aspects) + scales_grid = scales_grid.reshape([-1]) + aspect_ratios_grid = aspect_ratios_grid.reshape([-1]) + + feature_size = [self.image_shape[0] / step, self.image_shape[1] / step] + grid_height, grid_width = feature_size + + base_size = np.array([self.scale * step, self.scale * step]).astype(np.float32) + anchor_offset = step / 2.0 + + ratio_sqrt = np.sqrt(aspect_ratios_grid) + heights = scales_grid / ratio_sqrt * base_size[0] + widths = scales_grid * ratio_sqrt * base_size[1] + + y_centers = np.arange(grid_height).astype(np.float32) + y_centers = y_centers * step + anchor_offset + x_centers = np.arange(grid_width).astype(np.float32) + x_centers = x_centers * step + anchor_offset + x_centers, y_centers = np.meshgrid(x_centers, y_centers) + + x_centers_shape = x_centers.shape + y_centers_shape = y_centers.shape + + widths_grid, x_centers_grid = np.meshgrid(widths, x_centers.reshape([-1])) + heights_grid, y_centers_grid = np.meshgrid(heights, y_centers.reshape([-1])) + + x_centers_grid = x_centers_grid.reshape(*x_centers_shape, -1) + y_centers_grid = y_centers_grid.reshape(*y_centers_shape, -1) + widths_grid = widths_grid.reshape(-1, *x_centers_shape) + heights_grid = heights_grid.reshape(-1, *y_centers_shape) + + + bbox_centers = np.stack([y_centers_grid, x_centers_grid], axis=3) + bbox_sizes = np.stack([heights_grid, widths_grid], axis=3) + bbox_centers = bbox_centers.reshape([-1, 2]) + bbox_sizes = bbox_sizes.reshape([-1, 2]) + bbox_corners = np.concatenate([bbox_centers - 0.5 * bbox_sizes, bbox_centers + 0.5 * bbox_sizes], axis=1) + self.bbox_corners = bbox_corners / np.array([*self.image_shape, *self.image_shape]).astype(np.float32) + self.bbox_centers = np.concatenate([bbox_centers, bbox_sizes], axis=1) + self.bbox_centers = self.bbox_centers / np.array([*self.image_shape, *self.image_shape]).astype(np.float32) + + print(self.bbox_centers.shape) + return self.bbox_centers, self.bbox_corners + + def generate_multi_levels(self, steps): + """generate multi levels anchors""" + bbox_centers_list = [] + bbox_corners_list = [] + for step in steps: + bbox_centers, bbox_corners = self.generate(step) + bbox_centers_list.append(bbox_centers) + bbox_corners_list.append(bbox_corners) + + self.bbox_centers = np.concatenate(bbox_centers_list, axis=0) + self.bbox_corners = np.concatenate(bbox_corners_list, axis=0) + return self.bbox_centers, self.bbox_corners diff --git a/research/cv/RefineDet/src/box_utils.py b/research/cv/RefineDet/src/box_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..341cc0566783859b1c0c19264b5fd77173032903 --- /dev/null +++ b/research/cv/RefineDet/src/box_utils.py @@ -0,0 +1,172 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the 
"License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Bbox utils""" + +import math +import itertools as it +import numpy as np +from .anchor_generator import GridAnchorGenerator + +is_init = False + +class GeneratDefaultBoxes(): + """ + Generate Default boxes for SSD, follows the order of (W, H, archor_sizes). + `self.default_boxes` has a shape of [archor_sizes, H, W, 4], the last dimension is [y, x, h, w]. + `self.default_boxes_tlbr` has a shape as `self.default_boxes`, the last dimension is [y1, x1, y2, x2]. + """ + def __init__(self, config): + fk = config.img_shape[0] / np.array(config.steps) + scale_rate = (config.max_scale - config.min_scale) / (len(config.num_default) - 1) + scales = [config.min_scale + scale_rate * i for i in range(len(config.num_default))] + [1.0] + self.default_boxes = [] + for idex, feature_size in enumerate(config.feature_size): + sk1 = scales[idex] + if idex == 0 and not config.aspect_ratios[idex]: + w, h = sk1 * math.sqrt(2), sk1 / math.sqrt(2) + all_sizes = [(0.1, 0.1), (w, h), (h, w)] + else: + all_sizes = [(sk1, sk1)] + for aspect_ratio in config.aspect_ratios[idex]: + w, h = sk1 * math.sqrt(aspect_ratio), sk1 / math.sqrt(aspect_ratio) + all_sizes.append((w, h)) + all_sizes.append((h, w)) + + assert len(all_sizes) == config.num_default[idex] + + for i, j in it.product(range(feature_size), repeat=2): + for w, h in all_sizes: + cx, cy = (j + 0.5) / fk[idex], (i + 0.5) / fk[idex] + self.default_boxes.append([cy, cx, h, w]) + + def to_tlbr(cy, cx, h, w): + return cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2 + + # For IoU calculation + self.default_boxes_tlbr = np.array(tuple(to_tlbr(*i) for i in self.default_boxes), dtype='float32') + self.default_boxes = np.array(self.default_boxes, dtype='float32') + +default_boxes = matching_threshold = vol_anchors = y1 = x1 = y2 = x2 = None + +def box_init(config): + """init default boxes""" + global is_init, default_boxes, matching_threshold, vol_anchors, y1, x1, y2, x2 + if is_init: + return default_boxes + is_init = True + if 'use_anchor_generator' in config and config.use_anchor_generator: + generator = GridAnchorGenerator(config.img_shape, 4, 2, [1.0, 2.0, 0.5]) + default_boxes, default_boxes_tlbr = generator.generate_multi_levels(config.steps) + else: + default_boxes_tlbr = GeneratDefaultBoxes(config).default_boxes_tlbr + default_boxes = GeneratDefaultBoxes(config).default_boxes + y1, x1, y2, x2 = np.split(default_boxes_tlbr[:, :4], 4, axis=-1) + vol_anchors = (x2 - x1) * (y2 - y1) + matching_threshold = config.match_threshold + return default_boxes + +def refinedet_bboxes_encode(config, boxes): + """ + Labels anchors with ground truth inputs. + + Args: + boxes: ground truth with shape [N, 5], for each row, it stores [y, x, h, w, cls]. + + Returns: + gt_loc: location ground truth with shape [num_anchors, 4]. + gt_label: class ground truth with shape [num_anchors, 1]. + num_matched_boxes: number of positives in an image. 
+ """ + box_init(config) + def jaccard_with_anchors(bbox): + """Compute jaccard score a box and the anchors.""" + # Intersection bbox and volume. + ymin = np.maximum(y1, bbox[0]) + xmin = np.maximum(x1, bbox[1]) + ymax = np.minimum(y2, bbox[2]) + xmax = np.minimum(x2, bbox[3]) + w = np.maximum(xmax - xmin, 0.) + h = np.maximum(ymax - ymin, 0.) + + # Volumes. + inter_vol = h * w + union_vol = vol_anchors + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1]) - inter_vol + jaccard = inter_vol / union_vol + return np.squeeze(jaccard) + + pre_scores = np.zeros((config.num_ssd_boxes), dtype=np.float32) + t_boxes = np.zeros((config.num_ssd_boxes, 4), dtype=np.float32) + t_label = np.zeros((config.num_ssd_boxes), dtype=np.int64) + for bbox in boxes: + label = int(bbox[4]) + scores = jaccard_with_anchors(bbox) + idx = np.argmax(scores) + scores[idx] = 2.0 + mask = (scores > matching_threshold) + mask = mask & (scores > pre_scores) + pre_scores = np.maximum(pre_scores, scores * mask) + t_label = mask * label + (1 - mask) * t_label + for i in range(4): + t_boxes[:, i] = mask * bbox[i] + (1 - mask) * t_boxes[:, i] + + index = np.nonzero(t_label) + + # Transform to tlbr. + bboxes = np.zeros((config.num_ssd_boxes, 4), dtype=np.float32) + bboxes[:, [0, 1]] = (t_boxes[:, [0, 1]] + t_boxes[:, [2, 3]]) / 2 + bboxes[:, [2, 3]] = t_boxes[:, [2, 3]] - t_boxes[:, [0, 1]] + + # Encode features. + bboxes_t = bboxes[index] + default_boxes_t = default_boxes[index] + bboxes_t[:, :2] = (bboxes_t[:, :2] - default_boxes_t[:, :2]) / (default_boxes_t[:, 2:] * config.prior_scaling[0]) + tmp = np.maximum(bboxes_t[:, 2:4] / default_boxes_t[:, 2:4], 0.000001) + bboxes_t[:, 2:4] = np.log(tmp) / config.prior_scaling[1] + bboxes[index] = bboxes_t + + num_match = np.array([len(np.nonzero(t_label)[0])], dtype=np.int32) + return bboxes, t_label.astype(np.int32), num_match + + +def refinedet_bboxes_decode(boxes): + """Decode predict boxes to [y, x, h, w]""" + boxes_t = boxes.copy() + default_boxes_t = default_boxes.copy() + boxes_t[:, :2] = boxes_t[:, :2] * config.prior_scaling[0] * default_boxes_t[:, 2:] + default_boxes_t[:, :2] + boxes_t[:, 2:4] = np.exp(boxes_t[:, 2:4] * config.prior_scaling[1]) * default_boxes_t[:, 2:4] + + bboxes = np.zeros((len(boxes_t), 4), dtype=np.float32) + + bboxes[:, [0, 1]] = boxes_t[:, [0, 1]] - boxes_t[:, [2, 3]] / 2 + bboxes[:, [2, 3]] = boxes_t[:, [0, 1]] + boxes_t[:, [2, 3]] / 2 + + return np.clip(bboxes, 0, 1) + + +def intersect(box_a, box_b): + """Compute the intersect of two sets of boxes.""" + max_yx = np.minimum(box_a[:, 2:4], box_b[2:4]) + min_yx = np.maximum(box_a[:, :2], box_b[:2]) + inter = np.clip((max_yx - min_yx), a_min=0, a_max=np.inf) + return inter[:, 0] * inter[:, 1] + + +def jaccard_numpy(box_a, box_b): + """Compute the jaccard overlap of two sets of boxes.""" + inter = intersect(box_a, box_b) + area_a = ((box_a[:, 2] - box_a[:, 0]) * (box_a[:, 3] - box_a[:, 1])) + area_b = ((box_b[2] - box_b[0]) * (box_b[3] - box_b[1])) + union = area_a + area_b - inter + return inter / union diff --git a/research/cv/RefineDet/src/config.py b/research/cv/RefineDet/src/config.py new file mode 100644 index 0000000000000000000000000000000000000000..3f7a3d3452a524a3afe3e475b0b87175b4501a5a --- /dev/null +++ b/research/cv/RefineDet/src/config.py @@ -0,0 +1,59 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Config parameters for RefineDet models.""" + +from .config_vgg16 import config_320 as config_vgg16_320 +from .config_vgg16 import config_512 as config_vgg16_512 +from .config_resnet101 import config_320 as config_resnet_320 +from .config_resnet101 import config_512 as config_resnet_512 + +config = None + +config_map = { + "refinedet_vgg16_320": config_vgg16_320, + "refinedet_vgg16_512": config_vgg16_512, + "refinedet_resnet101_320": config_resnet_320, + "refinedet_resnet101_512": config_resnet_512, +} + +def get_config(using_model="refinedet_vgg16_320", using_dataset="voc_test"): + """init config according to args""" + global config + if config is not None: + return config + config = config_map[using_model] + if using_dataset == "voc0712": + config.voc_root = config.voc0712_root + config.num_classes = config.voc_num_classes + config.classes = config.voc_classes + elif using_dataset == "voc0712plus": + config.voc_root = config.voc0712plus_root + config.num_classes = config.voc_num_classes + config.classes = config.voc_classes + elif using_dataset == "voc_test": + config.voc_root = config.voc_test_root + config.num_classes = config.voc_num_classes + config.classes = config.voc_classes + elif using_dataset == "coco": + config.num_classes = config.coco_num_classes + config.classes = config.coco_classes + # calculate the boxes number + if config.num_ssd_boxes == -1: + num = 0 + h, w = config.img_shape + for i in range(len(config.steps)): + num += (h // config.steps[i]) * (w // config.steps[i]) * config.num_default[i] + config.num_ssd_boxes = num + return config diff --git a/research/cv/RefineDet/src/config_resnet101.py b/research/cv/RefineDet/src/config_resnet101.py new file mode 100644 index 0000000000000000000000000000000000000000..53d15193fcf0959d0f8cb1df68736344b5df2588 --- /dev/null +++ b/research/cv/RefineDet/src/config_resnet101.py @@ -0,0 +1,172 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Basic config parameters for RefineDet models.""" +from easydict import EasyDict as ed + +config_320 = ed({ + "model": "refinedet_resnet101", + "img_shape": [320, 320], + "num_ssd_boxes": -1, + "match_threshold": 0.5, + "nms_threshold": 0.6, + "min_score": 0.1, + "max_boxes": 100, + + # learing rate settings + "lr_init": 0.001, + "lr_end_rate": 0.001, + "warmup_epochs": 2, + "momentum": 0.9, + "weight_decay": 1.5e-4, + + # network + # vgg16 config + "num_default": [3, 3, 3, 3], + "extra_arm_channels": [512, 1024, 2048, 512], + "extra_odm_channels": [256, 256, 256, 256], + "L2normalizations": [10, 8, -1, -1], + "arm_source": ["b4", "b5", "fc7", "b6_2"], # four source layers, last one is the end of backbone + + # box utils config + "feature_size": [40, 20, 10, 5], + "min_scale": 0.2, + "max_scale": 0.95, + "aspect_ratios": [(), (2,), (2,), (2,)], + "steps": (8, 16, 32, 64), + "prior_scaling": (0.1, 0.2), + "gamma": 2.0, + "alpha": 0.75, + + # `mindrecord_dir` and `coco_root` are better to use absolute path. + "feature_extractor_base_param": "", + "pretrain_vgg_bn": False, + "checkpoint_filter_list": ['multi_loc_layers', 'multi_cls_layers'], + "mindrecord_dir": "./data/MindRecord", + "coco_root": "./data/COCO2017", + "train_data_type": "train2017", + # The annotation.json position of voc validation dataset. + "voc_json": "annotations/voc_instances_val.json", + # voc original dataset. + "voc_root": "", + "voc_test_root": "./data/voc_test", + "voc0712_root": "./data/VOC0712", + "voc0712plus_root": "./data/VOC0712Plus", + # if coco or voc used, `image_dir` and `anno_path` are useless. + "image_dir": "", + "anno_path": "", + "val_data_type": "val2017", + "instances_set": "annotations/instances_{}.json", + "voc_classes": ('background', 'aeroplane', 'bicycle', 'bird', + 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', + 'cow', 'diningtable', 'dog', 'horse', 'motorbike', + 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'), + "voc_num_classes": 21, + "coco_classes": ('background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', + 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', + 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', + 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', + 'kite', 'baseball bat', 'baseball glove', 'skateboard', + 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', + 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', + 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', + 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', + 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', + 'refrigerator', 'book', 'clock', 'vase', 'scissors', + 'teddy bear', 'hair drier', 'toothbrush'), + "coco_num_classes": 81, + "classes": (), + "num_classes": () +}) + +config_512 = ed({ + "model": "refinedet_resnet101", + "img_shape": [512, 512], + "num_ssd_boxes": -1, + "match_threshold": 0.5, + "nms_threshold": 0.6, + "min_score": 0.1, + "max_boxes": 100, + + # learing rate settings + "lr_init": 0.001, + "lr_end_rate": 0.001, + "warmup_epochs": 2, + "momentum": 0.9, + "weight_decay": 1.5e-4, + + # network + # vgg16 config + "num_default": [3, 3, 3, 3], + "extra_arm_channels": [512, 1024, 2048, 512], + "extra_odm_channels": [256, 256, 256, 256], + 
"L2normalizations": [10, 8, -1, -1], + "arm_source": ["b4", "b5", "fc7", "b6_2"], # four source layers, last one is the end of backbone + + # box utils config + "feature_size": [64, 32, 16, 8], + "min_scale": 0.2, + "max_scale": 0.95, + "aspect_ratios": [(), (2,), (2,), (2,)], + "steps": (8, 16, 32, 64), + "prior_scaling": (0.1, 0.2), + "gamma": 2.0, + "alpha": 0.75, + + # `mindrecord_dir` and `coco_root` are better to use absolute path. + "feature_extractor_base_param": "", + "pretrain_vgg_bn": False, + "checkpoint_filter_list": ['multi_loc_layers', 'multi_cls_layers'], + "mindrecord_dir": "./data/MindRecord", + "coco_root": "./data/COCO2017", + "train_data_type": "train2017", + # The annotation.json position of voc validation dataset. + "voc_json": "annotations/voc_instances_val.json", + # voc original dataset. + "voc_root": "", + "voc_test_root": "./data/voc_test", + "voc0712_root": "./data/VOC0712", + "voc0712plus_root": "./data/VOC0712Plus", + # if coco or voc used, `image_dir` and `anno_path` are useless. + "image_dir": "", + "anno_path": "", + "val_data_type": "val2017", + "instances_set": "annotations/instances_{}.json", + "voc_classes": ('background', 'aeroplane', 'bicycle', 'bird', + 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', + 'cow', 'diningtable', 'dog', 'horse', 'motorbike', + 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'), + "voc_num_classes": 21, + "coco_classes": ('background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', + 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', + 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', + 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', + 'kite', 'baseball bat', 'baseball glove', 'skateboard', + 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', + 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', + 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', + 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', + 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', + 'refrigerator', 'book', 'clock', 'vase', 'scissors', + 'teddy bear', 'hair drier', 'toothbrush'), + "coco_num_classes": 81, + "classes": (), + "num_classes": () +}) diff --git a/research/cv/RefineDet/src/config_vgg16.py b/research/cv/RefineDet/src/config_vgg16.py new file mode 100644 index 0000000000000000000000000000000000000000..76bf06556ef256976495ddbd40b014e75bce5cb2 --- /dev/null +++ b/research/cv/RefineDet/src/config_vgg16.py @@ -0,0 +1,174 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Basic config parameters for RefineDet models.""" +from easydict import EasyDict as ed + +config_320 = ed({ + "model": "refinedet_vgg16", + "img_shape": [320, 320], + "num_ssd_boxes": -1, + "match_threshold": 0.5, + "nms_threshold": 0.6, + "min_score": 0.1, + "max_boxes": 100, + "objectness_thre": 0.01, + + # learing rate settings + "lr_init": 0.001, + "lr_end_rate": 0.001, + "warmup_epochs": 2, + "momentum": 0.9, + "weight_decay": 1.5e-4, + + # network + # vgg16 config + "num_default": [3, 3, 3, 3], + "extra_arm_channels": [512, 512, 1024, 512], + "extra_odm_channels": [256, 256, 256, 256], + "L2normalizations": [10, 8, -1, -1], + "arm_source": ["b4", "b5", "fc7", "b6_2"], # four source layers + + # box utils config + "feature_size": [40, 20, 10, 5], + "min_scale": 0.2, + "max_scale": 0.95, + "aspect_ratios": [(), (2,), (2,), (2,)], + "steps": (8, 16, 32, 64), + "prior_scaling": (0.1, 0.2), + "gamma": 2.0, + "alpha": 0.75, + + # `mindrecord_dir` and `coco_root` are better to use absolute path. + "feature_extractor_base_param": "", + "pretrain_vgg_bn": False, + "checkpoint_filter_list": ['multi_loc_layers', 'multi_cls_layers'], + "mindrecord_dir": "./data/MindRecord", + "coco_root": "./data/COCO2017", + "train_data_type": "train2017", + # The annotation.json position of voc validation dataset. + "voc_json": "annotations/voc_instances_val.json", + # voc original dataset. + "voc_root": "", + "voc_test_root": "./data/voc_test", + "voc0712_root": "./data/VOC0712", + "voc0712plus_root": "./data/VOC0712Plus", + # if coco or voc used, `image_dir` and `anno_path` are useless. + "image_dir": "", + "anno_path": "", + "val_data_type": "val2017", + "instances_set": "annotations/instances_{}.json", + "voc_classes": ('background', 'aeroplane', 'bicycle', 'bird', + 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', + 'cow', 'diningtable', 'dog', 'horse', 'motorbike', + 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'), + "voc_num_classes": 21, + "coco_classes": ('background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', + 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', + 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', + 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', + 'kite', 'baseball bat', 'baseball glove', 'skateboard', + 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', + 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', + 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', + 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', + 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', + 'refrigerator', 'book', 'clock', 'vase', 'scissors', + 'teddy bear', 'hair drier', 'toothbrush'), + "coco_num_classes": 81, + "classes": (), + "num_classes": () +}) + +config_512 = ed({ + "model": "refinedet_vgg16", + "img_shape": [512, 512], + "num_ssd_boxes": -1, + "match_threshold": 0.5, + "nms_threshold": 0.6, + "min_score": 0.1, + "max_boxes": 100, + "objectness_thre": 0.01, + + # learing rate settings + "lr_init": 0.001, + "lr_end_rate": 0.001, + "warmup_epochs": 2, + "momentum": 0.9, + "weight_decay": 1.5e-4, + + # network + # vgg16 config + "num_default": [3, 3, 3, 3], + "extra_arm_channels": [512, 512, 1024, 512], + "extra_odm_channels": [256, 256, 256, 
256], + "L2normalizations": [10, 8, -1, -1], + "arm_source": ["b4", "b5", "fc7", "b6_2"], # four source layers, last one is the end of backbone + + # box utils config + "feature_size": [64, 32, 16, 8], + "min_scale": 0.2, + "max_scale": 0.95, + "aspect_ratios": [(), (2,), (2,), (2,)], + "steps": (8, 16, 32, 64), + "prior_scaling": (0.1, 0.2), + "gamma": 2.0, + "alpha": 0.75, + + # `mindrecord_dir` and `coco_root` are better to use absolute path. + "feature_extractor_base_param": "", + "pretrain_vgg_bn": False, + "checkpoint_filter_list": ['multi_loc_layers', 'multi_cls_layers'], + "mindrecord_dir": "./data/MindRecord", + "coco_root": "./data/COCO2017", + "train_data_type": "train2017", + # The annotation.json position of voc validation dataset. + "voc_json": "annotations/voc_instances_val.json", + # voc original dataset. + "voc_root": "", + "voc_test_root": "./data/voc_test", + "voc0712_root": "./data/VOC0712", + "voc0712plus_root": "./data/VOC0712Plus", + # if coco or voc used, `image_dir` and `anno_path` are useless. + "image_dir": "", + "anno_path": "", + "val_data_type": "val2017", + "instances_set": "annotations/instances_{}.json", + "voc_classes": ('background', 'aeroplane', 'bicycle', 'bird', + 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', + 'cow', 'diningtable', 'dog', 'horse', 'motorbike', + 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'), + "voc_num_classes": 21, + "coco_classes": ('background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', + 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', + 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', + 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', + 'kite', 'baseball bat', 'baseball glove', 'skateboard', + 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', + 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', + 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', + 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', + 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', + 'refrigerator', 'book', 'clock', 'vase', 'scissors', + 'teddy bear', 'hair drier', 'toothbrush'), + "coco_num_classes": 81, + "classes": (), + "num_classes": () +}) diff --git a/research/cv/RefineDet/src/dataset.py b/research/cv/RefineDet/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9a400aa3f9ac24198c8d6c4ecd2c90950b93791b --- /dev/null +++ b/research/cv/RefineDet/src/dataset.py @@ -0,0 +1,475 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""Create RefineDet dataset""" + +from __future__ import division + +import os +import json +import xml.etree.ElementTree as et +import numpy as np +import cv2 + +import mindspore.dataset as de +import mindspore.dataset.vision.c_transforms as C +from mindspore.mindrecord import FileWriter +from .box_utils import jaccard_numpy, refinedet_bboxes_encode, box_init + +def _rand(a=0., b=1.): + """Generate random.""" + return np.random.rand() * (b - a) + a + + +def get_imageId_from_fileName(filename, id_iter): + """Get imageID from fileName if fileName is int, else return id_iter.""" + filename = os.path.splitext(filename)[0] + if filename.isdigit(): + return int(filename) + return id_iter + + +def random_sample_crop(image, boxes): + """Random Crop the image and boxes""" + height, width, _ = image.shape + min_iou = np.random.choice([None, 0.1, 0.3, 0.5, 0.7, 0.9]) + + if min_iou is None: + return image, boxes + + # max trails (50) + for _ in range(50): + image_t = image + + w = _rand(0.3, 1.0) * width + h = _rand(0.3, 1.0) * height + + # aspect ratio constraint b/t .5 & 2 + if h / w < 0.5 or h / w > 2: + continue + + left = _rand() * (width - w) + top = _rand() * (height - h) + + rect = np.array([int(top), int(left), int(top + h), int(left + w)]) + overlap = jaccard_numpy(boxes, rect) + + # dropout some boxes + drop_mask = overlap > 0 + if not drop_mask.any(): + continue + + if overlap[drop_mask].min() < min_iou and overlap[drop_mask].max() > (min_iou + 0.2): + continue + + image_t = image_t[rect[0]:rect[2], rect[1]:rect[3], :] + + centers = (boxes[:, :2] + boxes[:, 2:4]) / 2.0 + + m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1]) + m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1]) + + # mask in that both m1 and m2 are true + mask = m1 * m2 * drop_mask + + # have any valid boxes? 
try again if not + if not mask.any(): + continue + + # take only matching gt boxes + boxes_t = boxes[mask, :].copy() + + boxes_t[:, :2] = np.maximum(boxes_t[:, :2], rect[:2]) + boxes_t[:, :2] -= rect[:2] + boxes_t[:, 2:4] = np.minimum(boxes_t[:, 2:4], rect[2:4]) + boxes_t[:, 2:4] -= rect[:2] + + return image_t, boxes_t + return image, boxes + + +def preprocess_fn(config, img_id, image, box, is_training): + """Preprocess function for dataset.""" + cv2.setNumThreads(2) + + def _infer_data(image, input_shape): + img_h, img_w, _ = image.shape + input_h, input_w = input_shape + + image = cv2.resize(image, (input_w, input_h)) + + # When the channels of image is 1 + if len(image.shape) == 2: + image = np.expand_dims(image, axis=-1) + image = np.concatenate([image, image, image], axis=-1) + + return img_id, image, np.array((img_h, img_w), np.float32) + + def _data_aug(image, box, is_training, image_size=(300, 300)): + """Data augmentation function.""" + ih, iw, _ = image.shape + h, w = image_size + + if not is_training: + return _infer_data(image, image_size) + + # Random crop + box = box.astype(np.float32) + image, box = random_sample_crop(image, box) + ih, iw, _ = image.shape + + # Resize image + image = cv2.resize(image, (w, h)) + + # Flip image or not + flip = _rand() < .5 + if flip: + image = cv2.flip(image, 1, dst=None) + + # When the channels of image is 1 + if len(image.shape) == 2: + image = np.expand_dims(image, axis=-1) + image = np.concatenate([image, image, image], axis=-1) + + box[:, [0, 2]] = box[:, [0, 2]] / ih + box[:, [1, 3]] = box[:, [1, 3]] / iw + + if flip: + box[:, [1, 3]] = 1 - box[:, [3, 1]] + + box, label, num_match = refinedet_bboxes_encode(config, box) + return image, box, label, num_match + + return _data_aug(image, box, is_training, image_size=config.img_shape) + + +def create_voc_label(config, is_training): + """Get image path and annotation from VOC.""" + print("Create VOC label") + voc_root = config.voc_root + cls_map = {name: i for i, name in enumerate(config.classes)} + sub_dir = 'train' if is_training else 'eval' + voc_dir = os.path.join(voc_root, sub_dir) + if not os.path.isdir(voc_dir): + raise ValueError(f'Cannot find {sub_dir} dataset path.') + + image_dir = anno_dir = voc_dir + if os.path.isdir(os.path.join(voc_dir, 'Images')): + image_dir = os.path.join(voc_dir, 'Images') + if os.path.isdir(os.path.join(voc_dir, 'Annotations')): + anno_dir = os.path.join(voc_dir, 'Annotations') + + if not is_training: + json_file = os.path.join(config.voc_root, config.voc_json) + file_dir = os.path.split(json_file)[0] + if not os.path.isdir(file_dir): + os.makedirs(file_dir) + json_dict = {"images": [], "type": "instances", "annotations": [], + "categories": []} + bnd_id = 1 + + image_files_dict = {} + image_anno_dict = {} + images = [] + id_iter = 0 + for anno_file in os.listdir(anno_dir): + if not anno_file.endswith('xml'): + continue + tree = et.parse(os.path.join(anno_dir, anno_file)) + root_node = tree.getroot() + folder = root_node.find('folder').text + file_name = root_node.find('filename').text + img_id = get_imageId_from_fileName(file_name, id_iter) + id_iter += 1 + image_path = os.path.join(image_dir, folder + '_' + file_name) + if not os.path.isfile(image_path): + print(f'Cannot find image {file_name} according to annotations.') + continue + + labels = [] + for obj in root_node.iter('object'):#diffcut processing + cls_name = obj.find('name').text + #difficult = int(obj.find('difficult').text) + #if difficult > 0: + # continue + if cls_name not in cls_map: + 
print(f'Label "{cls_name}" not in "{config.classes}"') + continue + bnd_box = obj.find('bndbox') + x_min = int(float(bnd_box.find('xmin').text)) - 1 + y_min = int(float(bnd_box.find('ymin').text)) - 1 + x_max = int(float(bnd_box.find('xmax').text)) - 1 + y_max = int(float(bnd_box.find('ymax').text)) - 1 + labels.append([y_min, x_min, y_max, x_max, cls_map[cls_name]]) + + if not is_training: + o_width = abs(x_max - x_min) + o_height = abs(y_max - y_min) + ann = {'area': o_width * o_height, 'iscrowd': 0, 'image_id': \ + img_id, 'bbox': [x_min, y_min, o_width, o_height], \ + 'category_id': cls_map[cls_name], 'id': bnd_id, \ + 'ignore': 0, \ + 'segmentation': []} + json_dict['annotations'].append(ann) + bnd_id = bnd_id + 1 + + if labels: + images.append(img_id) + image_files_dict[img_id] = image_path + image_anno_dict[img_id] = np.array(labels) + + if not is_training: + size = root_node.find("size") + width = int(size.find('width').text) + height = int(size.find('height').text) + image = {'file_name': file_name, 'height': height, 'width': width, + 'id': img_id} + json_dict['images'].append(image) + + if not is_training: + for cls_name, cid in cls_map.items(): + cat = {'supercategory': 'none', 'id': cid, 'name': cls_name} + json_dict['categories'].append(cat) + json_fp = open(json_file, 'w') + json_str = json.dumps(json_dict) + json_fp.write(json_str) + json_fp.close() + + return images, image_files_dict, image_anno_dict + + +def create_coco_label(config, is_training): + """Get image path and annotation from COCO.""" + print("Create COCO label") + from pycocotools.coco import COCO + + coco_root = config.coco_root + data_type = config.val_data_type + if is_training: + data_type = config.train_data_type + + # Classes need to train or test. + train_cls = config.classes + train_cls_dict = {} + for i, cls in enumerate(train_cls): + train_cls_dict[cls] = i + + anno_json = os.path.join(coco_root, config.instances_set.format(data_type)) + + coco = COCO(anno_json) + classs_dict = {} + cat_ids = coco.loadCats(coco.getCatIds()) + for cat in cat_ids: + classs_dict[cat["id"]] = cat["name"] + + image_ids = coco.getImgIds() + images = [] + image_path_dict = {} + image_anno_dict = {} + + for img_id in image_ids: + image_info = coco.loadImgs(img_id) + file_name = image_info[0]["file_name"] + anno_ids = coco.getAnnIds(imgIds=img_id, iscrowd=None) + anno = coco.loadAnns(anno_ids) + image_path = os.path.join(coco_root, data_type, file_name) + annos = [] + iscrowd = False + for label in anno: + bbox = label["bbox"] + class_name = classs_dict[label["category_id"]] + iscrowd = iscrowd or label["iscrowd"] + if class_name in train_cls: + x_min, x_max = bbox[0], bbox[0] + bbox[2] + y_min, y_max = bbox[1], bbox[1] + bbox[3] + annos.append(list(map(round, [y_min, x_min, y_max, x_max])) + [train_cls_dict[class_name]]) + + if not is_training and iscrowd: + continue + if len(annos) >= 1: + images.append(img_id) + image_path_dict[img_id] = image_path + image_anno_dict[img_id] = np.array(annos) + + return images, image_path_dict, image_anno_dict + + +def anno_parser(annos_str): + """Parse annotation from string to list.""" + annos = [] + for anno_str in annos_str: + anno = list(map(int, anno_str.strip().split(','))) + annos.append(anno) + return annos + + +def filter_valid_data(image_dir, anno_path, dataset='coco'): + """Filter valid image file, which both in image_dir and anno_path.""" + images = [] + image_path_dict = {} + image_anno_dict = {} + if not os.path.isdir(image_dir): + raise RuntimeError("Path given is not 
valid.") + if not os.path.isfile(anno_path): + raise RuntimeError("Annotation file is not valid.") + + with open(anno_path, "rb") as f: + lines = f.readlines() + if dataset == 'coco': + for img_id, line in enumerate(lines): + line_str = line.decode("utf-8").strip() + line_split = str(line_str).split(' ') + file_name = line_split[0] + image_path = os.path.join(image_dir, file_name) + if os.path.isfile(image_path): + images.append(img_id) + image_path_dict[img_id] = image_path + image_anno_dict[img_id] = anno_parser(line_split[1:]) + else: + for line in lines: + line_str = line.decode("utf-8").strip() + line_split = str(line_str).split(' ') + file_name = line_split[0] + image_path = os.path.join(image_dir, file_name) + if os.path.isfile(image_path): + img_id = int(os.path.basename(file_name).split('.')[0]) + images.append(img_id) + image_path_dict[img_id] = image_path + image_anno_dict[img_id] = anno_parser(line_split[1:]) + + return images, image_path_dict, image_anno_dict + + +def voc_data_to_mindrecord(config, mindrecord_dir, is_training, prefix="refinedet.mindrecord", file_num=8): + """Create MindRecord file by image_dir and anno_path.""" + mindrecord_path = os.path.join(mindrecord_dir, prefix) + writer = FileWriter(mindrecord_path, file_num) + images, image_path_dict, image_anno_dict = create_voc_label(config, is_training) + print("create voc label finished") + data_json = { + "img_id": {"type": "int32", "shape": [1]}, + "image": {"type": "bytes"}, + "annotation": {"type": "int32", "shape": [-1, 5]}, + } + writer.add_schema(data_json, "data_json") + + for img_id in images: + image_path = image_path_dict[img_id] + with open(image_path, 'rb') as f: + img = f.read() + annos = np.array(image_anno_dict[img_id], dtype=np.int32) + img_id = np.array([img_id], dtype=np.int32) + row = {"img_id": img_id, "image": img, "annotation": annos} + writer.write_raw_data([row]) + writer.commit() + + +def data_to_mindrecord_byte_image(config, dataset="coco", is_training=True, prefix="refinedet.mindrecord", file_num=8): + """Create MindRecord file.""" + mindrecord_dir = config.mindrecord_dir + mindrecord_path = os.path.join(mindrecord_dir, prefix) + writer = FileWriter(mindrecord_path, file_num) + if dataset == "coco": + images, image_path_dict, image_anno_dict = create_coco_label(config, is_training) + else: + images, image_path_dict, image_anno_dict = filter_valid_data(config.image_dir, + config.anno_path, dataset=dataset) + + data_json = { + "img_id": {"type": "int64", "shape": [1]}, + "image": {"type": "bytes"}, + "annotation": {"type": "int32", "shape": [-1, 5]}, + } + writer.add_schema(data_json, "data_json") + + for img_id in images: + image_path = image_path_dict[img_id] + with open(image_path, 'rb') as f: + img = f.read() + annos = np.array(image_anno_dict[img_id], dtype=np.int32) + img_id = np.array([img_id], dtype=np.int64) + row = {"img_id": img_id, "image": img, "annotation": annos} + writer.write_raw_data([row]) + writer.commit() + +def create_refinedet_dataset(config, mindrecord_file, batch_size=32, repeat_num=10, device_num=1, rank=0, + is_training=True, num_parallel_workers=6, use_multiprocessing=True): + """Create RefineDet dataset with MindDataset.""" + # init box_utils first, this is because the config can't be changed while running + box_init(config) + print("loading dataset to minddataset...") + ds = de.MindDataset(mindrecord_file, columns_list=["img_id", "image", "annotation"], num_shards=device_num, + shard_id=rank, num_parallel_workers=num_parallel_workers, shuffle=is_training) + 
decode = C.Decode() + ds = ds.map(operations=decode, input_columns=["image"]) + change_swap_op = C.HWC2CHW() + normalize_op = C.Normalize(mean=[0.485 * 255, 0.456 * 255, 0.406 * 255], + std=[0.229 * 255, 0.224 * 255, 0.225 * 255]) + color_adjust_op = C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4) + compose_map_func = (lambda img_id, image, annotation: preprocess_fn(config, img_id, image, annotation, is_training)) + if is_training: + output_columns = ["image", "box", "label", "num_match"] + trans = [color_adjust_op, normalize_op, change_swap_op] + else: + output_columns = ["img_id", "image", "image_shape"] + trans = [normalize_op, change_swap_op] + ds = ds.map(operations=compose_map_func, input_columns=["img_id", "image", "annotation"], + output_columns=output_columns, column_order=output_columns, + python_multiprocessing=use_multiprocessing, + num_parallel_workers=num_parallel_workers) + ds = ds.map(operations=trans, input_columns=["image"], python_multiprocessing=use_multiprocessing, + num_parallel_workers=num_parallel_workers) + ds = ds.batch(batch_size, drop_remainder=True) + ds = ds.repeat(repeat_num) + return ds + + +def create_mindrecord(config, dataset="coco", prefix="refinedet.mindrecord", + is_training=True, file_num=8): + """create mindrecord file""" + print("Start create dataset!") + + # It will generate mindrecord file in config.mindrecord_dir, + # and the file name is refinedet.mindrecord0, 1, ... file_num. + + mindrecord_dir = config.mindrecord_dir + num_suffix = "0" if file_num > 1 else "" + mindrecord_file = os.path.join(mindrecord_dir, prefix + num_suffix) + if not os.path.exists(mindrecord_file): + if not os.path.isdir(mindrecord_dir): + os.makedirs(mindrecord_dir) + if dataset == "coco": + if os.path.isdir(config.coco_root): + print("Create Mindrecord.") + data_to_mindrecord_byte_image(config, "coco", is_training, prefix, + file_num=file_num) + print("Create Mindrecord Done, at {}".format(mindrecord_dir)) + else: + print("coco_root not exits.") + elif dataset[:3] == "voc": + if os.path.isdir(config.voc_root): + print("Create Mindrecord.") + voc_data_to_mindrecord(config, mindrecord_dir, is_training, prefix, file_num=file_num) + print("Create Mindrecord Done, at {}".format(mindrecord_dir)) + else: + print("voc_root not exits.") + else: + if os.path.isdir(config.image_dir) and os.path.exists(config.anno_path): + print("Create Mindrecord.") + data_to_mindrecord_byte_image(config, "other", is_training, prefix, + file_num=file_num) + print("Create Mindrecord Done, at {}".format(mindrecord_dir)) + else: + print("image_dir or anno_path not exits.") + return mindrecord_file diff --git a/research/cv/RefineDet/src/eval_utils.py b/research/cv/RefineDet/src/eval_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..69fa941a7a1ecc7cd969d49e155d44185f247e7d --- /dev/null +++ b/research/cv/RefineDet/src/eval_utils.py @@ -0,0 +1,359 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
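+#
+# Two evaluation entry points are defined below: coco_metrics() feeds the
+# predictions to pycocotools' COCOeval and returns mAP@[0.5:0.95], while
+# voc_metrics() computes PASCAL VOC average precision (the 11-point metric when
+# use_07=True). Both expect pred_data as a list of dicts with the keys
+# "boxes", "box_scores", "img_id" and "image_shape".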
+# ============================================================================ +"""eval metrics utils""" + +import json +import xml.etree.ElementTree as et +import os +import numpy as np + +def apply_nms(all_boxes, all_scores, thres, max_boxes): + """Apply NMS to bboxes.""" + y1 = all_boxes[:, 0] + x1 = all_boxes[:, 1] + y2 = all_boxes[:, 2] + x2 = all_boxes[:, 3] + areas = (x2 - x1 + 1) * (y2 - y1 + 1) + + order = all_scores.argsort()[::-1] + keep = [] + + while order.size > 0: + i = order[0] + keep.append(i) + + if len(keep) >= max_boxes: + break + + xx1 = np.maximum(x1[i], x1[order[1:]]) + yy1 = np.maximum(y1[i], y1[order[1:]]) + xx2 = np.minimum(x2[i], x2[order[1:]]) + yy2 = np.minimum(y2[i], y2[order[1:]]) + + w = np.maximum(0.0, xx2 - xx1 + 1) + h = np.maximum(0.0, yy2 - yy1 + 1) + inter = w * h + + ovr = inter / (areas[i] + areas[order[1:]] - inter) + + inds = np.where(ovr <= thres)[0] + + order = order[inds + 1] + return keep + + +def coco_metrics(pred_data, anno_json, config): + """Calculate mAP of predicted bboxes.""" + from pycocotools.coco import COCO + from pycocotools.cocoeval import COCOeval + num_classes = config.num_classes + + #Classes need to train or test. + val_cls = config.classes + val_cls_dict = {} + for i, cls in enumerate(val_cls): + val_cls_dict[i] = cls + coco_gt = COCO(anno_json) + classs_dict = {} + cat_ids = coco_gt.loadCats(coco_gt.getCatIds()) + for cat in cat_ids: + classs_dict[cat["name"]] = cat["id"] + + predictions = [] + img_ids = [] + + for sample in pred_data: + pred_boxes = sample['boxes'] + box_scores = sample['box_scores'] + img_id = sample['img_id'] + h, w = sample['image_shape'] + + final_boxes = [] + final_label = [] + final_score = [] + img_ids.append(img_id) + + for c in range(1, num_classes): + class_box_scores = box_scores[:, c] + score_mask = class_box_scores > config.min_score + class_box_scores = class_box_scores[score_mask] + class_boxes = pred_boxes[score_mask] * [h, w, h, w] + + if score_mask.any(): + nms_index = apply_nms(class_boxes, class_box_scores, config.nms_threshold, config.max_boxes) + class_boxes = class_boxes[nms_index] + class_box_scores = class_box_scores[nms_index] + + final_boxes += class_boxes.tolist() + final_score += class_box_scores.tolist() + final_label += [classs_dict[val_cls_dict[c]]] * len(class_box_scores) + + for loc, label, score in zip(final_boxes, final_label, final_score): + res = {} + res['image_id'] = img_id + res['bbox'] = [loc[1], loc[0], loc[3] - loc[1], loc[2] - loc[0]] + res['score'] = score + res['category_id'] = label + predictions.append(res) + if not os.path.exists('./eval_out'): + os.makedirs('./eval_out') + with open('./eval_out/predictions.json', 'w') as f: + json.dump(predictions, f) + + coco_dt = coco_gt.loadRes('./eval_out/predictions.json') + E = COCOeval(coco_gt, coco_dt, iouType='bbox') + E.params.imgIds = img_ids + E.evaluate() + E.accumulate() + E.summarize() + return E.stats[0] + + +def parse_rec(filename): + """ Parse a PASCAL VOC xml file """ + tree = et.parse(filename) + objects = [] + for obj in tree.findall('object'): + obj_struct = {} + obj_struct['name'] = obj.find('name').text + obj_struct['pose'] = obj.find('pose').text + obj_struct['truncated'] = int(obj.find('truncated').text) + obj_struct['difficult'] = int(obj.find('difficult').text) + bbox = obj.find('bndbox') + obj_struct['bbox'] = [int(bbox.find('xmin').text) - 1, + int(bbox.find('ymin').text) - 1, + int(bbox.find('xmax').text) - 1, + int(bbox.find('ymax').text) - 1] + objects.append(obj_struct) + + return objects + 
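+# A small, self-contained sketch of how `apply_nms` above behaves on toy data;
+# the helper name `_apply_nms_example` is illustrative only and is not referenced
+# anywhere else in this repository.
+def _apply_nms_example():
+    """NMS keeps the higher-scoring of two overlapping boxes plus an isolated box."""
+    boxes = np.array([[0., 0., 10., 10.],      # [y1, x1, y2, x2], box A
+                      [1., 1., 10., 10.],      # box B, IoU with A is about 0.83
+                      [20., 20., 30., 30.]])   # box C, disjoint from A and B
+    scores = np.array([0.9, 0.8, 0.7])
+    keep = apply_nms(boxes, scores, thres=0.6, max_boxes=100)
+    # Box A suppresses box B (IoU above the 0.6 threshold), box C survives,
+    # so keep == [0, 2].
+    return keep
+
+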
+def voc_metrics(pred_data, annojson, config, use_07=True): + """calc voc ap""" + aps = [] + # The PASCAL VOC metric changed in 2010 + use_07_metric = use_07 + print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No')) + aps = voc_eval(pred_data, config, ovthresh=0.5, use_07_metric=use_07_metric) + print('Mean AP = {:.4f}'.format(np.mean(aps))) + print('~~~~~~~~') + print('Results:') + for ap in aps: + print('{:.3f}'.format(ap)) + print('{:.3f}'.format(np.mean(aps))) + print('~~~~~~~~') + print('') + print('--------------------------------------------------------------') + print('Results computed with the **unofficial** Python eval code.') + print('Results should be very close to the official MATLAB eval code.') + print('--------------------------------------------------------------') + return np.mean(aps) + + +def voc_ap(rec, prec, use_07_metric=True): + """ ap = voc_ap(rec, prec, [use_07_metric]) + Compute VOC AP given precision and recall. + If use_07_metric is true, uses the + VOC 07 11 point method (default:True). + """ + if use_07_metric: + # 11 point metric + ap = 0. + for t in np.arange(0., 1.1, 0.1): + if np.sum(rec >= t) == 0: + p = 0 + else: + p = np.max(prec[rec >= t]) + ap = ap + p / 11. + else: + # correct AP calculation + # first append sentinel values at the end + mrec = np.concatenate(([0.], rec, [1.])) + mpre = np.concatenate(([0.], prec, [0.])) + + # compute the precision envelope + for i in range(mpre.size - 1, 0, -1): + mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) + + # to calculate area under PR curve, look for points + # where X axis (recall) changes value + i = np.where(mrec[1:] != mrec[:-1])[0] + + # and sum (\Delta recall) * prec + ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) + return ap + + +def voc_pred_process(pred_data, val_cls, recs): + """process pred data for voc""" + num_classes = config.num_classes + cls_img_ids = {} + cls_bboxes = {} + cls_scores = {} + classes = {} + cls_npos = {} + for cls in val_cls: + if cls == 'background': + continue + class_recs = {} + npos = 0 + for imagename in imagenames: + R = [obj for obj in recs[imagename] if obj['name'] == cls] + bbox = np.array([x['bbox'] for x in R]) + difficult = np.array([x['difficult'] for x in R]).astype(np.bool) + det = [False] * len(R) + npos = npos + sum(~difficult) + class_recs[imagename] = {'bbox': bbox, + 'difficult': difficult, + 'det': det} + cls_npos[cls] = npos + classes[cls] = class_recs + cls_img_ids[cls] = [] + cls_bboxes[cls] = [] + cls_scores[cls] = [] + + for sample in pred_data: + pred_boxes = sample['boxes'] + box_scores = sample['box_scores'] + img_id = sample['img_id'] + h, w = sample['image_shape'] + + final_boxes = [] + final_label = [] + final_score = [] + + for c in range(1, num_classes): + class_box_scores = box_scores[:, c] + score_mask = class_box_scores > config.min_score + class_box_scores = class_box_scores[score_mask] + class_boxes = pred_boxes[score_mask] * [h, w, h, w] + + if score_mask.any(): + nms_index = apply_nms(class_boxes, class_box_scores, config.nms_threshold, config.max_boxes) + class_boxes = class_boxes[nms_index] + class_box_scores = class_box_scores[nms_index] + + final_boxes += class_boxes.tolist() + final_score += class_box_scores.tolist() + final_label += [c] * len(class_box_scores) + + for loc, label, score in zip(final_boxes, final_label, final_score): + cls_img_ids[val_cls[label]].append(img_id) + cls_bboxes[val_cls[label]].append([loc[1], loc[0], loc[3], loc[2]]) + cls_scores[val_cls[label]].append(score) + return classes, cls_img_ids, 
cls_bboxes, cls_scores, cls_npos + +def voc_eval(pred_data, config, ovthresh=0.5, use_07_metric=False): + """VOC metric utils""" + # first load gt + # load annots + print("Create VOC label") + val_cls = config.classes + voc_root = config.voc_root + sub_dir = 'eval' + voc_dir = os.path.join(voc_root, sub_dir) + if not os.path.isdir(voc_dir): + raise ValueError(f'Cannot find {sub_dir} dataset path.') + + image_dir = anno_dir = voc_dir + if os.path.isdir(os.path.join(voc_dir, 'Images')): + image_dir = os.path.join(voc_dir, 'Images') + if os.path.isdir(os.path.join(voc_dir, 'Annotations')): + anno_dir = os.path.join(voc_dir, 'Annotations') + print("finding dir ", image_dir, anno_dir) + imagenames = [] + image_paths = [] + for anno_file in os.listdir(anno_dir): + if not anno_file.endswith('xml'): + continue + tree = et.parse(os.path.join(anno_dir, anno_file)) + root_node = tree.getroot() + file_name = root_node.find('filename').text + imagenames.append(int(file_name[:-4])) + image_paths.append(os.path.join(anno_dir, anno_file)) + + recs = {} + for i, imagename in enumerate(imagenames): + recs[imagename] = parse_rec(image_paths[i]) + + # extract gt objects for this class + classes = {} + cls_img_ids = {} + cls_bboxes = {} + cls_scores = {} + cls_npos = {} + #pred data + classes, cls_img_ids, cls_bboxes, cls_scores, cls_npos = voc_pred_process(pred_data, val_cls, recs) + aps = [] + for cls in val_cls: + if cls == 'background': + continue + npos = cls_npos[cls] + class_recs = classes[cls] + image_ids = cls_img_ids[cls] + confidence = np.array(cls_scores[cls]) + BB = np.array(cls_bboxes[cls]) + # sort by confidence + sorted_ind = np.argsort(-confidence) + #sorted_scores = np.sort(-confidence) + BB = BB[sorted_ind, :] + image_ids = [image_ids[x] for x in sorted_ind] + + # go down dets and mark TPs and FPs + nd = len(image_ids) + tp = np.zeros(nd) + fp = np.zeros(nd) + for d in range(nd): + R = class_recs[image_ids[d]] + bb = BB[d, :].astype(float) + ovmax = -np.inf + BBGT = R['bbox'].astype(float) + if BBGT.size > 0: + # compute overlaps + # intersection + ixmin = np.maximum(BBGT[:, 0], bb[0]) + iymin = np.maximum(BBGT[:, 1], bb[1]) + ixmax = np.minimum(BBGT[:, 2], bb[2]) + iymax = np.minimum(BBGT[:, 3], bb[3]) + iw = np.maximum(ixmax - ixmin, 0.) + ih = np.maximum(iymax - iymin, 0.) + inters = iw * ih + uni = ((bb[2] - bb[0]) * (bb[3] - bb[1]) + + (BBGT[:, 2] - BBGT[:, 0]) * (BBGT[:, 3] - BBGT[:, 1]) - inters) + overlaps = inters / uni + ovmax = np.max(overlaps) + jmax = np.argmax(overlaps) + + if ovmax > ovthresh: + if not R['difficult'][jmax]: + if not R['det'][jmax]: + tp[d] = 1. + R['det'][jmax] = 1 + else: + fp[d] = 1. + else: + fp[d] = 1. + + # compute precision recall + fp = np.cumsum(fp) + tp = np.cumsum(tp) + #print(npos, nd, fp[-1], tp[-1]) + rec = tp / float(npos) + # avoid divide by zero in case the first detection matches a difficult + # ground truth + prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) + ap = voc_ap(rec, prec, use_07_metric) + aps.append(ap) + return np.array(aps) diff --git a/research/cv/RefineDet/src/init_params.py b/research/cv/RefineDet/src/init_params.py new file mode 100644 index 0000000000000000000000000000000000000000..64833e798657d6c11a49f80f62be9f78646174bc --- /dev/null +++ b/research/cv/RefineDet/src/init_params.py @@ -0,0 +1,50 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Parameters utils""" + +from mindspore.common.initializer import initializer, TruncatedNormal + +def init_net_param(network, initialize_mode='TruncatedNormal'): + """Init the parameters in net.""" + params = network.trainable_params() + for p in params: + if 'beta' not in p.name and 'gamma' not in p.name and 'bias' not in p.name: + if initialize_mode == 'TruncatedNormal': + p.set_data(initializer(TruncatedNormal(0.02), p.data.shape, p.data.dtype)) + else: + p.set_data(initialize_mode, p.data.shape, p.data.dtype) + + +def load_backbone_params(network, param_dict): + """Init the parameters from pre-train model, default is mobilenetv2.""" + for _, param in network.parameters_and_names(): + param_name = param.name.replace('network.backbone.', '') + name_split = param_name.split('.') + if 'features_1' in param_name: + param_name = param_name.replace('features_1', 'features') + if 'features_2' in param_name: + param_name = '.'.join(['features', str(int(name_split[1]) + 14)] + name_split[2:]) + if param_name in param_dict: + param.set_data(param_dict[param_name].data) + + +def filter_checkpoint_parameter_by_list(param_dict, filter_list): + """remove useless parameters according to filter_list""" + for key in list(param_dict.keys()): + for name in filter_list: + if name in key: + print("Delete parameter from checkpoint: ", key) + del param_dict[key] + break diff --git a/research/cv/RefineDet/src/l2norm.py b/research/cv/RefineDet/src/l2norm.py new file mode 100644 index 0000000000000000000000000000000000000000..46ab5c5082ff70c1775a02e01e6b386bc7765e82 --- /dev/null +++ b/research/cv/RefineDet/src/l2norm.py @@ -0,0 +1,38 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
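+#
+# A brief sketch of what the cell below computes: every spatial position is
+# normalised to unit L2 norm across the channel axis and then scaled by a
+# learnable per-channel weight initialised to `scale`, roughly
+#     y[n, c, h, w] = weight[c] * x[n, c, h, w] / sqrt(sum_k x[n, k, h, w] ** 2)
+# In this repository it is applied to the first two ARM source layers; the
+# scales come from config.L2normalizations ([10, 8, -1, -1], where -1 disables it).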
+# ============================================================================ +"""L2Normalization for RefineDet""" + +import mindspore as ms +import mindspore.nn as nn +from mindspore import Tensor +from mindspore.ops import operations as P +from mindspore.common.initializer import Constant + +class L2Norm(nn.Cell): + """L2 Normalization for refinedet""" + def __init__(self, n_channels, scale): + super(L2Norm, self).__init__() + self.n_channels = n_channels + self.gamma = scale + self.eps = 1e-10 + self.weight = ms.Parameter(Tensor(shape=self.n_channels, dtype=ms.float32, init=Constant(self.gamma))) + self.norm = P.L2Normalize(axis=1, epsilon=self.eps) + self.expand_dims = P.ExpandDims() + + def construct(self, x): + """construct network""" + x = self.norm(x) + out = self.expand_dims(self.expand_dims(self.expand_dims(self.weight, 0), 2), 3).expand_as(x) * x + return out diff --git a/research/cv/RefineDet/src/lr_schedule.py b/research/cv/RefineDet/src/lr_schedule.py new file mode 100644 index 0000000000000000000000000000000000000000..893ccbe2300076d03dd78b04896c4bf4b6e66a42 --- /dev/null +++ b/research/cv/RefineDet/src/lr_schedule.py @@ -0,0 +1,55 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Learning rate schedule""" + +import math +import numpy as np + + +def get_lr(global_step, lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch): + """ + generate learning rate array + + Args: + global_step(int): total steps of the training + lr_init(float): init learning rate + lr_end(float): end learning rate + lr_max(float): max learning rate + warmup_epochs(float): number of warmup epochs + total_epochs(int): total epoch of training + steps_per_epoch(int): steps of one epoch + + Returns: + np.array, learning rate array + """ + lr_each_step = [] + total_steps = steps_per_epoch * total_epochs + warmup_steps = steps_per_epoch * warmup_epochs + for i in range(total_steps): + if i < warmup_steps: + lr = lr_init + (lr_max - lr_init) * i / warmup_steps + else: + lr = lr_end + \ + (lr_max - lr_end) * \ + (1. + math.cos(math.pi * (i - warmup_steps) / (total_steps - warmup_steps))) / 2. + if lr < 0.0: + lr = 0.0 + lr_each_step.append(lr) + + current_step = global_step + lr_each_step = np.array(lr_each_step).astype(np.float32) + learning_rate = lr_each_step[current_step:] + + return learning_rate diff --git a/research/cv/RefineDet/src/multibox.py b/research/cv/RefineDet/src/multibox.py new file mode 100644 index 0000000000000000000000000000000000000000..fcf82f28e8bc1797ab03d0479065e44effb5a395 --- /dev/null +++ b/research/cv/RefineDet/src/multibox.py @@ -0,0 +1,99 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""Multibox layers from SSD""" + +import mindspore.nn as nn +from mindspore.ops import operations as P +from mindspore.ops import functional as F + + +def _make_divisible(v, divisor, min_value=None): + """nsures that all layers have a channel number that is divisible by 8.""" + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +def _bn(channel): + return nn.BatchNorm2d(channel, eps=1e-3, momentum=0.97, + gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1) + + +def _last_conv2d(in_channels, out_channels, kernel_size=3, stride=1, pad_mode='same', pad=0): + depthwise_conv = nn.Conv2d(in_channels=in_channels, out_channels=in_channels, kernel_size=kernel_size, + stride=stride, pad_mode=pad_mode, padding=pad, group=in_channels) + conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, pad_mode="same", has_bias=True) + return nn.SequentialCell([depthwise_conv, _bn(in_channels), nn.ReLU6(), conv]) + +class FlattenConcat(nn.Cell): + """ + Concatenate predictions into a single tensor. + + Args: + num_ssd_boxes (int): The number of boxes. + + Returns: + Tensor, flatten predictions. + """ + def __init__(self, config): + super(FlattenConcat, self).__init__() + self.num_ssd_boxes = config.num_ssd_boxes + self.concat = P.Concat(axis=1) + self.transpose = P.Transpose() + def construct(self, inputs): + """construct network""" + output = () + batch_size = F.shape(inputs[0])[0] + for x in inputs: + x = self.transpose(x, (0, 2, 3, 1)) + output += (F.reshape(x, (batch_size, -1)),) + res = self.concat(output) + return F.reshape(res, (batch_size, self.num_ssd_boxes, -1)) + + +class MultiBox(nn.Cell): + """ + Multibox conv layers. Each multibox layer contains class conf scores and localization predictions. 
+ """ + def __init__(self, config, num_classes, out_channels): + super(MultiBox, self).__init__() + num_classes = num_classes + out_channels = out_channels + num_default = config.num_default + + loc_layers = [] + cls_layers = [] + for k, out_channel in enumerate(out_channels): + loc_layers += [_last_conv2d(out_channel, 4 * num_default[k], + kernel_size=3, stride=1, pad_mode='same', pad=0)] + cls_layers += [_last_conv2d(out_channel, num_classes * num_default[k], + kernel_size=3, stride=1, pad_mode='same', pad=0)] + + self.multi_loc_layers = nn.layer.CellList(loc_layers) + self.multi_cls_layers = nn.layer.CellList(cls_layers) + self.flatten_concat = FlattenConcat(config) + + def construct(self, inputs): + """construct network""" + loc_outputs = () + cls_outputs = () + for i in range(len(self.multi_loc_layers)): + loc_outputs += (self.multi_loc_layers[i](inputs[i]),) + cls_outputs += (self.multi_cls_layers[i](inputs[i]),) + return self.flatten_concat(loc_outputs), self.flatten_concat(cls_outputs) diff --git a/research/cv/RefineDet/src/refinedet.py b/research/cv/RefineDet/src/refinedet.py new file mode 100644 index 0000000000000000000000000000000000000000..e536a367f7045bbe91615d5789cdb65bd0fb02f1 --- /dev/null +++ b/research/cv/RefineDet/src/refinedet.py @@ -0,0 +1,224 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
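+#
+# Forward-pass sketch of the cells defined in this file:
+#     features, arm_loc, arm_label = ARM(x)   # backbone + coarse anchor refinement
+#     refined = TCB(features)                 # top-down feature fusion
+#     odm_loc, odm_label = ODM(refined)       # final localisation and classification
+# RefineDet.construct returns (arm_loc, arm_label, odm_loc, odm_label, features);
+# the loss cell consumes the four prediction tensors, and the inference wrapper
+# decodes odm_loc against the default boxes while zeroing out predictions whose
+# ARM objectness score does not exceed config.objectness_thre.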
+# ============================================================================ +"""RefineDet network structure""" + +import mindspore as ms +import mindspore.nn as nn +from mindspore.ops import operations as P +from mindspore.ops import functional as F + +from .vgg16_for_refinedet import vgg16 +from .resnet101_for_refinedet import resnet +from .multibox import MultiBox +from .l2norm import L2Norm + +def _make_conv_layer(channels, use_bn=False, use_relu=True, kernel_size=3, padding=0): + """make convolution layer for refinedet""" + in_channels = channels[0] + layers = [] + for out_channels in channels[1:]: + layers.append(nn.Conv2d(in_channels=in_channels, out_channels=out_channels, + kernel_size=kernel_size, pad_mode="pad", padding=padding)) + if use_bn: + layers.append(nn.BatchNorm2d(out_channels)) + if use_relu: + layers.append(nn.ReLU()) + in_channels = out_channels + return layers + +def _make_deconv_layer(channels, use_bn=False, use_relu=True, kernel_size=3, padding=0, stride=1): + """make deconvolution layer for TCB""" + in_channels = channels[0] + layers = [] + for out_channels in channels[1:]: + layers.append(nn.Conv2dTranspose(in_channels=in_channels, out_channels=out_channels, + kernel_size=kernel_size, pad_mode="pad", padding=padding, stride=stride)) + if use_bn: + layers.append(nn.BatchNorm2d(out_channels)) + if use_relu: + layers.append(nn.ReLU()) + in_channels = out_channels + return nn.SequentialCell(layers) + +class TCB(nn.Cell): + """TCB block for transport features from ARM to ODM""" + def __init__(self, arm_source_num, in_channels, normalization, use_bn=False): + super(TCB, self).__init__() + self.layers = [] + self.t_num = arm_source_num + self.add = P.Add() + for idx in range(self.t_num): + self.layers.append([]) + if normalization: + if normalization[idx] != -1: + self.layers[idx] += [L2Norm(in_channels[idx], normalization[idx])] + + self.layers[idx] += _make_conv_layer([in_channels[idx], 256], use_bn=use_bn, padding=1) + if idx + 1 == self.t_num: + self.layers[idx] += [nn.SequentialCell(_make_conv_layer([256, 256, 256], use_bn=use_bn, padding=1))] + else: + self.layers[idx] += _make_conv_layer([256, 256], use_bn=use_bn, use_relu=False, padding=1) + self.layers[idx] += [nn.SequentialCell(_make_conv_layer([256, 256, 256], use_bn=use_bn, padding=1))] + self.tcb0 = nn.SequentialCell(self.layers[0][:-1]) + self.deconv0 = _make_deconv_layer([256, 256], use_bn=use_bn, kernel_size=2, stride=2) + self.p0 = self.layers[0][-1] + self.tcb1 = nn.SequentialCell(self.layers[1][:-1]) + self.deconv1 = _make_deconv_layer([256, 256], use_bn=use_bn, kernel_size=2, stride=2) + self.p1 = self.layers[1][-1] + self.tcb2 = nn.SequentialCell(self.layers[2][:-1]) + self.deconv2 = _make_deconv_layer([256, 256], use_bn=use_bn, kernel_size=2, stride=2) + self.p2 = self.layers[2][-1] + self.tcb3 = nn.SequentialCell(self.layers[3][:-1]) + self.p3 = self.layers[3][-1] + + def construct(self, x): + """construct network""" + outputs = () + tmp = x[3] + tmp = self.tcb3(tmp) + tmp = self.p3(tmp) + outputs += (tmp,) + tmp = x[2] + tmp = self.tcb2(tmp) + tmp = self.add(tmp, self.deconv2(outputs[0])) + tmp = self.p2(tmp) + outputs = (tmp,) + outputs + tmp = x[1] + tmp = self.tcb1(tmp) + tmp = self.add(tmp, self.deconv1(outputs[0])) + tmp = self.p1(tmp) + outputs = (tmp,) + outputs + tmp = x[0] + tmp = self.tcb0(tmp) + tmp = self.add(tmp, self.deconv0(outputs[0])) + tmp = self.p0(tmp) + outputs = (tmp,) + outputs + return outputs + +class ARM(nn.Cell): + """anchor refined module""" + def __init__(self, 
backbone, config, is_training=True): + super(ARM, self).__init__() + self.layer = [] + self.layers = {} + self.backbone = backbone + self.multi_box = MultiBox(config, 2, config.extra_arm_channels) + self.is_training = is_training + if not is_training: + self.activation = P.Sigmoid() + + def construct(self, x): + """construct network""" + outputs = self.backbone(x) + multi_feature = outputs + pred_loc, pred_label = self.multi_box(multi_feature) + if not self.is_training: + pred_label = self.activation(pred_label) + pred_loc = F.cast(pred_loc, ms.float32) + pred_label = F.cast(pred_label, ms.float32) + return outputs, pred_loc, pred_label + +class ODM(nn.Cell): + """object detecion module""" + def __init__(self, config, is_training=True): + super(ODM, self).__init__() + self.layer = [] + self.layers = {} + self.multi_box = MultiBox(config, config.num_classes, config.extra_odm_channels) + self.is_training = is_training + if not is_training: + self.activation = P.Sigmoid() + + def construct(self, x): + """construct network""" + outputs = x + multi_feature = outputs + pred_loc, pred_label = self.multi_box(multi_feature) + if not self.is_training: + pred_label = self.activation(pred_label) + pred_loc = F.cast(pred_loc, ms.float32) + pred_label = F.cast(pred_label, ms.float32) + return pred_loc, pred_label + +class RefineDet(nn.Cell): + """refinedet network""" + def __init__(self, backbone, config, is_training=True): + super(RefineDet, self).__init__() + self.backbone = backbone + self.is_training = is_training + self.arm = ARM(backbone, config, is_training) + self.odm = ODM(config, is_training) + self.tcb = TCB(len(config.arm_source), config.extra_arm_channels, config.L2normalizations) + + def construct(self, x): + """construct network""" + arm_out, arm_pre_loc, arm_pre_label = self.arm(x) + tcb_out = self.tcb(arm_out) + odm_pre_loc, odm_pre_label = self.odm(tcb_out) + return arm_pre_loc, arm_pre_label, odm_pre_loc, odm_pre_label, arm_out + +def refinedet_vgg16(config, is_training=True): + """return refinedet with vgg16""" + return RefineDet(backbone=vgg16(), config=config, is_training=is_training) + + +def refinedet_resnet101(config, is_training=True): + """return refinedet with resnet101""" + return RefineDet(backbone=resnet(), config=config, is_training=is_training) + +class RefineDetInferWithDecoder(nn.Cell): + """ + RefineDet Infer wrapper to decode the bbox locations. (As detection layers in other forms) + Args: + network (Cell): the origin ssd infer network without bbox decoder. + default_boxes (Tensor): the default_boxes from anchor generator + config (dict): network config + Returns: + Tensor, the locations for bbox after decoder representing (y0,x0,y1,x1) + Tensor, the prediction labels. 
+ """ + def __init__(self, network, default_boxes, config): + super(RefineDetInferWithDecoder, self).__init__() + self.network = network + self.default_boxes = default_boxes + self.prior_scaling_xy = config.prior_scaling[0] + self.prior_scaling_wh = config.prior_scaling[1] + self.objectness_thre = config.objectness_thre + self.softmax1 = nn.Softmax() + self.softmax2 = nn.Softmax() + + def construct(self, x): + """construct network""" + _, arm_label, odm_loc, odm_label, _ = self.network(x) + + arm_label = self.softmax1(arm_label) + pred_loc = odm_loc + pred_label = self.softmax2(odm_label) + pred_label = odm_label + arm_object_conf = arm_label[:, :, 1:] + no_object_index = F.cast(arm_object_conf > self.objectness_thre, ms.float32) + pred_label = pred_label * no_object_index.expand_as(pred_label) + + default_bbox_xy = self.default_boxes[..., :2] + default_bbox_wh = self.default_boxes[..., 2:] + pred_xy = pred_loc[..., :2] * self.prior_scaling_xy * default_bbox_wh + default_bbox_xy + pred_wh = P.Exp()(pred_loc[..., 2:] * self.prior_scaling_wh) * default_bbox_wh + + pred_xy_0 = pred_xy - pred_wh / 2.0 + pred_xy_1 = pred_xy + pred_wh / 2.0 + pred_xy = P.Concat(-1)((pred_xy_0, pred_xy_1)) + pred_xy = P.Maximum()(pred_xy, 0) + pred_xy = P.Minimum()(pred_xy, 1) + return pred_xy, pred_label diff --git a/research/cv/RefineDet/src/refinedet_loss_cell.py b/research/cv/RefineDet/src/refinedet_loss_cell.py new file mode 100644 index 0000000000000000000000000000000000000000..dfc0e7d59f553c07423c184bdfac836f5ace0309 --- /dev/null +++ b/research/cv/RefineDet/src/refinedet_loss_cell.py @@ -0,0 +1,185 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""RefineDet loss cell and training wrapper""" + +import mindspore as ms +import mindspore.nn as nn +from mindspore import context, Tensor +from mindspore.context import ParallelMode +from mindspore.parallel._auto_parallel_context import auto_parallel_context +from mindspore.communication.management import get_group_size +from mindspore.ops import operations as P +from mindspore.ops import functional as F +from mindspore.ops import composite as C + + + +class SigmoidFocalClassificationLoss(nn.Cell): + """" + Sigmoid focal-loss for classification. + + Args: + gamma (float): Hyper-parameter to balance the easy and hard examples. Default: 2.0 + alpha (float): Hyper-parameter to balance the positive and negative example. Default: 0.25 + + Returns: + Tensor, the focal loss. 
+ """ + def __init__(self, gamma=2.0, alpha=0.25): + super(SigmoidFocalClassificationLoss, self).__init__() + self.sigmiod_cross_entropy = P.SigmoidCrossEntropyWithLogits() + self.sigmoid = P.Sigmoid() + self.pow = P.Pow() + self.onehot = P.OneHot() + self.on_value = Tensor(1.0, ms.float32) + self.off_value = Tensor(0.0, ms.float32) + self.gamma = gamma + self.alpha = alpha + + def construct(self, logits, label): + """construct network""" + label = self.onehot(label, F.shape(logits)[-1], self.on_value, self.off_value) + sigmiod_cross_entropy = self.sigmiod_cross_entropy(logits, label) + sigmoid = self.sigmoid(logits) + label = F.cast(label, ms.float32) + p_t = label * sigmoid + (1 - label) * (1 - sigmoid) + modulating_factor = self.pow(1 - p_t, self.gamma) + alpha_weight_factor = label * self.alpha + (1 - label) * (1 - self.alpha) + focal_loss = modulating_factor * alpha_weight_factor * sigmiod_cross_entropy + return focal_loss + + +class MultiBoxLoss(nn.Cell): + """" + Provide multibox loss through network. + + Args: + network (Cell): The training network. + config (dict): RefineDet config. + + Returns: + Tensor, the loss of the network. + """ + def __init__(self, config): + super(MultiBoxLoss, self).__init__() + self.less = P.Less() + self.tile = P.Tile() + self.reduce_sum = P.ReduceSum() + self.expand_dims = P.ExpandDims() + self.class_loss = SigmoidFocalClassificationLoss(config.gamma, config.alpha) + self.loc_loss = nn.SmoothL1Loss() + self.softmax = nn.Softmax(axis=2) + + def construct(self, x, gt_loc, gt_label, num_matched_boxes, arm_label=None, theta=0.01, use_hard=0): + """construct network""" + pred_loc, pred_label = x + mask = F.cast(self.less(0, gt_label), ms.float32) + if arm_label is not None: + p = self.softmax(arm_label) + hard_negative = F.cast(p[:, :, 1] > theta, ms.float32) + mask = (1 - use_hard) * mask + use_hard * mask * hard_negative + num_matched_boxes = self.reduce_sum(F.cast(num_matched_boxes, ms.float32)) + + # Localization Loss + mask_loc = self.tile(self.expand_dims(mask, -1), (1, 1, 4)) + smooth_l1 = self.loc_loss(pred_loc, gt_loc) * mask_loc + loss_loc = self.reduce_sum(self.reduce_sum(smooth_l1, -1), -1) + + # Classification Loss + loss_cls = self.class_loss(pred_label, gt_label) + loss_cls = self.reduce_sum(loss_cls, (1, 2)) + + return self.reduce_sum((loss_cls + loss_loc) / num_matched_boxes) + + +class RefineDetLossCell(nn.Cell): + """" + Provide RefineDet training loss through network. + + Args: + network (Cell): The training network. + config (dict): RefineDet config. + + Returns: + Tensor, the loss of the network. + """ + def __init__(self, network, config): + super(RefineDetLossCell, self).__init__() + self.multiboxloss = MultiBoxLoss(config) + self.network = network + + def construct(self, x, gt_loc, gt_label, num_matched_boxes): + """construct network""" + arm_pre_loc, arm_pre_label, odm_pre_loc, odm_pre_label, _ = self.network(x) + arm_loss = self.multiboxloss((arm_pre_loc, arm_pre_label), gt_loc, gt_label, num_matched_boxes) + odm_loss = self.multiboxloss((odm_pre_loc, odm_pre_label), gt_loc, gt_label, num_matched_boxes, arm_pre_label) + return arm_loss + odm_loss + + +grad_scale = C.MultitypeFuncGraph("grad_scale") +@grad_scale.register("Tensor", "Tensor") +def tensor_grad_scale(scale, grad): + return grad * P.Reciprocal()(scale) + + +class TrainingWrapper(nn.Cell): + """ + Encapsulation class of SSD network training. + + Append an optimizer to the training network after that the construct + function can be called to create the backward graph. 
+ + Args: + network (Cell): The training network. Note that loss function should have been added. + optimizer (Optimizer): Optimizer for updating the weights. + sens (Number): The adjust parameter. Default: 1.0. + use_global_nrom(bool): Whether apply global norm before optimizer. Default: False + """ + def __init__(self, network, optimizer, sens=1.0, use_global_norm=False): + super(TrainingWrapper, self).__init__(auto_prefix=False) + self.network = network + self.network.set_grad() + self.weights = ms.ParameterTuple(network.trainable_params()) + self.optimizer = optimizer + self.grad = C.GradOperation(get_by_list=True, sens_param=True) + self.sens = sens + self.reducer_flag = False + self.grad_reducer = None + self.use_global_norm = use_global_norm + self.parallel_mode = context.get_auto_parallel_context("parallel_mode") + if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]: + self.reducer_flag = True + if self.reducer_flag: + mean = context.get_auto_parallel_context("gradients_mean") + if auto_parallel_context().get_device_num_is_set(): + degree = context.get_auto_parallel_context("device_num") + else: + degree = get_group_size() + self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree) + self.hyper_map = C.HyperMap() + + def construct(self, *args): + """construct network""" + weights = self.weights + loss = self.network(*args) + sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens) + grads = self.grad(self.network, weights)(*args, sens) + if self.reducer_flag: + # apply grad reducer on grads + grads = self.grad_reducer(grads) + if self.use_global_norm: + grads = self.hyper_map(F.partial(grad_scale, F.scalar_to_array(self.sens)), grads) + grads = C.clip_by_global_norm(grads) + return F.depend(loss, self.optimizer(grads)) diff --git a/research/cv/RefineDet/src/resnet101_for_refinedet.py b/research/cv/RefineDet/src/resnet101_for_refinedet.py new file mode 100644 index 0000000000000000000000000000000000000000..0ee3b6497efa7d372297f520b08e8f476910201a --- /dev/null +++ b/research/cv/RefineDet/src/resnet101_for_refinedet.py @@ -0,0 +1,241 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
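+#
+# Note: ResNet101_for_RefineDet below wraps resnet101() and appends one extra
+# ResidualBlock(2048, 512, stride=2) so that construct() returns four feature
+# maps (c3, c4, c5, c6), mirroring the four ARM source layers produced by the
+# VGG-16 backbone. The default configs in src/config.py are written for the
+# VGG-16 variant ("model": "refinedet_vgg16"); channel-related settings may
+# need to be adapted when this backbone is selected (an assumption, not a
+# documented requirement).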
+# ============================================================================ +"""ResNet101 for RefineDet""" +import mindspore.nn as nn +from mindspore.ops import operations as P + + +def _conv3x3(in_channel, out_channel, stride=1): + return nn.Conv2d(in_channel, out_channel, + kernel_size=3, stride=stride, padding=0, pad_mode='same') + + +def _conv1x1(in_channel, out_channel, stride=1): + return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, padding=0, pad_mode='same') + + +def _conv7x7(in_channel, out_channel, stride=1): + return nn.Conv2d(in_channel, out_channel, kernel_size=7, stride=stride, padding=0, pad_mode='same') + + +def _bn(channel): + return nn.BatchNorm2d(channel, eps=1e-3, momentum=0.997, + gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1) + + +def _bn_last(channel): + return nn.BatchNorm2d(channel, eps=1e-3, momentum=0.997, + gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1) + +class ResidualBlock(nn.Cell): + """ + ResNet V1 residual block definition. + + Args: + in_channel (int): Input channel. + out_channel (int): Output channel. + stride (int): Stride size for the first convolutional layer. Default: 1. + + Returns: + Tensor, output tensor. + + Examples: + >>> ResidualBlock(3, 256, stride=2) + """ + + def __init__(self, + in_channel, + out_channel, + stride=1, expansion=4): + super(ResidualBlock, self).__init__() + self.expansion = expansion + self.stride = stride + channel = out_channel // self.expansion + self.conv1 = _conv1x1(in_channel, channel, stride=1) + self.bn1 = _bn(channel) + self.conv2 = _conv3x3(channel, channel, stride=stride) + self.bn2 = _bn(channel) + + self.conv3 = _conv1x1(channel, out_channel, stride=1) + self.bn3 = _bn_last(out_channel) + self.relu = nn.ReLU() + + self.down_sample = False + + if stride != 1 or in_channel != out_channel: + self.down_sample = True + self.down_sample_layer = None + + if self.down_sample: + self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride), _bn(out_channel)]) + self.add = P.Add() + + def construct(self, x): + """construct network""" + identity = x + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + out = self.conv3(out) + out = self.bn3(out) + + if self.down_sample: + identity = self.down_sample_layer(identity) + + out = self.add(out, identity) + out = self.relu(out) + + return out + + +class ResNet(nn.Cell): + """ + ResNet architecture. + + Args: + block (Cell): Block for network. + layer_nums (list): Numbers of block in different layers. + in_channels (list): Input channel in each layer. + out_channels (list): Output channel in each layer. + strides (list): Stride size in each layer. + Returns: + Tensor, output tensor. 
+ + Examples: + >>> ResNet(ResidualBlock, + >>> [3, 4, 6, 3], + >>> [64, 256, 512, 1024], + >>> [256, 512, 1024, 2048], + >>> [1, 2, 2, 2] + """ + + def __init__(self, + block, + layer_nums, + in_channels, + out_channels, + strides): + super(ResNet, self).__init__() + + if not len(layer_nums) == len(in_channels) == len(out_channels) == 4: + raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!") + self.conv1 = _conv7x7(3, 64, stride=2) + self.bn1 = _bn(64) + self.relu = P.ReLU() + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same") + self.layer1 = self._make_layer(block, + layer_nums[0], + in_channel=in_channels[0], + out_channel=out_channels[0], + stride=strides[0]) + self.layer2 = self._make_layer(block, + layer_nums[1], + in_channel=in_channels[1], + out_channel=out_channels[1], + stride=strides[1]) + self.layer3 = self._make_layer(block, + layer_nums[2], + in_channel=in_channels[2], + out_channel=out_channels[2], + stride=strides[2]) + self.layer4 = self._make_layer(block, + layer_nums[3], + in_channel=in_channels[3], + out_channel=out_channels[3], + stride=strides[3]) + + def _make_layer(self, block, layer_num, in_channel, out_channel, stride): + """ + Make stage network of ResNet. + + Args: + block (Cell): Resnet block. + layer_num (int): Layer number. + in_channel (int): Input channel. + out_channel (int): Output channel. + stride (int): Stride size for the first convolutional layer. + Returns: + SequentialCell, the output layer. + + Examples: + >>> _make_layer(ResidualBlock, 3, 128, 256, 2) + """ + layers = [] + + resnet_block = block(in_channel, out_channel, stride=stride) + layers.append(resnet_block) + for _ in range(1, layer_num): + resnet_block = block(out_channel, out_channel, stride=1) + layers.append(resnet_block) + return nn.SequentialCell(layers) + + def construct(self, x): + """construct network""" + x = self.conv1(x) + x = self.bn1(x) + x = self.relu(x) + c1 = self.maxpool(x) + + c2 = self.layer1(c1) + c3 = self.layer2(c2) + c4 = self.layer3(c3) + c5 = self.layer4(c4) + return c1, c2, c3, c4, c5 + + +def resnet50(): + """ + Get ResNet50 neural network. + + Returns: + Cell, cell instance of ResNet50 neural network. + + Examples: + >>> net = resnet50() + """ + return ResNet(ResidualBlock, + [3, 4, 6, 3], + [64, 256, 512, 1024], + [256, 512, 1024, 2048], + [1, 2, 2, 2]) + +def resnet101(): + """ + Get ResNet101 neural network. + """ + return ResNet(ResidualBlock, + [3, 4, 23, 3], + [64, 256, 512, 1024], + [256, 512, 1024, 2048], + [1, 2, 2, 2]) + +class ResNet101_for_RefineDet(nn.Cell): + """build up resnet101""" + def __init__(self): + super(ResNet101_for_RefineDet, self).__init__() + self.base = resnet101() + self.extra = ResidualBlock(2048, 512, 2, 16) + + def construct(self, x): + """construct network""" + _, _, c3, c4, c5 = self.base(x) + c6 = self.extra(c5) + return c3, c4, c5, c6 + +def resnet(): + return ResNet101_for_RefineDet() diff --git a/research/cv/RefineDet/src/vgg16_for_refinedet.py b/research/cv/RefineDet/src/vgg16_for_refinedet.py new file mode 100644 index 0000000000000000000000000000000000000000..cb3b505b935f9f8fb8e53abc59153669818e7b1f --- /dev/null +++ b/research/cv/RefineDet/src/vgg16_for_refinedet.py @@ -0,0 +1,80 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""VGG16 backbone for RefineDet""" + +import mindspore.nn as nn + +def _make_conv_layer(channels, use_bn=False, kernel_size=3, stride=1, padding=0): + """make convolution layers for vgg16""" + in_channels = channels[0] + layers = [] + for out_channels in channels[1:]: + layers.append(nn.Conv2d(in_channels=in_channels, out_channels=out_channels, + kernel_size=kernel_size, stride=stride, pad_mode="pad", padding=padding)) + if use_bn: + layers.append(nn.BatchNorm2d(out_channels)) + layers.append(nn.ReLU()) + in_channels = out_channels + return nn.SequentialCell(layers) + +class VGG16_for_RefineDet(nn.Cell): + """ + VGG-16 network body, reference to caffe model_libs + """ + def __init__(self): + super(VGG16_for_RefineDet, self).__init__() + self.b1 = _make_conv_layer([3, 64, 64], padding=1) + self.b2 = _make_conv_layer([64, 128, 128], padding=1) + self.b3 = _make_conv_layer([128, 256, 256, 256], padding=1) + self.b4 = _make_conv_layer([256, 512, 512, 512], padding=1) + self.b5 = _make_conv_layer([512, 512, 512, 512], padding=1) + self.m1 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode="same") + self.m2 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode="same") + self.m3 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode="same") + self.m4 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode="same") + self.m5 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode="same") + self.fc6 = nn.Conv2d(in_channels=512, out_channels=1024, pad_mode="pad", padding=3, kernel_size=3, dilation=3) + self.relu6 = nn.ReLU() + self.fc7 = nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=1) + self.relu7 = nn.ReLU() + self.b6_1 = _make_conv_layer([1024, 256], kernel_size=1) + self.b6_2 = _make_conv_layer([256, 512], stride=2, padding=1) + + def construct(self, x): + """construct network""" + outputs = () + x = self.b1(x) + x = self.m1(x) + x = self.b2(x) + x = self.m2(x) + x = self.b3(x) + x = self.m3(x) + x = self.b4(x) + outputs += (x,) + x = self.m4(x) + x = self.b5(x) + outputs += (x,) + x = self.m5(x) + x = self.fc6(x) + x = self.relu6(x) + x = self.fc7(x) + outputs += (x,) + x = self.relu7(x) + x = self.b6_1(x) + x = self.b6_2(x) + return outputs + (x,) + +def vgg16(): + return VGG16_for_RefineDet() diff --git a/research/cv/RefineDet/train.py b/research/cv/RefineDet/train.py new file mode 100644 index 0000000000000000000000000000000000000000..56bb8b60f97eb59b841c699a3d1b942f199b8532 --- /dev/null +++ b/research/cv/RefineDet/train.py @@ -0,0 +1,205 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""Train RefineDet and get checkpoint files."""
+
+import argparse
+import ast
+import os
+import mindspore.nn as nn
+from mindspore import context, Tensor
+from mindspore.communication.management import init, get_rank
+from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, LossMonitor, TimeMonitor
+from mindspore.train import Model
+from mindspore.context import ParallelMode
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from mindspore.common import set_seed, dtype
+from src.config import get_config
+from src.dataset import create_refinedet_dataset, create_mindrecord
+from src.lr_schedule import get_lr
+from src.init_params import init_net_param
+from src.refinedet import refinedet_vgg16, refinedet_resnet101
+from src.refinedet_loss_cell import RefineDetLossCell, TrainingWrapper
+
+set_seed(1)
+
+def get_args():
+    """get args for train"""
+    parser = argparse.ArgumentParser(description="RefineDet training script")
+    parser.add_argument("--using_mode", type=str, default="refinedet_vgg16_320",
+                        choices=("refinedet_vgg16_320", "refinedet_vgg16_512",
+                                 "refinedet_resnet101_320", "refinedet_resnet101_512"),
+                        help="which network to train; four variants are provided: "
+                             "vgg16 backbone with 320x320 image size, "
+                             "vgg16 backbone with 512x512 image size, "
+                             "resnet101 backbone with 320x320 image size, "
+                             "resnet101 backbone with 512x512 image size.")
+    parser.add_argument("--run_online", type=ast.literal_eval, default=False,
+                        help="Run on Modelarts platform, requires data_url and train_url if True, default is False.")
+    parser.add_argument("--data_url", type=str,
+                        help="OBS path of the dataset (used when --run_online is True).")
+    parser.add_argument("--train_url", type=str,
+                        help="OBS path for training outputs (used when --run_online is True).")
+    parser.add_argument("--pre_trained_url", type=str, default=None, help="OBS url of the pretrained checkpoint file.")
+    parser.add_argument("--run_platform", type=str, default="Ascend", choices=("Ascend", "GPU", "CPU"),
+                        help="Run platform, supports Ascend, GPU and CPU.")
+    parser.add_argument("--only_create_dataset", type=ast.literal_eval, default=False,
+                        help="If set to True, only create the MindRecord files, default is False.")
+    parser.add_argument("--distribute", type=ast.literal_eval, default=False,
+                        help="Run distributed training, default is False.")
+    parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
+    parser.add_argument("--device_num", type=int, default=1, help="Number of devices to use, default is 1.")
+    parser.add_argument("--lr", type=float, default=0.05, help="Learning rate, default is 0.05.")
+    parser.add_argument("--mode", type=str, default="sink", help="Run sink mode or not, default is sink.")
+    parser.add_argument("--dataset", type=str, default="coco",
+                        help="Dataset, default is coco. "
+                             "Supported datasets: coco, voc0712, voc0712plus.")
+    parser.add_argument("--epoch_size", type=int, default=500, help="Epoch size, default is 500.")
+    parser.add_argument("--batch_size", type=int, default=32, help="Batch size, default is 32.")
+    parser.add_argument("--pre_trained", type=str, default=None, help="Pretrained Checkpoint file path.")
+    parser.add_argument("--pre_trained_epoch_size", type=int, default=0, help="Pretrained epoch size.")
+    parser.add_argument("--save_checkpoint_epochs", type=int, default=10, help="Save checkpoint epochs, default is 10.")
+    parser.add_argument("--loss_scale", type=int, default=1024, help="Loss scale, default is 1024.")
+    parser.add_argument("--filter_weight", type=ast.literal_eval, default=False,
+                        help="Filter head weight parameters, default is False.")
+    parser.add_argument('--debug', type=str, default="0", choices=["0", "1", "2", "3"],
+                        help="Activate the debug mode. 0: no debug mode; "
+                             "1: run the network in PyNative mode; "
+                             "2: print all Ascend logs to stdout; "
+                             "3: print all Ascend logs to stdout "
+                             "and run the network in PyNative mode.")
+    parser.add_argument("--check_point", type=str, default="./ckpt",
+                        help="The directory path to save checkpoint files.")
+    args_opt = parser.parse_args()
+    return args_opt
+
+def refinedet_model_build(config, args_opt):
+    """build refinedet network"""
+    if config.model == "refinedet_vgg16":
+        refinedet = refinedet_vgg16(config=config)
+        init_net_param(refinedet)
+    elif config.model == "refinedet_resnet101":
+        refinedet = refinedet_resnet101(config=config)
+        init_net_param(refinedet)
+    else:
+        raise ValueError(f'config.model: {config.model} is not supported')
+    return refinedet
+
+def train_main(args_opt):
+    """main code for training refinedet"""
+    rank = 0
+    device_num = 1
+    # config with args
+    config = get_config(args_opt.using_mode, args_opt.dataset)
+
+    # run mode config
+    if args_opt.debug == "1" or args_opt.debug == "3":
+        network_mode = context.PYNATIVE_MODE
+    else:
+        network_mode = context.GRAPH_MODE
+
+    # set run platform
+    if args_opt.run_platform == "CPU":
+        context.set_context(mode=network_mode, device_target="CPU")
+    else:
+        context.set_context(mode=network_mode, device_target=args_opt.run_platform, device_id=args_opt.device_id)
+        if args_opt.distribute:
+            device_num = args_opt.device_num
+            context.reset_auto_parallel_context()
+            context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,
+                                              device_num=device_num)
+            init()
+            rank = get_rank()
+
+    mindrecord_file = create_mindrecord(config, args_opt.dataset, "refinedet.mindrecord", True)
+
+    if args_opt.only_create_dataset:
+        return
+
+    loss_scale = float(args_opt.loss_scale)
+    if args_opt.run_platform == "CPU":
+        loss_scale = 1.0
+
+    # When creating the MindDataset, use the first mindrecord file, such as
+    # refinedet.mindrecord0.
+    use_multiprocessing = (args_opt.run_platform != "CPU")
+    dataset = create_refinedet_dataset(config, mindrecord_file, repeat_num=1, batch_size=args_opt.batch_size,
+                                       device_num=device_num, rank=rank, use_multiprocessing=use_multiprocessing)
+
+    dataset_size = dataset.get_dataset_size()
+    print(f"Create dataset done! 
dataset size is {dataset_size}") + refinedet = refinedet_model_build(config, args_opt) + if ("use_float16" in config and config.use_float16) or args_opt.run_platform == "GPU": + refinedet.to_float(dtype.float16) + net = RefineDetLossCell(refinedet, config) + + # checkpoint + ckpt_config = CheckpointConfig(save_checkpoint_steps=dataset_size * args_opt.save_checkpoint_epochs) + ckpt_prefix = args_opt.check_point + '/ckpt_' + save_ckpt_path = ckpt_prefix + str(rank) + '/' + ckpoint_cb = ModelCheckpoint(prefix="refinedet", directory=save_ckpt_path, config=ckpt_config) + + if args_opt.pre_trained: + param_dict = load_checkpoint(args_opt.pre_trained) + load_param_into_net(net, param_dict, True) + + lr = Tensor(get_lr(global_step=args_opt.pre_trained_epoch_size * dataset_size, + lr_init=config.lr_init, lr_end=config.lr_end_rate * args_opt.lr, lr_max=args_opt.lr, + warmup_epochs=config.warmup_epochs, + total_epochs=args_opt.epoch_size, + steps_per_epoch=dataset_size)) + + if "use_global_norm" in config and config.use_global_norm: + opt = nn.Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, + config.momentum, config.weight_decay, 1.0) + net = TrainingWrapper(net, opt, loss_scale, True) + else: + opt = nn.Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, + config.momentum, config.weight_decay, loss_scale) + net = TrainingWrapper(net, opt, loss_scale) + + + callback = [TimeMonitor(data_size=dataset_size), LossMonitor(), ckpoint_cb] + model = Model(net) + dataset_sink_mode = False + if args_opt.mode == "sink" and args_opt.run_platform != "CPU": + print("In sink mode, one epoch return a loss.") + dataset_sink_mode = True + print("Start train RefineDet, the first epoch will be slower because of the graph compilation.") + model.train(args_opt.epoch_size, dataset, callbacks=callback, dataset_sink_mode=dataset_sink_mode) + +def main(): + args_opt = get_args() + # copy files if online + if args_opt.run_online: + import moxing as mox + args_opt.device_id = int(os.getenv('DEVICE_ID')) + args_opt.device_num = int(os.getenv('RANK_SIZE')) + dir_root = os.getcwd() + data_root = os.path.join(dir_root, "data") + ckpt_root = os.path.join(dir_root, args_opt.check_point) + mox.file.copy_parallel(args_opt.data_url, data_root) + if args_opt.pre_trained: + mox.file.copy_parallel(args_opt.pre_trained_url, args_opt.pre_trained) + # print log to stdout + if args_opt.debug == "2" or args_opt.debug == "3": + os.environ["SLOG_PRINT_TO_STDOUT"] = "1" + os.environ["ASCEND_SLOG_PRINT_TO_STDOUT"] = "1" + os.environ["ASCEND_GLOBAL_LOG_LEVEL"] = "1" + train_main(args_opt) + if args_opt.run_online: + mox.file.copy_parallel(ckpt_root, args_opt.train_url) + +if __name__ == '__main__': + main()
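+
+
+# The function below is an optional smoke test (an illustrative sketch; it is
+# not part of the training pipeline and is never called by main()). It builds
+# the two backbones defined in src/vgg16_for_refinedet.py and
+# src/resnet101_for_refinedet.py on CPU and prints the shapes of the
+# multi-scale feature maps consumed by the ARM/ODM heads. The 320x320 input
+# matches the refinedet_*_320 configurations.
+def _backbone_smoke_test():
+    """Print the feature-map shapes produced by both backbones."""
+    import numpy as np
+    from src.vgg16_for_refinedet import vgg16
+    from src.resnet101_for_refinedet import resnet
+    context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU")
+    image = Tensor(np.zeros((1, 3, 320, 320), np.float32))
+    for name, backbone in (("vgg16", vgg16()), ("resnet101", resnet())):
+        # Both backbones return four feature maps at strides 8/16/32/64,
+        # i.e. 40x40, 20x20, 10x10 and 5x5 for a 320x320 input.
+        shapes = [tuple(feature.shape) for feature in backbone(image)]
+        print(name, shapes)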