diff --git a/research/cv/EGnet/README_CN.md b/research/cv/EGnet/README_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..7de4223ba18f5a00d3e3d9fd3a9dd5200342014d
--- /dev/null
+++ b/research/cv/EGnet/README_CN.md
@@ -0,0 +1,281 @@
+
+# Contents
+
+- [Contents](#contents)
+- [EGNet Description](#egnet-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Environment Requirements](#environment-requirements)
+- [Quick Start](#quick-start)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+- [Training Process](#training-process)
+    - [Training](#training)
+    - [Distributed Training](#distributed-training)
+- [Evaluation Process](#evaluation-process)
+    - [Evaluation](#evaluation)
+- [Export Process](#export-process)
+    - [Export](#export)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Evaluation Performance](#evaluation-performance)
+            - [EGNet on DUTS-TR](#egnet-on-duts-tr)
+        - [Inference Performance](#inference-performance)
+            - [EGNet on saliency detection datasets](#egnet-on-saliency-detection-datasets)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+# EGNet Description
+
+EGNet targets salient object detection. It consists of three parts: an edge feature extraction branch, a salient object feature extraction branch, and a one-to-one guidance module. The edge features help the salient object features locate objects and make object boundaries more accurate. Compared with 15 state-of-the-art methods on six datasets, EGNet achieves the best performance.
+
+[Paper](https://arxiv.org/abs/1908.08297): Zhao J X, Liu J J, Fan D P, et al. EGNet: Edge guidance network for salient object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 8779-8788.
+
+# Model Architecture
+
+EGNet consists of three modules: NLSEM (edge extraction module), PSFEM (salient object feature extraction module), and O2OGM (one-to-one guidance module). Edge information is extracted from the input image by two convolution stages, while deeper convolution stages extract salient object features. In the one-to-one guidance module, the edge features are fused (FF) with the salient object features extracted at different depths, and each fused branch passes through further convolutions to produce saliency maps at different levels; the final output is a single fused saliency detection map.
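+
+To orient readers in the code, the data flow through these three modules can be sketched as below. This is a minimal illustration based on src/egnet.py in this repository; it assumes MindSpore is installed, is run from the EGnet directory, and uses randomly initialized weights:
+
+```python
+import numpy as np
+import mindspore as ms
+from src.egnet import build_model
+
+# backbone -> MergeLayer1 (edge + multi-level saliency) -> MergeLayer2 (one-to-one guidance)
+net = build_model("vgg")
+x = ms.Tensor(np.zeros((1, 3, 200, 200)), ms.float32)  # images are cropped to 200 x 200
+# up_edge: edge maps (NLSEM), up_sal: per-level saliency maps (PSFEM),
+# up_sal_final: one-to-one guided fusions (O2OGM); the last element is the final saliency map
+up_edge, up_sal, up_sal_final = net(x)
+print(len(up_edge), len(up_sal), len(up_sal_final))
+```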
+
+# Dataset
+
+Dataset used: [saliency detection datasets](<https://blog.csdn.net/studyeboy/article/details/102383922?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522163031601316780274127035%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=163031601316780274127035&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~hot_rank-5-102383922.first_rank_v2_pc_rank_v29&utm_term=DUTS-TE%E6%95%B0%E6%8D%AE%E9%9B%86%E4%B8%8B%E8%BD%BD&spm=1018.2226.3001.4187>)
+
+- Dataset sizes:
+    - Training set: DUTS-TR, 210 MB, 10,553 color images with a maximum side length of 400 pixels, all collected from the ImageNet DET training/validation sets.
+    - Test set: SOD, 21.2 MB, 300 color images with a maximum side length of 400 pixels; a collection of salient object boundaries based on the Berkeley Segmentation Dataset (BSD).
+    - Test set: ECSSD, 64.6 MB, 1000 color images with a maximum side length of 400 pixels.
+    - Test set: PASCAL-S, 175 MB, 850 color images. This dataset differs considerably from other salient object detection datasets: there are no obviously dominant salient objects, and the annotations are mainly based on human eye fixations, so it is a difficult dataset.
+    - Test set: DUT-OMRON, 107 MB, 5168 color images with a maximum side length of 400 pixels. The images contain one or more salient objects in front of relatively complex backgrounds, with large-scale ground-truth annotations of eye fixations, bounding boxes, and pixel-wise masks.
+    - Test set: HKU-IS, 893 MB, 4447 color images with a maximum side length of 400 pixels. Each image satisfies at least one of the following three criteria: 1) multiple scattered salient objects; 2) at least one salient object touching the image boundary; 3) salient objects whose appearance is similar to the background.
+- Data format: binary files
+    - Note: the data will be processed in src/dataset.py.
+
+# Environment Requirements
+
+- Hardware (Ascend/GPU/CPU)
+    - Set up the hardware environment with Ascend, GPU, or CPU processors.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
+
+# Quick Start
+
+After installing MindSpore via the official website, you can start training and evaluation as follows:
+
+- Running on Ascend
+
+```shell
+# crop the datasets
+python data_crop.py --data_name=[DATA_NAME] --data_root=[DATA_ROOT] --output_path=[OUTPUT_PATH]
+
+# run the training example
+bash run_standalone_train.sh
+
+# run the distributed training example
+bash run_distribute_train.sh 8 [RANK_TABLE_FILE]
+
+# run the evaluation example
+bash run_eval.sh
+```
+
+The training set path is set by the train_path item in default_config.yaml.
+
+# Script Description
+
+## Script and Sample Code
+
+```bash
+├── model_zoo
+    ├── EGNet
+        ├── README.md                        # EGNet description
+        ├── model_utils                      # config, ModelArts and other helper scripts
+        │   ├── config.py                    # parse the configuration file
+        ├── scripts
+        │   ├── run_standalone_train.sh      # launch standalone Ascend training (1 device)
+        │   ├── run_distribute_train.sh      # launch distributed Ascend training (8 devices)
+        │   ├── run_eval.sh                  # launch Ascend evaluation
+        ├── src
+        │   ├── dataset.py                   # dataset loading
+        │   ├── egnet.py                     # EGNet network structure
+        │   ├── vgg.py                       # VGG network structure
+        │   ├── resnet.py                    # ResNet network structure
+        │   ├── sal_edge_loss.py             # loss definition
+        │   ├── train_forward_backward.py    # forward and backward propagation definition
+        ├── sal2edge.py                      # preprocessing: convert saliency maps to edge maps
+        ├── data_crop.py                     # data cropping
+        ├── train.py                         # training script
+        ├── eval.py                          # evaluation script
+        ├── export.py                        # model export script
+        ├── default_config.yaml              # configuration file
+```
+
+## Script Parameters
+
+Training and evaluation parameters can both be configured in default_config.yaml.
+
+- Parameters for EGNet on the DUTS-TR dataset:
+
+```text
+dataset_name: "DUTS-TR"                     # dataset name
+name: "egnet"                               # network name
+pre_trained: True                           # whether to train from a pretrained backbone
+lr_init: 5e-5 (resnet) or 2e-5 (vgg)        # initial learning rate
+batch_size: 10                              # training batch size
+epoch_size: 30                              # total number of training epochs
+momentum: 0.1                               # momentum
+weight_decay: 5e-4                          # weight decay
+image_height: 200                           # height of the images fed into the model
+image_width: 200                            # width of the images fed into the model
+train_data_path: "./data/DUTS-TR/"          # relative path of the training set
+eval_data_path: "./data/SOD/"               # relative path of the evaluation set
+checkpoint_path: "./EGNet/run-nnet/models/" # relative path for saving checkpoints
+```
+
+For more configuration details, please refer to default_config.yaml.
+
+# Training Process
+
+## Training
+
+- Crop the datasets:
+
+```bash
+python data_crop.py --data_name=[DATA_NAME] --data_root=[DATA_ROOT] --output_path=[OUTPUT_PATH]
+```
+
+- Running on Ascend
+
+```bash
+python train.py --mode=train --base_model=vgg --vgg=[PRETRAINED_PATH]
+python train.py --mode=train --base_model=resnet --resnet=[PRETRAINED_PATH]
+```
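+
+Training reads train_pair_edge.lst, which lists image, saliency mask, and edge-map triplets (see src/dataset.py). If your DUTS-TR copy only provides images and masks, the edge maps can be generated with sal2edge.py. The command below is an illustrative sketch: it assumes train_pair.lst lists image/mask pairs with paths relative to the DUTS-TR root, writes each edge map next to its mask with an "_edge" suffix, and train_pair_edge.lst then has to be assembled to reference the generated files:
+
+```bash
+# derive edge maps from the saliency masks (illustrative paths)
+python sal2edge.py --data_root=./data/DUTS-TR \
+    --output_path=./data/DUTS-TR \
+    --image_list_file=./data/DUTS-TR/train_pair.lst
+```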
+
+- Online training with ModelArts
+
+Online single-device training requires the following parameters:
+
+online_train_path (storage path of the DUTS-TR training set in the OBS bucket)
+
+```bash
+├──DUTS-TR
+    ├──DUTS-TR-Image
+    ├──DUTS-TR-Mask
+    ├──train_pair.lst
+    ├──train_pair_edge.lst
+```
+
+online_pretrained_path (storage path of the pretrained models in the OBS bucket)
+
+```bash
+├──pretrained
+    ├──resnet_pretrained.ckpt
+    ├──vgg_pretrained.ckpt
+```
+
+base_model (the chosen pretrained backbone, vgg or resnet)
+
+train_online = True (enables online training)
+
+The above python commands run in the background; you can check the results in the ./EGNet/run-nnet/logs/log.txt file.
+
+After training, you can find the checkpoint files in the default ./EGNet/run-nnet/models/ folder.
+
+## Distributed Training
+
+- Running on Ascend
+
+```bash
+bash run_distribute_train.sh 8 [RANK_TABLE_FILE]
+```
+
+For offline distributed training, please refer to the [MindSpore distributed parallel training tutorial (Ascend)](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/distributed_training_ascend.html).
+
+- Online distributed training with ModelArts
+
+The online distributed configuration is basically the same as for single-device training; additionally set the parameter is_distributed = True.
+
+The above shell script runs distributed training in the background. You can check the results in the device*/train.log files.
+
+# Evaluation Process
+
+## Evaluation
+
+- Running on Ascend
+
+```bash
+python eval.py --model=[MODEL_PATH] --sal_mode=[DATA_NAME] --test_fold=[TEST_DATA_PATH] --base_model=vgg
+python eval.py --model=[MODEL_PATH] --sal_mode=[DATA_NAME] --test_fold=[TEST_DATA_PATH] --base_model=resnet
+```
+
+Dataset file structure:
+
+```bash
+├──NAME
+    ├──ground_truth_mask
+    ├──images
+    ├──test.lst
+```
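+
+For reference, the MaxF, MAE, and S values reported under Inference Performance below are computed by eval.py in this repository. In its Metric class the beta attribute plays the role of β², so the formulas are:
+
+```text
+MaxF = max over thresholds t of (1 + β²) · P(t) · R(t) / (β² · P(t) + R(t)),  with β² = 0.3
+MAE  = mean(|pred − gt|),  with pred and gt normalized to [0, 1]
+S    = α · S_object + (1 − α) · S_region,  with α = 0.5
+```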
+
+# Export Process
+
+## Export
+
+Before exporting, set the ckpt_file item in the default_config.yaml configuration file.
+
+```shell
+python export.py --ckpt_file=[CKPT_FILE]
+```
+
+# Model Description
+
+## Performance
+
+### Evaluation Performance
+
+#### EGNet on DUTS-TR
+
+| Parameters                 | Ascend                                        | Ascend                                        |
+| -------------------------- | --------------------------------------------- | --------------------------------------------- |
+| Model Version              | EGNet (VGG)                                   | EGNet (ResNet)                                |
+| Resource                   | Ascend 910 (1p/8p)                            | Ascend 910 (1p/8p)                            |
+| Uploaded Date              | 2021-12-25                                    | 2021-12-25                                    |
+| MindSpore Version          | 1.3.0                                         | 1.3.0                                         |
+| Dataset                    | DUTS-TR                                       | DUTS-TR                                       |
+| Training Parameters        | epoch=30, steps=1050, batch_size=10, lr=2e-5  | epoch=30, steps=1050, batch_size=10, lr=5e-5  |
+| Optimizer                  | Adam                                          | Adam                                          |
+| Loss Function              | Binary cross entropy                          | Binary cross entropy                          |
+| Speed                      | 1p: 593.460 ms/step; 8p: 460.952 ms/step      | 1p: 569.524 ms/step; 8p: 466.667 ms/step      |
+| Total Time                 | 1p: 5h3m; 8p: 4h2m                            | 1p: 4h59m; 8p: 4h5m                           |
+| Checkpoint for Fine-tuning | 412M (.ckpt file)                             | 426M (.ckpt file)                             |
+| Scripts                    | [EGNet script]()                              | [EGNet script]()                              |
+
+### Inference Performance
+
+#### EGNet on saliency detection datasets
+
+| Parameters        | Ascend                                         | Ascend                                         |
+| ----------------- | ---------------------------------------------- | ---------------------------------------------- |
+| Model Version     | EGNet (VGG)                                    | EGNet (ResNet)                                 |
+| Resource          | Ascend 910                                     | Ascend 910                                     |
+| Uploaded Date     | 2021-12-25                                     | 2021-12-25                                     |
+| MindSpore Version | 1.3.0                                          | 1.3.0                                          |
+| Dataset           | SOD, 300 images                                | SOD, 300 images                                |
+| Metrics (1p)      | MaxF: 0.8659637; MAE: 0.1540910; S: 0.7317967  | MaxF: 0.8763882; MAE: 0.1453154; S: 0.7388669  |
+| Metrics (8p)      | MaxF: 0.8667928; MAE: 0.1532886; S: 0.7360025  | MaxF: 0.8798361; MAE: 0.1448086; S: 0.74030272 |
+| Dataset           | ECSSD, 1000 images                             | ECSSD, 1000 images                             |
+| Metrics (1p)      | MaxF: 0.9365406; MAE: 0.0744784; S: 0.8639620  | MaxF: 0.9477927; MAE: 0.0649923; S: 0.8765208  |
+| Metrics (8p)      | MaxF: 0.9356243; MAE: 0.0805953; S: 0.8595030  | MaxF: 0.9457578; MAE: 0.0684581; S: 0.8732929  |
+| Dataset           | PASCAL-S, 850 images                           | PASCAL-S, 850 images                           |
+| Metrics (1p)      | MaxF: 0.8777129; MAE: 0.1188116; S: 0.7653073  | MaxF: 0.8861882; MAE: 0.1061731; S: 0.7792912  |
+| Metrics (8p)      | MaxF: 0.8787268; MAE: 0.1192975; S: 0.7657838  | MaxF: 0.8883396; MAE: 0.1081997; S: 0.7786236  |
+| Dataset           | DUT-OMRON, 5168 images                         | DUT-OMRON, 5168 images                         |
+| Metrics (1p)      | MaxF: 0.7821059; MAE: 0.1424146; S: 0.7529001  | MaxF: 0.7999835; MAE: 0.1330678; S: 0.7671095  |
+| Metrics (8p)      | MaxF: 0.7815770; MAE: 0.1455649; S: 0.7493499  | MaxF: 0.7997979; MAE: 0.1339806; S: 0.7646356  |
+| Dataset           | HKU-IS, 4447 images                            | HKU-IS, 4447 images                            |
+| Metrics (1p)      | MaxF: 0.9193007; MAE: 0.0732772; S: 0.8674455  | MaxF: 0.9299341; MAE: 0.0631132; S: 0.8817522  |
+| Metrics (8p)      | MaxF: 0.9145629; MAE: 0.0793372; S: 0.8608878  | MaxF: 0.9254014; MAE: 0.0685441; S: 0.8762386  |
+
+# ModelZoo Homepage
+
+Please check the official [homepage](https://gitee.com/mindspore/models).
diff --git a/research/cv/EGnet/data_crop.py b/research/cv/EGnet/data_crop.py
new file mode 100644
index 0000000000000000000000000000000000000000..b9fd38c8255b9f74afc16d3575682be787b80f3e
--- /dev/null
+++ b/research/cv/EGnet/data_crop.py
@@ -0,0 +1,92 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Dataset crop"""
+
+import os
+import argparse
+from concurrent import futures
+import cv2
+import pandas as pd
+
+
+def crop_one(input_img_path, output_img_path):
+    """
+    center crop one image to 200 x 200; note that source images smaller than 200 x 200 are removed
+    """
+    image = cv2.imread(input_img_path)
+    img_shape = image.shape
+    img_height = img_shape[0]
+    img_width = img_shape[1]
+
+    if (img_width < 200) or (img_height < 200):
+        os.remove(input_img_path)
+    else:
+        cropped = image[(img_height - 200) // 2:(img_height + 200) // 2, (img_width - 200) // 2:(img_width + 200) // 2]
+        cv2.imwrite(output_img_path, cropped)
+
+
+def crop(data_root, output_path):
+    """
+    crop all images with a thread pool
+    """
+    if not os.path.exists(data_root):
+        raise FileNotFoundError("data root not exist")
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+
+    image_filenames = [(os.path.join(data_root, x), os.path.join(output_path, x))
+                       for x in os.listdir(data_root)]
+    file_list = []
+    for file in image_filenames:
+        file_list.append(file)
+    print(len(file_list))
+    with futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as tp:
+        all_task = [tp.submit(crop_one, file[0], file[1]) for file in file_list]
+        futures.wait(all_task)
+    print("all done!")
+
+
+def save(data_root, output_path):
+    """write a test.lst file listing the cropped images"""
+    file_list = []
+    for path in os.listdir(data_root):
+        _, filename = os.path.split(path)
+        file_list.append(filename)
+    df = pd.DataFrame(file_list, columns=["one"])
+    df.to_csv(os.path.join(output_path, "test.lst"), columns=["one"], index=False, header=False)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Crop Image to 200*200")
+    parser.add_argument("--data_name", type=str, help="dataset name", required=True,
+                        choices=["ECSSD", "SOD", "DUT-OMRON", "PASCAL-S", "HKU-IS", "DUTS-TE", "DUTS-TR"])
+    parser.add_argument("--data_root", type=str, help="root of images", required=True,
+                        default="/home/data")
+    parser.add_argument("--output_path", type=str, help="output path of cropped images", required=True,
+                        default="/home/data")
+    args = parser.parse_known_args()[0]
+    if args.data_name == "DUTS-TE":
+        Mask = "DUTS-TE-Mask"
+        Image = "DUTS-TE-Image"
+    elif args.data_name == "DUTS-TR":
+        Mask = "DUTS-TR-Mask"
+        Image = "DUTS-TR-Image"
+    else:
+        Mask = "ground_truth_mask"
+        Image = "images"
+    crop(os.path.join(args.data_root, args.data_name, Mask),
+         os.path.join(args.output_path, args.data_name, Mask))
+    crop(os.path.join(args.data_root, args.data_name, Image),
+         os.path.join(args.output_path, args.data_name, Image))
+    save(os.path.join(args.output_path, args.data_name, Image),
+         os.path.join(args.output_path, args.data_name))
diff --git a/research/cv/EGnet/default_config.yaml b/research/cv/EGnet/default_config.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..93b11163e6bf9a5b5949a0b6f30a6a7a91365a2b
--- /dev/null
+++ b/research/cv/EGnet/default_config.yaml
@@ -0,0 +1,53 @@
+# ==============================================================================
+# Hyper-parameters
+n_color: 3
+device_target: "Ascend"
+
+# Dataset settings
+train_path: "data/DUTS-TR"
+test_path: "data"
+
+# Training settings
+train_online: False
+online_train_path: ""
+online_pretrained_path: ""
+train_url: ""
+is_distributed: False
+base_model: "vgg" # ['resnet','vgg']
+pretrained_url: "pretrained"
+vgg: "pretrained/vgg_pretrained.ckpt"
+resnet: "pretrained/resnet_pretrained.ckpt"
+epoch: 30
+batch_size: 1
+num_thread: 4
+save_fold: "EGNet"
+train_save_name: "nnet"
+epoch_save: 1 +epoch_show: 1 +pre_trained: "" +start_epoch: 1 +n_ave_grad: 10 +show_every: 10 +save_tmp: 200 +loss_scale: 1 + +# Testing settings +eval_online: False +online_eval_path: "" +online_ckpt_path: "" +model: "EGNet/run-nnet/models/final_vgg_bone.ckpt" +test_fold: "result" +test_save_name: "EGNet_" +test_mode: 1 +sal_mode: "t" # ['e','t','d','h','p','s'] +test_batch_size: 1 + +# Misc +mode: "train" # ['train','test'] +visdom: False + +# Export settings +file_name: "EGNet" +file_format: "MINDIR" +ckpt_file: "" + diff --git a/research/cv/EGnet/eval.py b/research/cv/EGnet/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..ed29b6b147fa09c910ac6d7ff348751207ec7e13 --- /dev/null +++ b/research/cv/EGnet/eval.py @@ -0,0 +1,311 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Eval""" + +import os +import time +import cv2 +import numpy as np +from mindspore import DatasetHelper, load_checkpoint, context +from mindspore.nn import Sigmoid + +from model_utils.config import base_config +from src.dataset import create_dataset +from src.egnet import build_model + +def main(config): + if config.eval_online: + import moxing as mox + mox.file.shift('os', 'mox') + if config.sal_mode == "t": + Evalname = "DUTS-TE" + elif config.sal_mode == "s": + Evalname = "SOD" + elif config.sal_mode == "h": + Evalname = "HKU-IS" + elif config.sal_mode == "d": + Evalname = "DUT-OMRON" + elif config.sal_mode == "p": + Evalname = "PASCAL-S" + elif config.sal_mode == "e": + Evalname = "ECSSD" + config.test_path = os.path.join("/cache", config.test_path) + local_data_url = os.path.join(config.test_path, "%s"%(Evalname)) + local_list_eval = os.path.join(config.test_path, "%s/test.lst"%(Evalname)) + mox.file.copy_parallel(config.online_eval_path, local_data_url) + mox.file.copy_parallel(os.path.join(config.online_eval_path, "test.lst"), local_list_eval) + ckpt_path = os.path.join("/cache", os.path.dirname(config.model)) + mox.file.copy_parallel(config.online_ckpt_path, ckpt_path) + mox.file.copy_parallel(os.path.join(config.online_ckpt_path, + os.path.basename(config.model)), + os.path.join("/cache", config.model)) + config.model = os.path.join("/cache", config.model) + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) + test_dataset, dataset = create_dataset(config.test_batch_size, mode="test", num_thread=config.num_thread, + test_mode=config.test_mode, sal_mode=config.sal_mode, + test_path=config.test_path, test_fold=config.test_fold) + evaluate(test_dataset, config, dataset) + + +class Metric: + """ + for metric + """ + def __init__(self): + self.epsilon = 1e-4 + self.beta = 0.3 + self.thresholds = 256 + self.mae = 0 + self.max_f = 0 + self.precision = np.zeros(self.thresholds) + self.recall = np.zeros(self.thresholds) + self.q = 0 + self.cnt = 0 + + def update(self, pred, gt): + assert pred.shape == gt.shape + pred = 
pred.astype(np.float32) + gt = gt.astype(np.float32) + norm_pred = pred / 255.0 + norm_gt = gt / 255.0 + self.compute_mae(norm_pred, norm_gt) + self.compute_precision_and_recall(pred, gt) + self.compute_s_measure(norm_pred, norm_gt) + self.cnt += 1 + + def print_result(self): + f_measure = (1 + self.beta) * (self.precision * self.recall) / (self.beta * self.precision + self.recall) + argmax = np.argmax(f_measure) + print("Max F-measure:", f_measure[argmax] / self.cnt) + print("Precision: ", self.precision[argmax] / self.cnt) + print("Recall: ", self.recall[argmax] / self.cnt) + print("MAE: ", self.mae / self.cnt) + print("S-measure: ", self.q / self.cnt) + + def compute_precision_and_recall(self, pred, gt): + """ + compute the precision and recall for pred + """ + for th in range(self.thresholds): + a = np.zeros_like(pred).astype(np.int32) + b = np.zeros_like(pred).astype(np.int32) + a[pred > th] = 1 + a[pred <= th] = 0 + b[gt > th / self.thresholds] = 1 + b[gt <= th / self.thresholds] = 0 + ab = np.sum(np.bitwise_and(a, b)) + a_sum = np.sum(a) + b_sum = np.sum(b) + self.precision[th] += (ab + self.epsilon) / (a_sum + self.epsilon) + self.recall[th] += (ab + self.epsilon) / (b_sum + self.epsilon) + + def compute_mae(self, pred, gt): + """ + compute mean average error + """ + self.mae += np.abs(pred - gt).mean() + + def compute_s_measure(self, pred, gt): + """ + compute s measure score + """ + + alpha = 0.5 + y = gt.mean() + if y == 0: + x = pred.mean() + q = 1.0 - x + elif y == 1: + x = pred.mean() + q = x + else: + gt[gt >= 0.5] = 1 + gt[gt < 0.5] = 0 + q = alpha * self._s_object(pred, gt) + (1 - alpha) * self._s_region(pred, gt) + if q < 0 or np.isnan(q): + q = 0 + self.q += q + + def _s_object(self, pred, gt): + """ + score of object + """ + fg = np.where(gt == 0, np.zeros_like(pred), pred) + bg = np.where(gt == 1, np.zeros_like(pred), 1 - pred) + o_fg = self._object(fg, gt) + o_bg = self._object(bg, 1 - gt) + u = gt.mean() + q = u * o_fg + (1 - u) * o_bg + return q + + @staticmethod + def _object(pred, gt): + """ + compute score of object + """ + temp = pred[gt == 1] + if temp.size == 0: + return 0 + x = temp.mean() + sigma_x = temp.std() + score = 2.0 * x / (x * x + 1.0 + sigma_x + 1e-20) + return score + + def _s_region(self, pred, gt): + """ + compute score of region + """ + x, y = self._centroid(gt) + gt1, gt2, gt3, gt4, w1, w2, w3, w4 = self._divide_gt(gt, x, y) + p1, p2, p3, p4 = self._divide_prediction(pred, x, y) + q1 = self._ssim(p1, gt1) + q2 = self._ssim(p2, gt2) + q3 = self._ssim(p3, gt3) + q4 = self._ssim(p4, gt4) + q = w1 * q1 + w2 * q2 + w3 * q3 + w4 * q4 + return q + + @staticmethod + def _divide_gt(gt, x, y): + """ + divide ground truth image + """ + if not isinstance(x, np.int64): + x = x[0][0] + if not isinstance(y, np.int64): + y = y[0][0] + h, w = gt.shape[-2:] + area = h * w + gt = gt.reshape(h, w) + lt = gt[:y, :x] + rt = gt[:y, x:w] + lb = gt[y:h, :x] + rb = gt[y:h, x:w] + x = x.astype(np.float32) + y = y.astype(np.float32) + w1 = x * y / area + w2 = (w - x) * y / area + w3 = x * (h - y) / area + w4 = 1 - w1 - w2 - w3 + return lt, rt, lb, rb, w1, w2, w3, w4 + + @staticmethod + def _divide_prediction(pred, x, y): + """ + divide predict image + """ + if not isinstance(x, np.int64): + x = x[0][0] + if not isinstance(y, np.int64): + y = y[0][0] + h, w = pred.shape[-2:] + pred = pred.reshape(h, w) + lt = pred[:y, :x] + rt = pred[:y, x:w] + lb = pred[y:h, :x] + rb = pred[y:h, x:w] + return lt, rt, lb, rb + + @staticmethod + def _ssim(pred, gt): + """ + structural 
similarity + """ + gt = gt.astype(np.float32) + h, w = pred.shape[-2:] + n = h * w + x = pred.mean() + y = gt.mean() + sigma_x2 = ((pred - x) * (pred - x)).sum() / (n - 1 + 1e-20) + sigma_y2 = ((gt - y) * (gt - y)).sum() / (n - 1 + 1e-20) + sigma_xy = ((pred - x) * (gt - y)).sum() / (n - 1 + 1e-20) + + alpha = 4 * x * y * sigma_xy + beta = (x * x + y * y) * (sigma_x2 + sigma_y2) + + if alpha != 0: + q = alpha / (beta + 1e-20) + elif alpha == 0 and beta == 0: + q = 1.0 + else: + q = 0 + return q + + @staticmethod + def _centroid(gt): + """ + compute center of ground truth image + """ + rows, cols = gt.shape[-2:] + gt = gt.reshape(rows, cols) + if gt.sum() == 0: + x = np.eye(1) * round(cols / 2) + y = np.eye(1) * round(rows / 2) + else: + total = gt.sum() + + i = np.arange(0, cols).astype(np.float32) + j = np.arange(0, rows).astype(np.float32) + x = np.round((gt.sum(axis=0) * i).sum() / total) + y = np.round((gt.sum(axis=1) * j).sum() / total) + return x.astype(np.int64), y.astype(np.int64) + + +def evaluate(test_ds, config, dataset): + """build network""" + model = build_model(config.base_model) + # Load pretrained model + load_checkpoint(config.model, net=model) + print(f"Loading pre-trained model from {config.model}...") + sigmoid = Sigmoid() + # test phase + test_save_name = config.test_save_name + config.base_model + test_fold = config.test_fold + if not os.path.exists(os.path.join(test_fold, test_save_name)): + os.makedirs(os.path.join(test_fold, test_save_name), exist_ok=True) + dataset_helper = DatasetHelper(test_ds, epoch_num=1, dataset_sink_mode=False) + time_t = 0.0 + + metric = Metric() + has_label = False + for i, data_batch in enumerate(dataset_helper): + sal_image, sal_label, name_index = data_batch[0], data_batch[1], data_batch[2] + name = dataset.image_list[name_index[0].asnumpy().astype(np.int32)] + save_file = os.path.join(test_fold, test_save_name, name[:-4] + "_sal.png") + directory, _ = os.path.split(save_file) + if not os.path.exists(directory): + os.makedirs(directory, exist_ok=True) + time_start = time.time() + _, _, up_sal_final = model(sal_image) + time_end = time.time() + time_t += time_end - time_start + pred = sigmoid(up_sal_final[-1]).asnumpy().squeeze() * 255 + + if sal_label is not None: + has_label = True + sal_label = sal_label.asnumpy().squeeze() * 255 + pred = np.round(pred).astype(np.uint8) + metric.update(pred, sal_label) + cv2.imwrite(save_file, pred) + print(f"process image index {i} done") + + print(f"--- {time_t} seconds ---") + if has_label: + metric.print_result() + + +if __name__ == "__main__": + main(base_config) diff --git a/research/cv/EGnet/export.py b/research/cv/EGnet/export.py new file mode 100644 index 0000000000000000000000000000000000000000..6ca9a4d54dda1dfa2b80872248ddff067471d2f4 --- /dev/null +++ b/research/cv/EGnet/export.py @@ -0,0 +1,46 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +""" +##############export checkpoint file into air, mindir models################# +python export.py +""" +import numpy as np + +import mindspore as ms +from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context + +from src.egnet import build_model +from model_utils.config import base_config + + +def run_export(config): + """ + run export operation + """ + context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) + + net = build_model(config.base_model) + + assert config.ckpt_file is not None, "config.ckpt_file is None." + param_dict = load_checkpoint(config.ckpt_file) + load_param_into_net(net, param_dict) + + input_arr = Tensor(np.ones([config.batch_size, 3, 200, 200]), ms.float32) + export(net, input_arr, file_name=config.file_name, file_format=config.file_format) + + +if __name__ == "__main__": + run_export(base_config) diff --git a/research/cv/EGnet/model_utils/config.py b/research/cv/EGnet/model_utils/config.py new file mode 100644 index 0000000000000000000000000000000000000000..9bc46ee9a02a65e9465b3032a8efee0c2bfec65b --- /dev/null +++ b/research/cv/EGnet/model_utils/config.py @@ -0,0 +1,130 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Parse arguments""" + +import os +import ast +import argparse +from pprint import pprint, pformat +import yaml + + +class Config: + """ + Configuration namespace. Convert dictionary to members. + """ + + def __init__(self, cfg_dict): + for k, v in cfg_dict.items(): + if isinstance(v, (list, tuple)): + setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v]) + else: + setattr(self, k, Config(v) if isinstance(v, dict) else v) + + def __str__(self): + return pformat(self.__dict__) + + def __repr__(self): + return self.__str__() + + +def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"): + """ + Parse command line arguments to the configuration according to the default yaml. + + Args: + parser: Parent parser. + cfg: Base configuration. + helper: Helper description. + cfg_path: Path to the default yaml config. 
+ """ + parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]", + parents=[parser]) + helper = {} if helper is None else helper + choices = {} if choices is None else choices + for item in cfg: + if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict): + help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path) + choice = choices[item] if item in choices else None + if isinstance(cfg[item], bool): + parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice, + help=help_description) + else: + parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice, + help=help_description) + args = parser.parse_args() + return args + + +def parse_yaml(yaml_path): + """ + Parse the yaml config file. + + Args: + yaml_path: Path to the yaml config. + """ + with open(yaml_path, "r") as fin: + try: + cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader) + cfgs = [x for x in cfgs] + if len(cfgs) == 1: + cfg_helper = {} + cfg = cfgs[0] + cfg_choices = {} + elif len(cfgs) == 2: + cfg, cfg_helper = cfgs + cfg_choices = {} + elif len(cfgs) == 3: + cfg, cfg_helper, cfg_choices = cfgs + else: + raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml") + print(cfg_helper) + except: + raise ValueError("Failed to parse yaml") + return cfg, cfg_helper, cfg_choices + + +def merge(args, cfg): + """ + Merge the base config from yaml file and command line arguments. + + Args: + args: Command line arguments. + cfg: Base configuration. + """ + args_var = vars(args) + for item in args_var: + cfg[item] = args_var[item] + return cfg + + +def get_config(): + """ + Get Config according to the yaml file and cli arguments. + """ + parser = argparse.ArgumentParser(description="default name", add_help=False) + current_dir = os.path.dirname(os.path.abspath(__file__)) + parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, "../default_config.yaml"), + help="Config file path") + path_args, _ = parser.parse_known_args() + default, helper, choices = parse_yaml(path_args.config_path) + args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path) + final_config = merge(args, default) + pprint(final_config) + print("Please check the above information for the configurations", flush=True) + return Config(final_config) + +base_config = get_config() diff --git a/research/cv/EGnet/sal2edge.py b/research/cv/EGnet/sal2edge.py new file mode 100644 index 0000000000000000000000000000000000000000..d50ad86d785989612ab2b639f82e6294f41981b3 --- /dev/null +++ b/research/cv/EGnet/sal2edge.py @@ -0,0 +1,74 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""Extract edge""" + +import os +import argparse +from concurrent import futures +import cv2 +import numpy as np + + +def sal2edge_one(image_file, output_file): + """ + process one image + """ + if not os.path.exists(image_file): + return + image = cv2.imread(image_file, cv2.IMREAD_UNCHANGED) + b_image = image > 128 + b_image = b_image.astype(np.float64) + dx, dy = np.gradient(b_image) + temp_edge = dx * dx + dy * dy + temp_edge[temp_edge != 0] = 255 + bound = temp_edge.astype(np.uint8) + cv2.imwrite(output_file, bound) + + +def sal2edge(data_root, output_path, image_list_file): + """ + extract edge from salience image (use thread pool) + """ + if not os.path.exists(data_root): + print("data root not exist", data_root) + return + if not os.path.exists(output_path): + os.makedirs(output_path) + if not os.path.exists(image_list_file): + print("image list file not exist", image_list_file) + return + image_list = np.loadtxt(image_list_file, str) + file_list = [] + ext = image_list[0][1][-4:] + for image in image_list: + file_list.append(image[1][:-4]) + with futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as tp: + all_task = [] + for file in file_list: + img_path = os.path.join(data_root, file + ext) + result_path = os.path.join(output_path, file + "_edge" + ext) + all_task.append(tp.submit(sal2edge_one, img_path, result_path)) + futures.wait(all_task) + print("all done!") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Covert Salient Image to Edge Image") + parser.add_argument("--data_root", type=str, help="root of salient images", required=True) + parser.add_argument("--output_path", type=str, help="output path of edge images", required=True) + parser.add_argument("--image_list_file", type=str, help="image list of salient images", required=True) + args = parser.parse_known_args()[0] + sal2edge(args.data_root, args.output_path, args.image_list_file) diff --git a/research/cv/EGnet/scripts/run_distribute_train.sh b/research/cv/EGnet/scripts/run_distribute_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..86d246e5cc5a54a193af389ea404f6ac564f5aaa --- /dev/null +++ b/research/cv/EGnet/scripts/run_distribute_train.sh @@ -0,0 +1,46 @@ +#!/bin/bash + +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + + +set -e +RANK_SIZE=$1 +RANK_TABLE_FILE=$2 + +if [ ! 
-f ${RANK_TABLE_FILE} ]; then +echo "file not exists" +exit +fi +export RANK_TABLE_FILE=${RANK_TABLE_FILE} +export RANK_SIZE=${RANK_SIZE} + +for((i=0;i<${RANK_SIZE};i++)) +do + rm -rf device$i + mkdir device$i + cp -r ../src ./device$i + cp -r ../model_utils ./device$i + cp -r ../data ./device$i + cp ../sal2edge.py ./device$i + cp ../train.py ./device$i + cp ../default_config.yaml ./device$i + cd ./device$i + export DEVICE_ID=$i + export RANK_ID=$i + python -u ./train.py --is_distributed True > train.log 2>&1 & + cd ../ +done + diff --git a/research/cv/EGnet/scripts/run_eval.sh b/research/cv/EGnet/scripts/run_eval.sh new file mode 100644 index 0000000000000000000000000000000000000000..0ef0f42f12ea3db80ddf47dab8d4369f50c52304 --- /dev/null +++ b/research/cv/EGnet/scripts/run_eval.sh @@ -0,0 +1,40 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +cd .. +python eval.py --test_fold='./result/ECSSD' \ + --model='./EGNet/run-nnet/models/final_resnet_bone.ckpt' \ + --sal_mode=e \ + --base_model=resnet >test_e.log +python eval.py --test_fold='./result/PASCAL-S' \ + --model='./EGNet/run-nnet/models/final_resnet_bone.ckpt' \ + --sal_mode=p \ + --base_model=resnet >test_p.log +python eval.py --test_fold='./result/DUT-OMRON' \ + --model='./EGNet/run-nnet/models/final_resnet_bone.ckpt' \ + --sal_mode=d \ + --base_model=resnet >test_d.log +python eval.py --test_fold='./result/HKU-IS' \ + --model='./EGNet/run-nnet/models/final_resnet_bone.ckpt' \ + --sal_mode=h \ + --base_model=resnet >test_h.log +python eval.py --test_fold='./result/SOD' \ + --model='./EGNet/run-nnet/models/final_resnet_bone.ckpt' \ + --sal_mode=s \ + --base_model=resnet >test_s.log +python eval.py --test_fold='./result/DUTS-TE' \ + --model='./EGNet/run-nnet/models/final_resnet_bone.ckpt' \ + --sal_mode=t \ + --base_model=resnet >test_t.log \ No newline at end of file diff --git a/research/cv/EGnet/scripts/run_standalone_train.sh b/research/cv/EGnet/scripts/run_standalone_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..2f9a83c6cd42761fd87fe5abbd787de6e722270a --- /dev/null +++ b/research/cv/EGnet/scripts/run_standalone_train.sh @@ -0,0 +1,16 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +python train.py --base_model=vgg >train.log diff --git a/research/cv/EGnet/src/dataset.py b/research/cv/EGnet/src/dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..49eb0a7a44c6288c0d2127b675852aae5d6d9aff --- /dev/null +++ b/research/cv/EGnet/src/dataset.py @@ -0,0 +1,217 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Create train or eval dataset.""" + +import os +import random +from PIL import Image +import cv2 +import numpy as np + +from model_utils.config import base_config +from mindspore.dataset import GeneratorDataset +from mindspore.communication.management import get_rank, get_group_size +if base_config.train_online: + import moxing as mox + mox.file.shift('os', 'mox') + +class ImageDataTrain: + """ + training dataset + """ + def __init__(self, train_path=""): + self.sal_root = train_path + self.sal_source = os.path.join(train_path, "train_pair_edge.lst") + with open(self.sal_source, "r") as f: + self.sal_list = [x.strip() for x in f.readlines()] + self.sal_num = len(self.sal_list) + + def __getitem__(self, item): + sal_image = load_image(os.path.join(self.sal_root, self.sal_list[item % self.sal_num].split()[0])) + sal_label = load_sal_label(os.path.join(self.sal_root, self.sal_list[item % self.sal_num].split()[1])) + sal_edge = load_edge_label(os.path.join(self.sal_root, self.sal_list[item % self.sal_num].split()[2])) + sal_image, sal_label, sal_edge = cv_random_flip(sal_image, sal_label, sal_edge) + return sal_image, sal_label, sal_edge + + def __len__(self): + return self.sal_num + + +class ImageDataTest: + """ + test dataset + """ + def __init__(self, test_mode=1, sal_mode="e", test_path="", test_fold=""): + if test_mode == 1: + if sal_mode == "e": + self.image_root = test_path + "/ECSSD/images/" + self.image_source = test_path + "/ECSSD/test.lst" + self.test_fold = test_fold + "/ECSSD/" + self.test_root = test_path + "/ECSSD/ground_truth_mask/" + elif sal_mode == "p": + self.image_root = test_path + "/PASCAL-S/images/" + self.image_source = test_path + "/PASCAL-S/test.lst" + self.test_fold = test_fold + "/PASCAL-S/" + self.test_root = test_path + "/PASCAL-S/ground_truth_mask/" + elif sal_mode == "d": + self.image_root = test_path + "/DUT-OMRON/images/" + self.image_source = test_path + "/DUT-OMRON/test.lst" + self.test_fold = test_fold + "/DUT-OMRON/" + self.test_root = test_path + "/DUT-OMRON/ground_truth_mask/" + elif sal_mode == "h": + self.image_root = test_path + "/HKU-IS/images/" + self.image_source = test_path + "/HKU-IS/test.lst" + self.test_fold = test_fold + "/HKU-IS/" + self.test_root = test_path + "/HKU-IS/ground_truth_mask/" + elif sal_mode == "s": + self.image_root = test_path + "/SOD/images/" + self.image_source = test_path + "/SOD/test.lst" + self.test_fold = test_fold + "/SOD/" + self.test_root = test_path + 
"/SOD/ground_truth_mask/" + elif sal_mode == "t": + self.image_root = test_path + "/DUTS-TE/DUTS-TE-Image" + self.image_source = test_path + "/DUTS-TE/test.lst" + self.test_fold = test_fold + "/DUTS-TE/" + self.test_root = test_path + "/DUTS-TE/DUTS-TE-Mask/" + else: + raise ValueError("Unknown sal_mode") + else: + raise ValueError("Unknown sal_mode") + + with open(self.image_source, "r") as f: + self.image_list = [x.strip() for x in f.readlines()] + self.image_num = len(self.image_list) + + def __getitem__(self, item): + image, _ = load_image_test(os.path.join(self.image_root, self.image_list[item])) + label = load_sal_label(os.path.join(self.test_root, self.image_list[item][0:-4]+".png")) + return image, label, item % self.image_num + + def save_folder(self): + return self.test_fold + + def __len__(self): + return self.image_num + + +# get the dataloader (Note: without data augmentation, except saliency with random flip) +def create_dataset(batch_size, mode="train", num_thread=1, test_mode=1, sal_mode="e", train_path="", test_path="", + test_fold="", is_distributed=False): + """ + create dataset + """ + shuffle = False + drop_remainder = False + + if mode == "train": + shuffle = True + drop_remainder = True + dataset = ImageDataTrain(train_path=train_path) + else: + dataset = ImageDataTest(test_mode=test_mode, sal_mode=sal_mode, test_path=test_path, test_fold=test_fold) + + if is_distributed: + # get rank_id and rank_size + rank_id = get_rank() + rank_size = get_group_size() + ds = GeneratorDataset(dataset, column_names=["sal_image", "sal_label", "sal_edge_or_index"], + shuffle=shuffle, num_parallel_workers=num_thread, num_shards=rank_size, shard_id=rank_id) + else: + ds = GeneratorDataset(dataset, column_names=["sal_image", "sal_label", "sal_edge_or_index"], + shuffle=shuffle, num_parallel_workers=num_thread) + return ds.batch(batch_size, drop_remainder=drop_remainder, num_parallel_workers=num_thread), dataset + + +def save_img(img, path): + range_ = np.max(img) - np.min(img) + img = (img - np.min(img)) / range_ + img = img * 255 + 0.5 + img = img.astype(np.uint8).squeeze() + Image.fromarray(img).save(path) + + +def load_image(pah): + if not os.path.exists(pah): + print("File Not Exists") + print(pah) + im = cv2.imread(pah) + in_ = np.array(im, dtype=np.float32) + in_ -= np.array((104.00699, 116.66877, 122.67892)) + in_ = in_.transpose((2, 0, 1)) + return in_ + + +def load_image_test(pah): + """ + load test image + """ + pah = pah.split(".")[0] + if "HKU-IS" in pah: + pah = pah + ".png" + else: + pah = pah + ".jpg" + print("--------", pah) + if not os.path.exists(pah): + print("File Not Exists") + im = cv2.imread(pah) + in_ = np.array(im, dtype=np.float32) + im_size = tuple(in_.shape[:2]) + in_ -= np.array((104.00699, 116.66877, 122.67892)) + in_ = in_.transpose((2, 0, 1)) + return in_, im_size + + +def load_edge_label(pah): + """ + pixels > 0.5 -> 1 + Load label image as 1 x height x width integer array of label indices. + The leading singleton dimension is required by the loss. + """ + if not os.path.exists(pah): + print("File Not Exists") + im = Image.open(pah) + label = np.array(im, dtype=np.float32) + if len(label.shape) == 3: + label = label[:, :, 0] + label = label / 255. + label[np.where(label > 0.5)] = 1. + label = label[np.newaxis, ...] + return label + + +def load_sal_label(pah): + """ + Load label image as 1 x height x width integer array of label indices. + The leading singleton dimension is required by the loss. 
+ """ + if not os.path.exists(pah): + print("File Not Exists") + im = Image.open(pah) + label = np.array(im, dtype=np.float32) + if len(label.shape) == 3: + label = label[:, :, 0] + label = label / 255. + label = label[np.newaxis, ...] + return label + + +def cv_random_flip(img, label, edge): + flip_flag = random.randint(0, 1) + if flip_flag == 1: + img = img[:, :, ::-1].copy() + label = label[:, :, ::-1].copy() + edge = edge[:, :, ::-1].copy() + return img, label, edge diff --git a/research/cv/EGnet/src/egnet.py b/research/cv/EGnet/src/egnet.py new file mode 100644 index 0000000000000000000000000000000000000000..524d840c7395205c1d99e34abbc5cc2e1b43e5c0 --- /dev/null +++ b/research/cv/EGnet/src/egnet.py @@ -0,0 +1,294 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""EGNet model define""" + +import mindspore +import mindspore.common.initializer as init +from mindspore import nn, load_checkpoint + +from src.resnet import resnet50 +from src.vgg import Vgg16 +import numpy as np + +config_vgg = {"convert": [[128, 256, 512, 512, 512], [64, 128, 256, 512, 512]], + "merge1": [[128, 256, 128, 3, 1], [256, 512, 256, 3, 1], [512, 0, 512, 5, 2], [512, 0, 512, 5, 2], + [512, 0, 512, 7, 3]], "merge2": [[128], [256, 512, 512, 512]]} # no convert layer, no conv6 + +config_resnet = {"convert": [[64, 256, 512, 1024, 2048], [128, 256, 512, 512, 512]], + "deep_pool": [[512, 512, 256, 256, 128], [512, 256, 256, 128, 128], [False, True, True, True, False], + [True, True, True, True, False]], "score": 256, + "edgeinfo": [[16, 16, 16, 16], 128, [16, 8, 4, 2]], "edgeinfoc": [64, 128], + "block": [[512, [16]], [256, [16]], [256, [16]], [128, [16]]], "fuse": [[16, 16, 16, 16], True], + "fuse_ratio": [[16, 1], [8, 1], [4, 1], [2, 1]], + "merge1": [[128, 256, 128, 3, 1], [256, 512, 256, 3, 1], [512, 0, 512, 5, 2], [512, 0, 512, 5, 2], + [512, 0, 512, 7, 3]], "merge2": [[128], [256, 512, 512, 512]]} + + +class ConvertLayer(nn.Cell): + """ + Convert layer + """ + def __init__(self, list_k): + """ + initialize convert layer for resnet config + """ + super(ConvertLayer, self).__init__() + up0 = [] + for i in range(len(list_k[0])): + up0.append(nn.SequentialCell([nn.Conv2d(list_k[0][i], list_k[1][i], 1, 1, has_bias=False), nn.ReLU()])) + self.convert0 = nn.CellList(up0) + + def construct(self, list_x): + resl = [] + for i in range(len(list_x)): + resl.append(self.convert0[i](list_x[i])) + tuple_resl = () + for i in resl: + tuple_resl += (i,) + return tuple_resl + + +class MergeLayer1(nn.Cell): # list_k: [[64, 512, 64], [128, 512, 128], [256, 0, 256] ... 
] + """ + merge layer 1 + """ + def __init__(self, list_k): + """ + initialize merge layer 1 + """ + super(MergeLayer1, self).__init__() + self.list_k = list_k + trans, up, score = [], [], [] + for ik in list_k: + if ik[1] > 0: + trans.append(nn.SequentialCell([nn.Conv2d(ik[1], ik[0], 1, 1, has_bias=False), nn.ReLU()])) + # Conv + up.append(nn.SequentialCell( + [nn.Conv2d(ik[0], ik[2], ik[3], 1, has_bias=True, pad_mode="pad", padding=ik[4]), nn.ReLU(), + nn.Conv2d(ik[2], ik[2], ik[3], 1, has_bias=True, pad_mode="pad", padding=ik[4]), nn.ReLU(), + nn.Conv2d(ik[2], ik[2], ik[3], 1, has_bias=True, pad_mode="pad", padding=ik[4]), nn.ReLU()])) + # Conv | + score.append(nn.Conv2d(ik[2], 1, 3, 1, pad_mode="pad", padding=1, has_bias=True)) + trans.append(nn.SequentialCell([nn.Conv2d(512, 128, 1, 1, has_bias=False), nn.ReLU()])) + self.trans, self.up, self.score = nn.CellList(trans), nn.CellList(up), nn.CellList(score) + self.relu = nn.ReLU() + self.resize_bilinear = nn.ResizeBilinear() + + def construct(self, list_x, x_size): + """ + forward + """ + up_edge, up_sal, edge_feature, sal_feature = [], [], [], [] + + num_f = len(list_x) + # Conv6-3 Conv + tmp = self.up[num_f - 1](list_x[num_f - 1]) + sal_feature.append(tmp) + u_tmp = tmp + + # layer6 -> layer0 + up_sal.append(self.resize_bilinear(self.score[num_f - 1](tmp), x_size, align_corners=True)) + + # layer5 layer4 layer3 + for j in range(2, num_f): + i = num_f - j + # different channel, use trans layer, or resize and add directly + if list_x[i].shape[1] < u_tmp.shape[1]: + u_tmp = list_x[i] + self.resize_bilinear((self.trans[i](u_tmp)), list_x[i].shape[2:], + align_corners=True) + else: + u_tmp = list_x[i] + self.resize_bilinear(u_tmp, list_x[i].shape[2:], align_corners=True) + # Conv + tmp = self.up[i](u_tmp) + u_tmp = tmp + sal_feature.append(tmp) + up_sal.append(self.resize_bilinear(self.score[i](tmp), x_size, align_corners=True)) + + u_tmp = list_x[0] + self.resize_bilinear(self.trans[-1](sal_feature[0]), list_x[0].shape[2:], + align_corners=True) + tmp = self.up[0](u_tmp) + # layer 2 + edge_feature.append(tmp) + up_edge.append(self.resize_bilinear(self.score[0](tmp), x_size, align_corners=True)) + tuple_up_edge, tuple_edge_feature, tuple_up_sal, tuple_sal_feature = (), (), (), () + for i in up_edge: + tuple_up_edge += (i,) + for i in edge_feature: + tuple_edge_feature += (i,) + for i in up_sal: + tuple_up_sal += (i,) + for i in sal_feature: + tuple_sal_feature += (i,) + + return tuple_up_edge, tuple_edge_feature, tuple_up_sal, tuple_sal_feature + + +class MergeLayer2(nn.Cell): + """ + merge layer 2 + """ + def __init__(self, list_k): + """ + initialize merge layer 2 + """ + super(MergeLayer2, self).__init__() + self.list_k = list_k + trans, up, score = [], [], [] + for i in list_k[0]: + tmp = [] + tmp_up = [] + tmp_score = [] + feature_k = [[3, 1], [5, 2], [5, 2], [7, 3]] + for idx, j in enumerate(list_k[1]): + tmp.append(nn.SequentialCell([nn.Conv2d(j, i, 1, 1, has_bias=False), nn.ReLU()])) + + tmp_up.append( + nn.SequentialCell([nn.Conv2d(i, i, feature_k[idx][0], 1, pad_mode="pad", padding=feature_k[idx][1], + has_bias=True), nn.ReLU(), + nn.Conv2d(i, i, feature_k[idx][0], 1, pad_mode="pad", padding=feature_k[idx][1], + has_bias=True), nn.ReLU(), + nn.Conv2d(i, i, feature_k[idx][0], 1, pad_mode="pad", padding=feature_k[idx][1], + has_bias=True), nn.ReLU()])) + tmp_score.append(nn.Conv2d(i, 1, 3, 1, pad_mode="pad", padding=1, has_bias=True)) + trans.append(nn.CellList(tmp)) + up.append(nn.CellList(tmp_up)) + 
score.append(nn.CellList(tmp_score)) + + self.trans, self.up, self.score = nn.CellList(trans), nn.CellList(up), nn.CellList(score) + + self.final_score = nn.SequentialCell([nn.Conv2d(list_k[0][0], list_k[0][0], 5, 1, has_bias=True), nn.ReLU(), + nn.Conv2d(list_k[0][0], 1, 3, 1, has_bias=True)]) + self.relu = nn.ReLU() + self.resize_bilinear = nn.ResizeBilinear() + + def construct(self, list_x, list_y, x_size): + """ + forward + """ + up_score, tmp_feature = [], [] + list_y = list_y[::-1] + + for i, i_x in enumerate(list_x): + for j, j_x in enumerate(list_y): + tmp = self.resize_bilinear(self.trans[i][j](j_x), i_x.shape[2:], align_corners=True) + i_x + tmp_f = self.up[i][j](tmp) + up_score.append(self.resize_bilinear(self.score[i][j](tmp_f), x_size, align_corners=True)) + tmp_feature.append(tmp_f) + + tmp_fea = tmp_feature[0] + for i_fea in range(len(tmp_feature) - 1): + tmp_fea = self.relu(tmp_fea + self.resize_bilinear(tmp_feature[i_fea + 1], tmp_feature[0].shape[2:], + align_corners=True)) + up_score.append(self.resize_bilinear(self.final_score(tmp_fea), x_size, align_corners=True)) + return up_score + + +class EGNet(nn.Cell): + """ + EGNet network + """ + def __init__(self, base_model_cfg, base, merge1_layers, merge2_layers): + """ initialize + """ + super(EGNet, self).__init__() + self.base_model_cfg = base_model_cfg + if self.base_model_cfg == "resnet": + self.convert = ConvertLayer(config_resnet["convert"]) + self.base = base + self.merge1 = merge1_layers + self.merge2 = merge2_layers + + def construct(self, x): + """ + forward + """ + x_size = x.shape[2:] + conv2merge = self.base(x) + if self.base_model_cfg == "resnet": + conv2merge = self.convert(conv2merge) + up_edge, edge_feature, up_sal, sal_feature = self.merge1(conv2merge, x_size) + up_sal_final = self.merge2(edge_feature, sal_feature, x_size) + tuple_up_edge, tuple_up_sal, tuple_up_sal_final = (), (), () + for i in up_edge: + tuple_up_edge += (i,) + for i in up_sal: + tuple_up_sal += (i,) + for i in up_sal_final: + tuple_up_sal_final += (i,) + + # only can work in dynamic graph + # return tuple(up_edge), tuple(up_sal), tuple(up_sal_final) + return tuple_up_edge, tuple_up_sal, tuple_up_sal_final + + def load_pretrained_model(self, model_file): + """ + load pretrained model + """ + load_checkpoint(model_file, net=self) + + +def extra_layer(base_model_cfg, base): + """ + extra layer for different base network + """ + if base_model_cfg == "vgg": + config = config_vgg + elif base_model_cfg == "resnet": + config = config_resnet + else: + raise ValueError(f"{base_model_cfg} backbone is not implemented") + merge1_layers = MergeLayer1(config["merge1"]) + merge2_layers = MergeLayer2(config["merge2"]) + + return base, merge1_layers, merge2_layers + + +def build_model(base_model_cfg="vgg"): + """ + build the whole network + """ + if base_model_cfg == "vgg": + return EGNet(base_model_cfg, *extra_layer(base_model_cfg, Vgg16())) + if base_model_cfg == "resnet": + return EGNet(base_model_cfg, *extra_layer(base_model_cfg, resnet50())) + raise ValueError("unknown config") + + +def init_weights(net, init_type="normal", init_gain=0.01, constant=0.001): + """ + Initialize network weights. 
+ """ + np.random.seed(1) + for _, cell in net.cells_and_names(): + if isinstance(cell, nn.Conv2d): + if init_type == "normal": + cell.weight.set_data( + mindspore.Tensor(np.random.normal(0, 0.01, size=cell.weight.shape), dtype=cell.weight.dtype)) + elif init_type == "xavier": + cell.weight.set_data( + init.initializer(init.XavierUniform(init_gain), cell.weight.shape, cell.weight.dtype)) + elif init_type == "constant": + cell.weight.set_data(init.initializer(constant, cell.weight.shape, cell.weight.dtype)) + else: + raise NotImplementedError("initialization method [%s] is not implemented" % init_type) + elif isinstance(cell, nn.BatchNorm2d): + cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype)) + cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.gamma.dtype)) + + +if __name__ == "__main__": + model = build_model() + print(model) diff --git a/research/cv/EGnet/src/resnet.py b/research/cv/EGnet/src/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..2796e57d993dc6fa77adc208d75a08814804b590 --- /dev/null +++ b/research/cv/EGnet/src/resnet.py @@ -0,0 +1,256 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Resnet model define""" + +import mindspore as ms +import mindspore.nn as nn +from mindspore import load_checkpoint + +affine_par = True + + +def conv3x3(in_planes, out_planes, stride=1): + return nn.Conv2d(in_planes, out_planes, kernel_size=3, padding="same", stride=stride, has_bias=False) + + +class Bottleneck(nn.Cell): + """ + Bottleneck layer + """ + expansion = 4 + + def __init__(self, in_planes, planes, stride=1, dilation_=1, downsample=None): + super(Bottleneck, self).__init__() + self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, has_bias=False) + self.bn1 = nn.BatchNorm2d(planes, affine=affine_par, use_batch_statistics=False) + for i in self.bn1.get_parameters(): + i.requires_grad = False + padding = 1 + if dilation_ == 2: + padding = 2 + elif dilation_ == 4: + padding = 4 + self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=padding, pad_mode="pad", has_bias=False, + dilation=dilation_) + + self.bn2 = nn.BatchNorm2d(planes, affine=affine_par, use_batch_statistics=False) + for i in self.bn2.get_parameters(): + i.requires_grad = False + self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, has_bias=False) + self.bn3 = nn.BatchNorm2d(planes * 4, affine=affine_par, use_batch_statistics=False) + for i in self.bn3.get_parameters(): + i.requires_grad = False + self.relu = nn.ReLU() + self.downsample = downsample + self.stride = stride + + def construct(self, x): + """ + forword + """ + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + + out = self.conv3(out) + out = self.bn3(out) + + if self.downsample is not None: + residual = 
+
+        out += residual
+        out = self.relu(out)
+
+        return out
+
+
+class ResNet(nn.Cell):
+    """
+    resnet
+    """
+    def __init__(self, block, layers):
+        self.in_planes = 64
+        super(ResNet, self).__init__()
+        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, pad_mode="pad",
+                               has_bias=False)
+        self.bn1 = nn.BatchNorm2d(64, affine=affine_par, use_batch_statistics=False)
+        for i in self.bn1.get_parameters():
+            i.requires_grad = False
+        self.relu = nn.ReLU()
+        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")  # change
+        self.layer1 = self._make_layer(block, 64, layers[0])
+        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
+        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
+        self.layer4 = self._make_layer(block, 512, layers[3], stride=1, dilation=2)
+
+    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
+        """
+        make layer
+        """
+        downsample = None
+        if stride != 1 or self.in_planes != planes * block.expansion or dilation == 2 or dilation == 4:
+            downsample = nn.SequentialCell(
+                nn.Conv2d(self.in_planes, planes * block.expansion,
+                          kernel_size=1, stride=stride, has_bias=False),
+                nn.BatchNorm2d(planes * block.expansion, affine=affine_par, use_batch_statistics=False),
+            )
+            for i in downsample[1].get_parameters():
+                i.requires_grad = False
+        layers = [block(self.in_planes, planes, stride, dilation_=dilation, downsample=downsample)]
+        self.in_planes = planes * block.expansion
+        for i in range(1, blocks):
+            layers.append(block(self.in_planes, planes, dilation_=dilation))
+
+        return nn.SequentialCell(*layers)
+
+    def load_pretrained_model(self, model_file):
+        """
+        load pretrained model
+        """
+        load_checkpoint(model_file, net=self)
+
+    def construct(self, x):
+        """
+        forward
+        """
+        tmp_x = []
+        x = self.conv1(x)
+        x = self.bn1(x)
+        x = self.relu(x)
+        tmp_x.append(x)
+        x = self.maxpool(x)
+
+        x = self.layer1(x)
+        tmp_x.append(x)
+        x = self.layer2(x)
+        tmp_x.append(x)
+        x = self.layer3(x)
+        tmp_x.append(x)
+        x = self.layer4(x)
+        tmp_x.append(x)
+
+        return tmp_x
+
+
+class AdaptiveAvgPool2D(nn.Cell):
+    """
+    AdaptiveAvgPool2D layer
+    """
+    def __init__(self, output_size):
+        super(AdaptiveAvgPool2D, self).__init__()
+        self.adaptive_avg_pool = ms.ops.AdaptiveAvgPool2D(output_size)
+
+    def construct(self, x):
+        """
+        forward
+        """
+        return self.adaptive_avg_pool(x)
+
+
+class ResNetLocate(nn.Cell):
+    """
+    resnet with PPM head, used by resnet101
+    """
+    def __init__(self, block, layers):
+        super(ResNetLocate, self).__init__()
+        self.resnet = ResNet(block, layers)
+        self.in_planes = 512
+        self.out_planes = [512, 256, 256, 128]
+
+        self.ppms_pre = nn.Conv2d(2048, self.in_planes, 1, 1, has_bias=False)
+        ppms, infos = [], []
+        for ii in [1, 3, 5]:
+            ppms.append(
+                nn.SequentialCell(AdaptiveAvgPool2D(ii),
+                                  nn.Conv2d(self.in_planes, self.in_planes, 1, 1, has_bias=False),
+                                  nn.ReLU()))
+        self.ppms = nn.CellList(ppms)
+
+        self.ppm_cat = nn.SequentialCell(nn.Conv2d(self.in_planes * 4, self.in_planes, 3, 1, pad_mode="pad",
+                                                   padding=1, has_bias=False),
+                                         nn.ReLU())
+        for ii in self.out_planes:
+            infos.append(nn.SequentialCell(nn.Conv2d(self.in_planes, ii, 3, 1, pad_mode="pad", padding=1,
+                                                     has_bias=False), nn.ReLU()))
+        self.infos = nn.CellList(infos)
+
+        self.resize_bilinear = nn.ResizeBilinear()
+        self.cat = ms.ops.Concat(axis=1)
+
+    def load_pretrained_model(self, model_file):
+        load_checkpoint(model_file, net=self.resnet)
+
+    def construct(self, x):
+        """
+        forward
+        """
+        xs = self.resnet(x)
+
+        xs_1 = self.ppms_pre(xs[-1])
+        xls = [xs_1]
+        for k in range(len(self.ppms)):
xls.append(self.resize_bilinear(self.ppms[k](xs_1), xs_1.size()[2:], align_corners=True)) + xls = self.ppm_cat(self.cat(xls)) + top_score = None + + infos = [] + for k in range(len(self.infos)): + infos.append(self.infos[k]( + self.resize_bilinear(xls, xs[len(self.infos) - 1 - k].size()[2:], align_corners=True))) + + return xs, top_score, infos + + +def resnet50(pretrained=False): + """Constructs a ResNet-50 model. + """ + model_file = "" + model = ResNet(Bottleneck, [3, 4, 6, 3]) + if pretrained: + load_checkpoint(model_file, net=model) + return model + + +def resnet101(pretrained=False): + """Constructs a ResNet-101 model. + """ + model_file = "" + model = ResNetLocate(Bottleneck, [3, 4, 23, 3]) + if pretrained: + load_checkpoint(model_file, net=model) + return model + + +if __name__ == "__main__": + name = "resnet50" + net = resnet50() + num_params = 0 + num_layers = 0 + for n, param in net.parameters_and_names(): + if "moving_" in n: + continue + num_params += param.size + num_layers += 1 + print(name) + print(net) + print(f"The number of layers: {num_layers}") + print(f"The number of parameters: {num_params}") diff --git a/research/cv/EGnet/src/sal_edge_loss.py b/research/cv/EGnet/src/sal_edge_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..87036d47d714c49ba85cd13b2e6b8767367203b0 --- /dev/null +++ b/research/cv/EGnet/src/sal_edge_loss.py @@ -0,0 +1,104 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
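
For a quick sanity check of the backbone wiring above, a minimal shape probe may help (an illustrative sketch, not part of the patch; it assumes the 200x200 crops configured in default_config.yaml and a MindSpore build that provides ops.AdaptiveAvgPool2D):

```python
# Shape probe for src/resnet.py (illustrative only, not part of the patch).
import numpy as np
import mindspore as ms

from src.resnet import resnet50

net = resnet50()
net.set_train(False)
x = ms.Tensor(np.zeros((1, 3, 200, 200), dtype=np.float32))
# ResNet.construct returns five feature maps, one per stage.
for f in net(x):
    print(f.shape)
# Expected strides are 2/4/8/16/16: layer4 keeps stride 16 because it is
# built with stride=1 and dilation=2, preserving spatial resolution for
# the saliency decoder while still enlarging the receptive field.
```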
diff --git a/research/cv/EGnet/src/sal_edge_loss.py b/research/cv/EGnet/src/sal_edge_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..87036d47d714c49ba85cd13b2e6b8767367203b0
--- /dev/null
+++ b/research/cv/EGnet/src/sal_edge_loss.py
@@ -0,0 +1,104 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""SalEdgeLoss define"""
+
+import mindspore as ms
+from mindspore import nn
+from mindspore.ops import Equal, Cast, ReduceSum, BCEWithLogitsLoss, OnesLike
+from mindspore import Parameter
+
+
+class SalEdgeLoss(nn.Cell):
+    """
+    Salience and edge loss.
+    """
+    def __init__(self, n_ave_grad, batch_size):
+        super(SalEdgeLoss, self).__init__()
+        self.n_ave_grad = n_ave_grad
+        self.batch_size = batch_size
+        self.sum = ReduceSum()
+        self.equal = Equal()
+        self.cast = Cast()
+        self.ones = OnesLike()
+        self.bce = BCEWithLogitsLoss(reduction="sum")
+        self.zero = ms.Tensor(0, dtype=ms.float32)
+        # for logging only; read back as numpy by Solver.train() after each step
+        self.sal_loss = Parameter(default_input=0.0, requires_grad=False)
+        self.edge_loss = Parameter(default_input=0.0, requires_grad=False)
+        self.total_loss = Parameter(default_input=0.0, requires_grad=False)
+
+    def bce2d_new(self, predict, target):
+        """
+        Class-balanced binary cross entropy with logits.
+        """
+        pos = self.cast(self.equal(target, 1), ms.float32)
+        neg = self.cast(self.equal(target, 0), ms.float32)
+
+        num_pos = self.sum(pos)
+        num_neg = self.sum(neg)
+        num_total = num_pos + num_neg
+
+        alpha = num_neg / num_total
+        beta = 1.1 * num_pos / num_total
+        # target pixel = 1 -> weight alpha (share of negative pixels)
+        # target pixel = 0 -> weight beta (1.1 * share of positive pixels)
+        weights = alpha * pos + beta * neg
+        return self.bce(predict, target, weights, self.ones(predict))
+
+    def construct(self, up_edge, up_sal, up_sal_f, sal_label, sal_edge):
+        """
+        compute loss
+        """
+        edge_loss = self.zero
+        for ix in up_edge:
+            edge_loss += self.bce2d_new(ix, sal_edge)
+        edge_loss = edge_loss / (self.n_ave_grad * self.batch_size)
+
+        sal_loss1 = self.zero
+        sal_loss2 = self.zero
+        for ix in up_sal:
+            sal_loss1 += self.bce(ix, sal_label, self.ones(ix), self.ones(ix))
+        for ix in up_sal_f:
+            sal_loss2 += self.bce(ix, sal_label, self.ones(ix), self.ones(ix))
+
+        # losses are pre-divided by the gradient-accumulation factor,
+        # so summing gradients over n_ave_grad steps yields an average
+        sal_loss = (sal_loss1 + sal_loss2) / (self.n_ave_grad * self.batch_size)
+        loss = sal_loss + edge_loss
+        self.sal_loss, self.edge_loss, self.total_loss = sal_loss, edge_loss, loss
+        return loss
+
+
+class WithLossCell(nn.Cell):
+    """
+    Wrap the backbone together with the loss function.
+    """
+    def __init__(self, backbone, loss_fn):
+        super(WithLossCell, self).__init__(auto_prefix=False)
+        self.backbone = backbone
+        self.loss_fn = loss_fn
+
+    def construct(self, data, sal_label, sal_edge):
+        """
+        compute loss
+        """
+        up_edge, up_sal, up_sal_f = self.backbone(data)
+        return self.loss_fn(up_edge, up_sal, up_sal_f, sal_label, sal_edge)
+
+    @property
+    def backbone_network(self):
+        return self.backbone
diff --git a/research/cv/EGnet/src/train_forward_backward.py b/research/cv/EGnet/src/train_forward_backward.py
new file mode 100644
index 0000000000000000000000000000000000000000..12a50d65b3728a45f9a4ff3ebc0c9786cec1bf4d
--- /dev/null
+++ b/research/cv/EGnet/src/train_forward_backward.py
@@ -0,0 +1,99 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +"""Train forward and backward define""" + +from mindspore import ops, ParameterTuple +from mindspore.nn import Cell + +_sum_op = ops.MultitypeFuncGraph("grad_sum_op") +_clear_op = ops.MultitypeFuncGraph("clear_op") + + +@_sum_op.register("Tensor", "Tensor") +def _cumulative_grad(grad_sum, grad): + """Apply grad sum to cumulative gradient.""" + add = ops.AssignAdd() + return add(grad_sum, grad) + + +@_clear_op.register("Tensor", "Tensor") +def _clear_grad_sum(grad_sum, zero): + """Apply zero to clear grad_sum.""" + success = True + success = ops.depend(success, ops.assign(grad_sum, zero)) + return success + + +class TrainForwardBackward(Cell): + """ + cell for step train + """ + def __init__(self, network, optimizer, grad_sum, sens=1.0): + super(TrainForwardBackward, self).__init__(auto_prefix=False) + self.network = network + self.network.set_grad() + self.network.add_flags(defer_inline=True) + self.weights = ParameterTuple(network.trainable_params()) + self.optimizer = optimizer + self.grad_sum = grad_sum + self.grad = ops.GradOperation(get_by_list=True, sens_param=True) + self.sens = sens + self.hyper_map = ops.HyperMap() + + def construct(self, *inputs): + """ + forward one step, accumulate grad + """ + weights = self.weights + loss = self.network(*inputs) + sens = ops.Fill()(ops.DType()(loss), ops.Shape()(loss), self.sens) + grads = self.grad(self.network, weights)(*inputs, sens) + return ops.depend(loss, self.hyper_map(ops.partial(_sum_op), self.grad_sum, grads)) + + +class TrainOptimize(Cell): + """ + optimize cell + """ + def __init__(self, optimizer, grad_sum): + super(TrainOptimize, self).__init__(auto_prefix=False) + self.optimizer = optimizer + self.grad_sum = grad_sum + + def construct(self): + """ + optimize + :return: + """ + return self.optimizer(self.grad_sum) + + +class TrainClear(Cell): + """ + clear cell + """ + def __init__(self, grad_sum, zeros): + super(TrainClear, self).__init__(auto_prefix=False) + self.grad_sum = grad_sum + self.zeros = zeros + self.hyper_map = ops.HyperMap() + + def construct(self): + """ + clear grad + """ + success = self.hyper_map(ops.partial(_clear_op), self.grad_sum, self.zeros) + return success diff --git a/research/cv/EGnet/src/vgg.py b/research/cv/EGnet/src/vgg.py new file mode 100644 index 0000000000000000000000000000000000000000..f45a54f0ed4801f60bbace4cb6458de0f386c245 --- /dev/null +++ b/research/cv/EGnet/src/vgg.py @@ -0,0 +1,103 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +"""VGG model define""" + +import mindspore.nn as nn +from mindspore.train import load_checkpoint +import mindspore + + + +def vgg(cfg, i, batch_norm=False): + """Make stage network of VGG.""" + layers = [] + in_channels = i + stage = 1 + pad = nn.Pad(((0, 0), (0, 0), (1, 1), (1, 1))).to_float(mindspore.dtype.float32) + for v in cfg: + if v == "M": + stage += 1 + layers += [pad, nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")] + else: + conv2d = nn.Conv2d(in_channels, v, kernel_size=3, pad_mode="pad", padding=1, has_bias=True) + if batch_norm: + layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU()] + else: + layers += [conv2d, nn.ReLU()] + in_channels = v + return layers + + +class Vgg16(nn.Cell): + """ + VGG network definition. + """ + + def __init__(self): + super(Vgg16, self).__init__() + self.cfg = {"tun": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512, "M", 512, 512, 512, "M"], + "tun_ex": [512, 512, 512]} + self.extract = [8, 15, 22, 29] + self.extract_ex = [5] + self.base = nn.CellList(vgg(self.cfg["tun"], 3)) + self.base_ex = VggEx(self.cfg["tun_ex"], 512) + + def load_pretrained_model(self, model_file): + load_checkpoint(model_file, net=self.base) + + def construct(self, x, multi=0): + """construct""" + tmp_x = [] + for k in range(len(self.base)): + x = self.base[k](x) + if k in self.extract: + tmp_x.append(x) + x = self.base_ex(x) + tmp_x.append(x) + if multi == 1: + tmp_y = [tmp_x[0]] + return tmp_y + return tmp_x + + +class VggEx(nn.Cell): + """ VGGEx block. """ + + def __init__(self, cfg, incs=512, padding=1, dilation=1): + super(VggEx, self).__init__() + self.cfg = cfg + layers = [] + for v in self.cfg: + conv2d = nn.Conv2d(incs, v, kernel_size=3, pad_mode="pad", padding=padding, dilation=dilation, + has_bias=False) + layers += [conv2d, nn.ReLU()] + incs = v + self.ex = nn.SequentialCell(*layers) + + def construct(self, x): + x = self.ex(x) + return x + + +if __name__ == "__main__": + # print VGG network + net = Vgg16() + num_params = 0 + for m in net.get_parameters(): + print(m) + num_params += m.size + print(net) + print("The number of parameters: {}".format(num_params)) diff --git a/research/cv/EGnet/train.py b/research/cv/EGnet/train.py new file mode 100644 index 0000000000000000000000000000000000000000..b81d59ae35dbb62fb936f805c290457ef34a4625 --- /dev/null +++ b/research/cv/EGnet/train.py @@ -0,0 +1,258 @@ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================
+
+"""Train"""
+
+import os
+from collections import OrderedDict
+
+from mindspore import set_seed
+from mindspore import context
+from mindspore import load_checkpoint, save_checkpoint, DatasetHelper
+from mindspore.communication import init
+from mindspore.context import ParallelMode
+from mindspore.nn import Sigmoid
+from mindspore.nn.optim import Adam
+
+from model_utils.config import base_config
+from src.dataset import create_dataset, save_img
+from src.egnet import build_model, init_weights
+from src.sal_edge_loss import SalEdgeLoss, WithLossCell
+from src.train_forward_backward import TrainClear, TrainOptimize, TrainForwardBackward
+
+if base_config.train_online:
+    import moxing as mox
+    mox.file.shift('os', 'mox')
+
+
+def main(config):
+    """Prepare data and pretrained weights, set the context and launch training."""
+    if config.train_online:
+        local_data_url = os.path.join("/cache", config.train_path)
+        mox.file.copy_parallel(config.online_train_path, local_data_url)
+        config.train_path = local_data_url
+        if config.online_pretrained_path != "":
+            pretrained_path = os.path.join("/cache", config.pretrained_url)
+            mox.file.copy_parallel(config.online_pretrained_path, pretrained_path)
+        if config.pre_trained == "":
+            if config.base_model == "vgg":
+                config.vgg = os.path.join("/cache", config.vgg)
+                mox.file.copy_parallel(os.path.join(config.online_pretrained_path,
+                                                    os.path.basename(config.vgg)), config.vgg)
+            elif config.base_model == "resnet":
+                config.resnet = os.path.join("/cache", config.resnet)
+                mox.file.copy_parallel(os.path.join(config.online_pretrained_path,
+                                                    os.path.basename(config.resnet)), config.resnet)
+        else:
+            config.pre_trained = os.path.join("/cache", config.pre_trained)
+            mox.file.copy_parallel(os.path.join(config.online_pretrained_path,
+                                                os.path.basename(config.pre_trained)), config.pre_trained)
+    context.set_context(mode=context.GRAPH_MODE,
+                        device_target=config.device_target,
+                        reserve_class_name_in_scope=False,
+                        device_id=int(os.getenv('DEVICE_ID', '0')))  # device_id must be an int
+    if config.is_distributed:
+        config.epoch = config.epoch * 6
+        set_seed(1234)
+        context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True)
+        init()
+    train_dataset, _ = create_dataset(config.batch_size, num_thread=config.num_thread, train_path=config.train_path,
+                                      is_distributed=config.is_distributed)
+    run = config.train_save_name
+    if not os.path.exists(config.save_fold):
+        os.mkdir(config.save_fold)
+    if not os.path.exists("%s/run-%s" % (config.save_fold, run)):
+        os.mkdir("%s/run-%s" % (config.save_fold, run))
+        os.mkdir("%s/run-%s/logs" % (config.save_fold, run))
+        os.mkdir("%s/run-%s/models" % (config.save_fold, run))
+    config.save_fold = "%s/run-%s" % (config.save_fold, run)
+    train = Solver(train_dataset, config)
+    train.train()
+
+
+class Solver:
+    """Training wrapper: builds the network, optimizer and the three train cells."""
+    def __init__(self, train_ds, config):
+        self.train_ds = train_ds
+        self.config = config
+        self.network = build_model(config.base_model)
+        init_weights(self.network)
+        # Load pretrained model
+        if self.config.pre_trained == "":
+            if config.base_model == "vgg":
+                if os.path.exists(self.config.vgg):
+                    self.network.base.load_pretrained_model(self.config.vgg)
+                    print("Load VGG pretrained model")
+            elif config.base_model == "resnet":
+                if os.path.exists(self.config.resnet):
+                    self.network.base.load_pretrained_model(self.config.resnet)
+                    print("Load ResNet pretrained model")
+            else:
+                raise ValueError("unknown base model")
+        else:
+            load_checkpoint(self.config.pre_trained, self.network)
+            print("Load EGNet pretrained model")
+        self.log_output = open("%s/logs/log.txt" % config.save_fold, "w")
+
+        # some hyper params
+        p = OrderedDict()
+        if self.config.base_model == "vgg":
+            # learning rate: resnet 5e-5, vgg 2e-5 (start from 2e-8, warm up to 2e-5 by epoch 3)
+            p["lr_bone"] = 2e-8
+            if self.config.is_distributed:
+                p["lr_bone"] = 2e-9
+        elif self.config.base_model == "resnet":
+            p["lr_bone"] = 5e-5
+            if self.config.is_distributed:
+                p["lr_bone"] = 5e-9
+        else:
+            raise ValueError("unknown base model")
+        p["lr_branch"] = 0.025  # Learning rate
+        p["wd"] = 0.0005  # Weight decay
+        p["momentum"] = 0.90  # Momentum
+        self.p = p
+        self.lr_decay_epoch = [15, 24]
+        if config.is_distributed:
+            self.lr_decay_epoch = [15 * 6, 24 * 6]
+        self.tmp_path = "tmp_see"
+
+        self.lr_bone = p["lr_bone"]
+        self.lr_branch = p["lr_branch"]
+        self.optimizer = Adam(self.network.trainable_params(), learning_rate=self.lr_bone,
+                              weight_decay=p["wd"], loss_scale=self.config.loss_scale)
+        self.print_network()
+        self.loss_fn = SalEdgeLoss(config.n_ave_grad, config.batch_size)
+        params = self.optimizer.parameters
+        self.grad_sum = params.clone(prefix="grad_sum", init="zeros")
+        self.zeros = params.clone(prefix="zeros", init="zeros")
+        self.train_forward_backward = self.build_train_forward_backward_network()
+        self.train_optimize = self.build_train_optimize()
+        self.train_clear = self.build_train_clear()
+        self.sigmoid = Sigmoid()
+
+    def build_train_forward_backward_network(self):
+        """Build forward and backward network"""
+        network = self.network
+        network = WithLossCell(network, self.loss_fn)
+        self.config.loss_scale = 1.0
+        network = TrainForwardBackward(network, self.optimizer, self.grad_sum, self.config.loss_scale).set_train()
+        return network
+
+    def build_train_optimize(self):
+        """Build optimizer network"""
+        network = TrainOptimize(self.optimizer, self.grad_sum).set_train()
+        return network
+
+    def build_train_clear(self):
+        """Build clear network"""
+        network = TrainClear(self.grad_sum, self.zeros).set_train()
+        return network
+
+    def print_network(self):
+        """
+        print network architecture
+        """
+        name = "EGNet-" + self.config.base_model
+        model = self.network
+        num_params = 0
+        i = 0
+        for param in model.get_parameters():
+            i += 1
+            num_params += param.size
+        print(name)
+        print(model)
+        print(f"The number of layers: {i}")
+        print(f"The number of parameters: {num_params}")
+
+    def train(self):
+        """training phase"""
+        ave_grad = 0
+        iter_num = self.train_ds.get_dataset_size()
+        dataset_helper = DatasetHelper(self.train_ds, dataset_sink_mode=False, epoch_num=self.config.epoch)
+        if not os.path.exists(self.tmp_path):
+            os.mkdir(self.tmp_path)
+        for epoch in range(self.config.epoch):
+            r_edge_loss, r_sal_loss, r_sum_loss = 0, 0, 0
+            for i, data_batch in enumerate(dataset_helper):
+                sal_image, sal_label, sal_edge = data_batch[0], data_batch[1], data_batch[2]
+                if sal_image.shape[2:] != sal_label.shape[2:]:
+                    print("Skip this batch")
+                    continue
+                self.train_forward_backward(sal_image, sal_label, sal_edge)
+                r_edge_loss += self.loss_fn.edge_loss.asnumpy()
+                r_sal_loss += self.loss_fn.sal_loss.asnumpy()
+                r_sum_loss += self.loss_fn.total_loss.asnumpy()
+
+                # apply the accumulated gradients once every n_ave_grad steps
+                if (ave_grad + 1) % self.config.n_ave_grad == 0:
+                    self.train_optimize()
+                    self.train_clear()
+                    ave_grad = 0
+                else:
+                    ave_grad += 1
+                if (i + 1) % self.config.show_every == 0:
+                    num_step = self.config.n_ave_grad * self.config.batch_size
+                    log_str = "epoch: [%2d/%2d], iter: [%5d/%5d] || Edge : %10.4f || Sal : %10.4f || Sum : %10.4f" \
+                              % (epoch + 1, self.config.epoch, i + 1, iter_num,
+                                 r_edge_loss * num_step / self.config.show_every,
+                                 r_sal_loss * num_step / self.config.show_every,
+                                 r_sum_loss * num_step / self.config.show_every)
+                    print(log_str)
+                    print(f"Learning rate: {self.lr_bone}")
+                    self.log_output.write(log_str + "\n")
+                    self.log_output.write(f"Learning rate: {self.lr_bone}\n")
+                    r_edge_loss, r_sal_loss, r_sum_loss = 0, 0, 0
+
+                if (i + 1) % self.config.save_tmp == 0:
+                    _, _, up_sal_final = self.network(sal_image)
+                    sal = self.sigmoid(up_sal_final[-1]).asnumpy().squeeze()
+                    sal_image = sal_image.asnumpy().squeeze().transpose((1, 2, 0))
+                    sal_label = sal_label.asnumpy().squeeze()
+                    save_img(sal, os.path.join(self.tmp_path, f"iter{i}-sal-0.jpg"))
+                    save_img(sal_image, os.path.join(self.tmp_path, f"iter{i}-sal-data.jpg"))
+                    save_img(sal_label, os.path.join(self.tmp_path, f"iter{i}-sal-target.jpg"))
+
+            if (epoch + 1) % self.config.epoch_save == 0:
+                if self.config.train_online:
+                    save_checkpoint(self.network, "epoch_%d_%s_bone.ckpt" %
+                                    (epoch + 1, self.config.base_model))
+                    mox.file.copy_parallel("epoch_%d_%s_bone.ckpt" %
+                                           (epoch + 1, self.config.base_model),
+                                           os.path.join(self.config.train_url, "epoch_%d_%s_bone.ckpt" %
+                                                        (epoch + 1, self.config.base_model)))
+                else:
+                    save_checkpoint(self.network, "%s/models/epoch_%d_%s_bone.ckpt" %
+                                    (self.config.save_fold, epoch + 1, self.config.base_model))
+            if self.config.base_model == "vgg" or self.config.is_distributed:
+                if self.config.is_distributed:
+                    lr_rise_epoch = [3, 6, 9, 12]
+                else:
+                    lr_rise_epoch = [1, 2, 3]
+                if epoch in lr_rise_epoch:
+                    self.lr_bone = self.lr_bone * 10
+                    self.optimizer = Adam(filter(lambda p: p.requires_grad, self.network.get_parameters()),
+                                          learning_rate=self.lr_bone, weight_decay=self.p["wd"])
+                    self.train_optimize = self.build_train_optimize()
+            if epoch in self.lr_decay_epoch:
+                self.lr_bone = self.lr_bone * 0.1
+                self.optimizer = Adam(filter(lambda p: p.requires_grad, self.network.get_parameters()),
+                                      learning_rate=self.lr_bone, weight_decay=self.p["wd"])
+                self.train_optimize = self.build_train_optimize()
+        if self.config.train_online:
+            save_checkpoint(self.network, "final_%s_bone.ckpt" % self.config.base_model)
+            mox.file.copy_parallel("final_%s_bone.ckpt" % self.config.base_model,
+                                   os.path.join(self.config.train_url, "final_%s_bone.ckpt" % self.config.base_model))
+        else:
+            save_checkpoint(self.network,
+                            "%s/models/final_%s_bone.ckpt" % (self.config.save_fold, self.config.base_model))
+
+
+if __name__ == "__main__":
+    main(base_config)
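
A note on how the three cells from src/train_forward_backward.py are meant to be driven, mirroring `Solver.train()`: `TrainForwardBackward` adds each step's gradients into `grad_sum`, `TrainOptimize` applies the accumulated sum once every `n_ave_grad` steps, and `TrainClear` zeroes the buffers. A minimal sketch with a toy network (the `Dense` model, shapes and learning rate here are illustrative assumptions, not part of the patch):

```python
# Gradient-accumulation driver sketch (illustrative only).
import numpy as np
import mindspore as ms
from mindspore import nn

from src.train_forward_backward import TrainClear, TrainForwardBackward, TrainOptimize

net = nn.Dense(4, 1)                 # toy stand-in for EGNet
net_with_loss = nn.WithLossCell(net, nn.MSELoss())
opt = nn.Adam(net.trainable_params(), learning_rate=1e-3)

params = opt.parameters
grad_sum = params.clone(prefix="grad_sum", init="zeros")
zeros = params.clone(prefix="zeros", init="zeros")

forward_backward = TrainForwardBackward(net_with_loss, opt, grad_sum).set_train()
optimize = TrainOptimize(opt, grad_sum).set_train()
clear = TrainClear(grad_sum, zeros).set_train()

n_ave_grad = 5                       # effective batch = n_ave_grad * batch_size
for _ in range(n_ave_grad):
    x = ms.Tensor(np.random.rand(2, 4).astype(np.float32))
    y = ms.Tensor(np.random.rand(2, 1).astype(np.float32))
    loss = forward_backward(x, y)    # backward pass; grads added into grad_sum
optimize()                           # apply the summed gradients once
clear()                              # reset grad_sum before the next effective batch
```

Because `SalEdgeLoss` already divides each loss term by `n_ave_grad * batch_size`, summing raw gradients over the accumulation window is equivalent to averaging over the effective batch.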