diff --git a/research/cv/PAGENet/README.md b/research/cv/PAGENet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b43372f0b82b71f78fd4683d3e8144ae1157d6d9
--- /dev/null
+++ b/research/cv/PAGENet/README.md
@@ -0,0 +1,175 @@
+## Contents
+
+- [Contents](#contents)
+ - [PAGE-Net Description](#page-net-description)
+ - [Model Architecture](#model-architecture)
+ - [Dataset](#dataset)
+ - [Dataset Configuration](#dataset-configuration)
+ - [Environment Requirements](#environment-requirements)
+ - [Script Description](#script-description)
+ - [Script and Code Structure](#script-and-code-structure)
+ - [Script Parameters](#script-parameters)
+ - [Training Process](#training-process)
+ - [Training](#training)
+ - [Distributed Training](#distributed-training)
+ - [Evaluation Process](#evaluation-process)
+ - [Export Process](#export-process)
+ - [Model Description](#model-description)
+ - [Evaluation Performance](#evaluation-performance)
+ - [Inference Performance](#inference-performance)
+ - [ModelZoo Homepage](#modelzoo-homepage)
+
+## PAGE-Net Description
+
+PAGE-Net tackles salient object detection with supervised learning. It consists of three parts: a backbone network for feature extraction, a pyramid attention module, and a salient edge detection module. Fusing saliency information across resolutions gives the resulting features a larger receptive field and stronger representational power, while the edge information produced by the salient edge detection module sharpens the segmentation of object boundaries, making the detection results more accurate. Compared against 19 other methods on 6 datasets under 3 evaluation metrics, PAGE-Net shows superior performance and competitive results.
+
+The [TensorFlow-Keras source code of PAGE-Net](https://github.com/wenguanwang/PAGE-Net) is provided by the authors of the paper. It contains the run scripts and model files, along with links for obtaining the datasets and pre-trained models.
+
+[Paper](https://www.researchgate.net/publication/332751907_Salient_Object_Detection_With_Pyramid_Attention_and_Salient_Edges): Wang W, Zhao S, Shen J, et al. Salient Object Detection With Pyramid Attention and Salient Edges[C]// CVPR 2019.
+
+## Model Architecture
+
+PAGE-Net consists of three parts: a CNN module for feature extraction, a pyramid attention module, and an edge detection module. The preprocessed input image is downsampled to produce feature maps; at each level the pyramid attention module generates more expressive features, the edge information is fused with the multi-scale features extracted at different depths, and the network finally outputs a fused saliency map.
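+
+The network's `construct` returns 11 tensors (multi-scale saliency and edge maps, all upsampled to the input resolution). A minimal usage sketch, assuming the 224x224 input size from config.py:
+
+```python
+import numpy as np
+import mindspore as ms
+from mindspore import Tensor
+
+from src.pagenet import MindsporeModel
+
+net = MindsporeModel()
+x = Tensor(np.ones([1, 3, 224, 224]), ms.float32)  # NCHW input
+outs = net(x)                                      # 11 maps, all 224x224
+print(len(outs), outs[9].shape)                    # outs[9] is the final fused saliency map
+```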
+
+## Dataset
+
+All datasets are placed under a single directory.
+
+### Dataset Configuration
+
+Dataset paths are configured in config.py. The training-set variables are train_img_path, train_gt_path and train_edge_path;
+adjust the test-set paths to your own environment.
+To evaluate on a custom test set, add its paths to config.py and add a corresponding branch in eval.py.
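+
+For reference, the default paths look like this (excerpt from config.py; the other test sets follow the same pattern as SOD):
+
+```python
+train_img_path = "./dataset/train_dataset/images"
+train_gt_path = "./dataset/train_dataset/labels"
+train_edge_path = "./dataset/train_dataset/edges"
+
+SOD_img_path = "./dataset/test_dataset/SOD/SOD-image"
+SOD_gt_path = "./dataset/test_dataset/SOD/SOD-mask"
+```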
+
+- Training set:
+
+    [THUS10K (MSRA10K) dataset](https://mmcheng.net/msra10k/), 342MB, 10000 labeled images in total
+
+- Test sets:
+
+    [ECSSD dataset](http://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/data/ECSSD/images.zip) ([masks](http://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/data/ECSSD/ground_truth_mask.zip)), 67.2MB, 1000 images
+
+    [DUT-OMRON dataset](http://saliencydetection.net/dut-omron/), 113MB, 5163 images
+
+    [HKU-IS dataset](https://i.cs.hku.hk/~gbli/deep_saliency.html), 899MB, 4447 images
+
+    [SOD dataset](https://www.elderlab.yorku.ca/?smd_process_download=1&download_id=8285), 19.7MB, 1000 images
+
+    [DUTS-TE dataset](http://saliencydetection.net/duts/download/DUTS-TE.zip), 132MB, 5019 images
+
+## Environment Requirements
+
+- Hardware (CPU/GPU)
+
+- For details, please refer to the following resources:
+
+    [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
+    [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
+
+- Required packages
+
+    MindSpore-GPU 1.5.0
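+
+    The Python dependencies (numpy and Pillow) are listed in requirements.txt and can be installed with:
+
+    ```bash
+    pip install -r requirements.txt
+    ```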
+
+## Script Description
+
+### Script and Code Structure
+
+```text
+├── model_zoo
+    ├── PAGENet
+        ├── dataset
+        │   ├── train_dataset                 # training set
+        │   ├── test_dataset                  # test sets
+        ├── README.md                         # README file
+        ├── config.py                         # parameter configuration script
+        ├── requirements.txt                  # Python dependencies
+        ├── scripts
+        │   ├── run_standalone_train_gpu.sh   # standalone (single-device) training script
+        │   ├── run_distribute_train_gpu.sh   # distributed (multi-device) training script
+        │   ├── run_eval.sh                   # evaluation script
+        ├── src
+        │   ├── mind_dataloader_final.py      # dataset loading and preprocessing
+        │   ├── pagenet.py                    # PAGE-Net network definition
+        │   ├── train_loss.py                 # loss definition
+        ├── train.py                          # training script
+        ├── eval.py                           # evaluation script
+        ├── export.py                         # model export script
+```
+
+### Script Parameters
+
+```text
+device_target: "GPU"                               # target device
+batch_size: 10                                     # training batch size
+train_size: 224                                    # height and width of the model input
+LR: 2e-5                                           # learning rate
+WD: 0.0005                                         # weight decay
+EPOCH: 100                                         # number of training epochs (doubled automatically for distributed training)
+train_img_path: "./dataset/train_dataset/images"   # training images
+train_gt_path: "./dataset/train_dataset/labels"    # training saliency masks
+train_edge_path: "./dataset/train_dataset/edges"   # training edge masks
+ckpt_file: "PAGENET.ckpt"                          # checkpoint used for export
+file_name: "pagenet"                               # name of the exported model file
+file_format: "MINDIR"                              # export format
+```
+
+## Training Process
+
+### Training
+
+```bash
+cd scripts
+bash run_standalone_train_gpu.sh
+```
+
+### Distributed Training
+
+```bash
+cd scripts
+bash run_distribute_train_gpu.sh
+```
+
+## Evaluation Process
+
+```bash
+# CKPT_FILE is the checkpoint file name; place the checkpoint file in the current directory
+bash run_eval.sh [CKPT_FILE]
+```
+
+## Export Process
+
+```bash
+python export.py
+```
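+
+To sanity-check the exported file, a minimal sketch (assuming the defaults `file_name = 'pagenet'` and `file_format = 'MINDIR'` from config.py):
+
+```python
+import numpy as np
+import mindspore as ms
+from mindspore import nn, Tensor
+
+graph = ms.load("pagenet.mindir")                        # MindIR written by export.py
+net = nn.GraphCell(graph)
+dummy = Tensor(np.ones([10, 3, 224, 224]), ms.float32)   # batch_size=10, 3x224x224 input
+outputs = net(dummy)                                     # outputs[9] is the final saliency map
+print(len(outputs), outputs[9].shape)
+```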
+
+## Model Description
+
+### Evaluation Performance
+
+PAGE-Net on THUS10K (GPU)
+
+| Parameter | GPU (1 device) | GPU (8 devices) |
+| ------------- | -------------------------------- | ------------------------------- |
+| Model | PAGE-Net | PAGE-Net |
+| Upload date | 2022-06-20 | 2022-06-20 |
+| MindSpore version | 1.5.0 | 1.5.0 |
+| Dataset | THUS10K | THUS10K |
+| Training parameters | epoch=100, steps=1000, batch_size=10 | epoch=200, steps=125, batch_size=10 |
+| Loss functions | MSE & BCE | MSE & BCE |
+| Optimizer | Adam | Adam |
+| Speed | 52 s/step | 87 s/step |
+| Total time | 7h15m | 3h28m |
+| Fine-tuning checkpoint | 390M (.ckpt file) | 390M (.ckpt file) |
+
+### Inference Performance
+
+PAGE-Net on salient object detection datasets (GPU)
+
+| Parameter | GPU (1 device) | GPU (8 devices) |
+| ------------- | --------------------- | ---------------------|
+| Model | PAGE-Net | PAGE-Net |
+| Upload date | 2022-06-20 | 2022-06-20 |
+| MindSpore version | 1.5.0 | 1.5.0 |
+| Dataset | SOD, 1000 images | SOD, 1000 images |
+| Metric | F-score: 0.974 | F-score: 0.974 |
+| Dataset | ECSSD, 1000 images | ECSSD, 1000 images |
+| Metric | F-score: 0.845 | F-score: 0.845 |
+| Dataset | DUT-OMRON, 5163 images | DUT-OMRON, 5163 images |
+| Metric | F-score: 0.80 | F-score: 0.80 |
+| Dataset | HKU-IS, 4447 images | HKU-IS, 4447 images |
+| Metric | F-score: 0.842 | F-score: 0.842 |
+| Dataset | DUTS-TE, 5019 images | DUTS-TE, 5019 images |
+| Metric | F-score: 0.778 | F-score: 0.778 |
+
+## ModelZoo Homepage
+
+Please check the official [homepage](https://gitee.com/mindspore/models).
diff --git a/research/cv/PAGENet/config.py b/research/cv/PAGENet/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..d129f55d63cbdaef52740acb1bbba62ce0ba16e6
--- /dev/null
+++ b/research/cv/PAGENet/config.py
@@ -0,0 +1,45 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+from mindspore import context
+
+train_img_path = "./dataset/train_dataset/images"
+train_gt_path = "./dataset/train_dataset/labels"
+train_edge_path = "./dataset/train_dataset/edges"
+
+DUT_OMRON_img_path = "./dataset/test_dataset/DUT-OMRON/DUT-OMRON-image"
+DUT_OMRON_gt_path = "./dataset/test_dataset/DUT-OMRON/DUT-OMRON-mask"
+DUTS_TE_img_path = "./dataset/test_dataset/DUTS-TE/DUTS-TE-Image"
+DUTS_TE_gt_path = "./dataset/test_dataset/DUTS-TE/DUTS-TE-Mask"
+ECCSD_img_path = "./dataset/test_dataset/ECCSD/ECCSD-image"
+ECCSD_gt_path = "./dataset/test_dataset/ECCSD/ECCSD-mask"
+HKU_IS_img_path = "./dataset/test_dataset/HKU-IS/HKU-IS-image"
+HKU_IS_gt_path = "./dataset/test_dataset/HKU-IS/HKU-IS-mask"
+SOD_img_path = "./dataset/test_dataset/SOD/SOD-image"
+SOD_gt_path = "./dataset/test_dataset/SOD/SOD-mask"
+
+batch_size = 10
+train_size = 224
+
+device_target = 'GPU'
+LR = 2e-5
+WD = 0.0005
+EPOCH = 100
+
+MODE = context.GRAPH_MODE
+ckpt_file = "PAGENET.ckpt"
+
+file_name = 'pagenet'
+file_format = 'MINDIR'
diff --git a/research/cv/PAGENet/eval.py b/research/cv/PAGENet/eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..07b81ceb7d1534303d556bcd925e8d4395e86aae
--- /dev/null
+++ b/research/cv/PAGENet/eval.py
@@ -0,0 +1,109 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+import argparse
+import time
+import numpy as np
+import mindspore.nn as nn
+import mindspore as ms
+from mindspore import context
+import config
+from config import MODE, device_target, train_size
+
+from src.pagenet import MindsporeModel
+from src.mind_dataloader_final import get_test_loader
+
+
+def main(test_img_path, test_gt_path, ckpt_file):
+ # context_set
+ context.set_context(mode=MODE,
+ device_target=device_target,
+ reserve_class_name_in_scope=False)
+
+ # dataset
+ test_loader = get_test_loader(test_img_path, test_gt_path, batchsize=1, testsize=train_size)
+ data_iterator = test_loader.create_tuple_iterator()
+ # step
+ total_test_step = 0
+ test_data_size = test_loader.get_dataset_size()
+    # metrics
+    mae = nn.MAE()
+    F_score = nn.F1()
+ # model
+ model = MindsporeModel()
+ ckpt_file_name = ckpt_file
+ ms.load_checkpoint(ckpt_file_name, net=model)
+
+ model.set_train(False)
+
+    mae.clear()
+ start = time.time()
+ for imgs, targets in data_iterator:
+
+ targets1 = targets.astype(int)
+ outputs = model(imgs)
+        pre_mask = outputs[9]  # outputs[9] is the final fused saliency map
+ pre_mask = pre_mask.flatten()
+ targets1 = targets1.flatten()
+
+ pre_mask1 = pre_mask.asnumpy().tolist()
+
+        # nn.F1 expects per-class scores, so build [background, foreground] pairs
+        F_pre = np.array([[1 - i, i] for i in pre_mask1])
+
+ F_score.update(F_pre, targets1)
+
+ mae.update(pre_mask, targets1)
+
+ total_test_step = total_test_step + 1
+ if total_test_step % 100 == 0:
+            print("evaluating: {}/{}".format(total_test_step, test_data_size))
+
+ end = time.time()
+ total = end - start
+ print("total time is {}h".format(total / 3600))
+ print("step time is {}s".format(total / (test_data_size)))
+ mae_result = mae.eval()
+
+ F_score_result = F_score.eval()
+ print("mae: ", mae_result)
+
+ print("F-score: ", (F_score_result[0] + F_score_result[1]) / 2)
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description='manual to this script')
+ parser.add_argument('-s', '--test_set', type=str)
+ parser.add_argument('-c', '--ckpt', type=str)
+ args = parser.parse_args()
+ if args.test_set == 'DUT-OMRON':
+ img_path = config.DUT_OMRON_img_path
+ gt_path = config.DUT_OMRON_gt_path
+ elif args.test_set == 'DUTS-TE':
+ img_path = config.DUTS_TE_img_path
+ gt_path = config.DUTS_TE_gt_path
+ elif args.test_set == 'ECCSD':
+ img_path = config.ECCSD_img_path
+ gt_path = config.ECCSD_gt_path
+ elif args.test_set == 'HKU-IS':
+ img_path = config.HKU_IS_img_path
+ gt_path = config.HKU_IS_gt_path
+ elif args.test_set == 'SOD':
+ img_path = config.SOD_img_path
+ gt_path = config.SOD_gt_path
+    else:
+        raise ValueError("test set {} does not exist".format(args.test_set))
+ ckpt = args.ckpt
+ main(img_path, gt_path, ckpt)
diff --git a/research/cv/PAGENet/export.py b/research/cv/PAGENet/export.py
new file mode 100644
index 0000000000000000000000000000000000000000..148b608446f366a7a3e587f9e3af19a1ffa96255
--- /dev/null
+++ b/research/cv/PAGENet/export.py
@@ -0,0 +1,47 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""
+##############export checkpoint file into air, mindir models#################
+python export.py
+"""
+import numpy as np
+
+import mindspore as ms
+from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
+
+from src.pagenet import MindsporeModel
+import config
+
+
+def run_export():
+ """
+ run export operation
+ """
+ context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
+
+ net = MindsporeModel()
+
+    if config.ckpt_file is None:
+        raise ValueError("config.ckpt_file is None.")
+    param_dict = load_checkpoint(config.ckpt_file)
+    load_param_into_net(net, param_dict)
+
+ input_arr = Tensor(np.ones([config.batch_size, 3, 224, 224]), ms.float32)
+ export(net, input_arr, file_name=config.file_name, file_format=config.file_format)
+
+
+if __name__ == "__main__":
+ run_export()
diff --git a/research/cv/PAGENet/requirements.txt b/research/cv/PAGENet/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c1059c7da84b1b3c3a09ccf4d42bccbab1fa5785
--- /dev/null
+++ b/research/cv/PAGENet/requirements.txt
@@ -0,0 +1,2 @@
+numpy
+Pillow
diff --git a/research/cv/PAGENet/scripts/run_distribute_train_gpu.sh b/research/cv/PAGENet/scripts/run_distribute_train_gpu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..69b19458fdea288e46fe16c67a3e89552759deb4
--- /dev/null
+++ b/research/cv/PAGENet/scripts/run_distribute_train_gpu.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+echo "Usage: bash run_distribute_train_gpu.sh"
+
+# Get absolute path
+get_real_path(){
+ if [ "${1:0:1}" == "/" ]; then
+ echo "$1"
+ else
+ echo "$(realpath -m $PWD/$1)"
+ fi
+}
+
+# Get current script path
+BASE_PATH=$(cd "`dirname $0`" || exit; pwd)
+
+cd $BASE_PATH/..
+
+mpirun --allow-run-as-root -n 8 python train.py --train_mode 'distribute' > distribute.log 2>&1 &
+
+echo "The train log is at ../distribute.log."
diff --git a/research/cv/PAGENet/scripts/run_eval.sh b/research/cv/PAGENet/scripts/run_eval.sh
new file mode 100644
index 0000000000000000000000000000000000000000..4f56ae45e99fdde680519bd7177c516271c9f0d4
--- /dev/null
+++ b/research/cv/PAGENet/scripts/run_eval.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# =========================================================================
+if [ $# != 1 ]
+then
+    echo "Usage: bash run_eval.sh [CKPT_PATH]"
+    exit 1
+fi
+
+cd ..
+python eval.py -s 'DUT-OMRON' -c $1 > test_O.log 2>&1 &
+python eval.py -s 'DUTS-TE' -c $1 > test_T.log 2>&1 &
+python eval.py -s 'ECCSD' -c $1 > test_E.log 2>&1 &
+python eval.py -s 'HKU-IS' -c $1 > test_H.log 2>&1 &
+python eval.py -s 'SOD' -c $1 > test_S.log 2>&1 &
diff --git a/research/cv/PAGENet/scripts/run_standalone_train_gpu.sh b/research/cv/PAGENet/scripts/run_standalone_train_gpu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..f7664f066bf9c513838f374ea299c0182082ec8f
--- /dev/null
+++ b/research/cv/PAGENet/scripts/run_standalone_train_gpu.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+echo "Usage: bash run_standalone_train_gpu.sh"
+# Get absolute path
+get_real_path(){
+ if [ "${1:0:1}" == "/" ]; then
+ echo "$1"
+ else
+ echo "$(realpath -m $PWD/$1)"
+ fi
+}
+
+# Get current script path
+BASE_PATH=$(cd "`dirname $0`" || exit; pwd)
+
+cd $BASE_PATH/..
+
+python train.py --train_mode 'single' > standalone_train.log 2>&1 &
+
+echo "The train log is at ../standalone_train.log."
diff --git a/research/cv/PAGENet/src/mind_dataloader_final.py b/research/cv/PAGENet/src/mind_dataloader_final.py
new file mode 100644
index 0000000000000000000000000000000000000000..9b3528ff00bc2e69f04780e9b8bfaa69d5868687
--- /dev/null
+++ b/research/cv/PAGENet/src/mind_dataloader_final.py
@@ -0,0 +1,143 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+import os
+import numpy as np
+import mindspore.dataset as ds
+import mindspore.dataset.transforms as transforms
+import mindspore.dataset.vision as vision
+import mindspore.dataset.vision.c_transforms as C
+from PIL import Image
+
+
+# transform img to tensor and resize to 224x224x3
+class TrainData:
+ """
+ dataloader for pageNet
+ """
+
+ def __init__(self, image_root, gt_root, edge_root, img_size, augmentations):
+ self.img_size = img_size
+ self.augmentations = augmentations
+ self.images = [image_root + "/" + f for f in os.listdir(image_root) if
+ f.endswith('.jpg') or f.endswith('.png')]
+ self.gts = [gt_root + "/" + f for f in os.listdir(gt_root) if f.endswith('.jpg') or f.endswith('.png')]
+ self.edges = [edge_root + "/" + f for f in os.listdir(edge_root) if f.endswith('.jpg') or f.endswith('.png')]
+ self.images = sorted(self.images)
+ self.gts = sorted(self.gts)
+ self.edges = sorted(self.edges)
+ self.size = len(self.images)
+
+        if not self.augmentations:
+            print('no augmentation')
+ self.img_transform = transforms.c_transforms.Compose([
+ C.Resize((self.img_size, self.img_size)),
+ vision.py_transforms.ToTensor()])
+ self.gt_transform = transforms.c_transforms.Compose([
+ C.Resize((self.img_size, self.img_size)),
+ vision.py_transforms.ToTensor()])
+ self.edge_transform = transforms.c_transforms.Compose([
+ C.Resize((self.img_size, self.img_size)),
+ vision.py_transforms.ToTensor()])
+
+ def __getitem__(self, index):
+
+ img = Image.open(self.images[index], 'r').convert('RGB')
+
+ gt = Image.open(self.gts[index], 'r').convert('1')
+
+ edge = Image.open(self.edges[index], 'r').convert('1')
+
+        if self.img_transform is not None:
+            img = np.array(img, dtype=np.float32)
+            # subtract the per-channel means used by the original PAGE-Net preprocessing
+            img -= np.array((104.00699, 116.66877, 122.67892))
+            img = self.img_transform(img)
+            # ToTensor rescales to [0, 1]; multiply by 255 to restore the value range
+            img = img * 255
+
+ if self.gt_transform is not None:
+ gt = self.gt_transform(gt)
+ gt = gt * 255
+
+ if self.edge_transform is not None:
+ edge = self.edge_transform(edge)
+ edge = edge * 255
+
+ return img, gt, edge
+
+ def __len__(self):
+ return self.size
+
+
+class TestData:
+ """
+ dataloader for pageNet
+ """
+
+ def __init__(self, image_root, gt_root, img_size, augmentations):
+ self.img_size = img_size
+ self.augmentations = augmentations
+ self.images = [image_root + "/" + f for f in os.listdir(image_root) if
+ f.endswith('.jpg') or f.endswith('.png')]
+ self.gts = [gt_root + "/" + f for f in os.listdir(gt_root) if f.endswith('.jpg') or f.endswith('.png')]
+ self.images = sorted(self.images)
+ self.gts = sorted(self.gts)
+ self.size = len(self.images)
+
+ self.img_transform = transforms.c_transforms.Compose([
+ C.Resize((self.img_size, self.img_size)),
+ vision.py_transforms.ToTensor()])
+ self.gt_transform = transforms.c_transforms.Compose([
+ C.Resize((self.img_size, self.img_size)),
+ vision.py_transforms.ToTensor(),
+ ])
+
+ def __getitem__(self, index):
+
+ img = Image.open(self.images[index], 'r').convert('RGB')
+
+ gt = Image.open(self.gts[index], 'r').convert('1')
+
+ if self.img_transform is not None:
+ img = np.array(img, dtype=np.float32)
+ img -= np.array((104.00699, 116.66877, 122.67892))
+ img = self.img_transform(img)
+ img = img * 255
+
+ if self.gt_transform is not None:
+ gt = self.gt_transform(gt)
+ gt = gt * 255
+
+ return img, gt
+
+ def __len__(self):
+ return self.size
+
+
+def get_train_loader(image_root, gt_root, edge_root, batchsize, trainsize, device_num=1, rank_id=0, shuffle=True,
+ num_parallel_workers=1, augmentation=False):
+ dataset_generator = TrainData(image_root, gt_root, edge_root, trainsize, augmentation)
+ dataset = ds.GeneratorDataset(dataset_generator, ["imgs", "gts", "edges"], shuffle=shuffle,
+ num_parallel_workers=num_parallel_workers, num_shards=device_num, shard_id=rank_id)
+
+ data_loader = dataset.batch(batch_size=batchsize)
+
+ return data_loader
+
+
+def get_test_loader(image_root, gt_root, batchsize, testsize, augmentation=False):
+ dataset_generator = TestData(image_root, gt_root, testsize, augmentation)
+ dataset = ds.GeneratorDataset(dataset_generator, ["imgs", "gts"])
+
+ data_loader = dataset.batch(batch_size=batchsize)
+ return data_loader
diff --git a/research/cv/PAGENet/src/pagenet.py b/research/cv/PAGENet/src/pagenet.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e6e4afee6ae9a176ad3ba6160301931e9015e35
--- /dev/null
+++ b/research/cv/PAGENet/src/pagenet.py
@@ -0,0 +1,416 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+import mindspore.ops as P
+from mindspore import nn
+
+
+class upSampleLike(nn.Cell):
+
+ def __init__(self):
+ super(upSampleLike, self).__init__()
+ self.resize = nn.ResizeBilinear()
+
+ def construct(self, fm, x1):
+ fm = self.resize(fm, (x1.shape[2], x1.shape[3]))
+ return fm
+
+
+class MindsporeModel(nn.Cell):
+
+ def __init__(self):
+ super(MindsporeModel, self).__init__()
+
+ self.in_channels = 3
+ self.conv3_64 = self.__make_layer(64, 2)
+ self.conv3_128 = self.__make_layer(128, 2)
+ self.conv3_256 = self.__make_layer(256, 3)
+ self.conv3_512a = self.__make_layer(512, 3)
+ self.conv3_512b = self.__make_layer(512, 3)
+ self.max_1 = nn.MaxPool2d((2, 2), stride=(2, 2), pad_mode='same')
+ self.max_4 = nn.MaxPool2d((4, 4), stride=(4, 4), pad_mode='same')
+ self.max_8 = nn.MaxPool2d((8, 8), stride=(8, 8), pad_mode='same')
+ self.max_16 = nn.MaxPool2d((16, 16), stride=(16, 16), pad_mode='same')
+ self.max_32 = nn.MaxPool2d((32, 32), stride=(32, 32), pad_mode='same')
+
+ self.upSampleLike = upSampleLike()
+
+ # edge detection module
+ self.salConv6 = nn.Conv2d(kernel_size=(5, 5), in_channels=512, out_channels=512, stride=(1, 1), dilation=(1, 1),
+ padding=(2, 2, 2, 2), pad_mode='pad', group=1, has_bias=True)
+ self.sigmoid = nn.Sigmoid()
+ self.relu = nn.LeakyReLU()
+ self.salConv5 = nn.Conv2d(kernel_size=(3, 3), in_channels=512, out_channels=512, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.edgeConv5 = nn.Conv2d(kernel_size=(3, 3), in_channels=512, out_channels=512, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.salConv4 = nn.Conv2d(kernel_size=(3, 3), in_channels=512, out_channels=256, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.salConv4_1 = nn.Conv2d(kernel_size=(3, 3), in_channels=256, out_channels=256, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.edgeConv4 = nn.Conv2d(kernel_size=(3, 3), in_channels=512, out_channels=256, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.edgeConv4_1 = nn.Conv2d(kernel_size=(3, 3), in_channels=256, out_channels=256, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.salConv3 = nn.Conv2d(kernel_size=(3, 3), in_channels=256, out_channels=256, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.edgeConv3 = nn.Conv2d(kernel_size=(3, 3), in_channels=256, out_channels=256, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.salConv2 = nn.Conv2d(kernel_size=(3, 3), in_channels=128, out_channels=128, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.edgeConv2 = nn.Conv2d(kernel_size=(3, 3), in_channels=128, out_channels=128, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.salConv1 = nn.Conv2d(kernel_size=(3, 3), in_channels=64, out_channels=64, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.edgeConv1 = nn.Conv2d(kernel_size=(3, 3), in_channels=64, out_channels=64, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+
+ # saliency + edge + attention
+ self.Reshape = P.Reshape()
+ self.Transpose = P.Transpose()
+ self.Tile = P.Tile()
+ self.Mul = P.Mul()
+ self.Add = P.Add()
+ self.softmax = nn.Softmax()
+ self.flatten = nn.Flatten()
+ self.conv1 = nn.Conv2d(1, 1, (1, 1))
+ self.conv1_512 = nn.Conv2d(kernel_size=(1, 1), in_channels=512, out_channels=1, stride=(1, 1), dilation=(1, 1),
+ padding=0, pad_mode='valid', group=1, has_bias=True)
+ self.conv1_256 = nn.Conv2d(kernel_size=(1, 1), in_channels=256, out_channels=1, stride=(1, 1), dilation=(1, 1),
+ padding=0, pad_mode='valid', group=1, has_bias=True)
+ self.conv1_128 = nn.Conv2d(kernel_size=(1, 1), in_channels=128, out_channels=1, stride=(1, 1), dilation=(1, 1),
+ padding=0, pad_mode='valid', group=1, has_bias=True)
+ self.conv1_64 = nn.Conv2d(kernel_size=(1, 1), in_channels=64, out_channels=1, stride=(1, 1), dilation=(1, 1),
+ padding=0, pad_mode='valid', group=1, has_bias=True)
+ self.conv1_32 = nn.Conv2d(kernel_size=(1, 1), in_channels=32, out_channels=1, stride=(1, 1), dilation=(1, 1),
+ padding=0, pad_mode='valid', group=1, has_bias=True)
+ self.conv3_513 = nn.Conv2d(kernel_size=(3, 3), in_channels=513, out_channels=256, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv3_257 = nn.Conv2d(kernel_size=(3, 3), in_channels=257, out_channels=256, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv257_128 = nn.Conv2d(kernel_size=(3, 3), in_channels=257, out_channels=128, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.dilation_256 = nn.Conv2d(kernel_size=(1, 1), in_channels=256, out_channels=1, stride=(1, 1),
+ dilation=(3, 3), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.dilation_128 = nn.Conv2d(kernel_size=(1, 1), in_channels=128, out_channels=1, stride=(1, 1),
+ dilation=(3, 3), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.dilation_64 = nn.Conv2d(kernel_size=(1, 1), in_channels=64, out_channels=1, stride=(1, 1), dilation=(3, 3),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.dilation_32 = nn.Conv2d(kernel_size=(1, 1), in_channels=32, out_channels=1, stride=(1, 1), dilation=(3, 3),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+
+ # conv function
+ self.conv_128_1 = nn.Conv2d(128, 1, (1, 1))
+ self.conv_64_1 = nn.Conv2d(64, 1, (1, 1))
+ self.conv_32_1 = nn.Conv2d(32, 1, (1, 1))
+ self.conv_258_128 = nn.Conv2d(kernel_size=(3, 3), in_channels=258, out_channels=128, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_129_128 = nn.Conv2d(kernel_size=(3, 3), in_channels=129, out_channels=128, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_259_128 = nn.Conv2d(kernel_size=(3, 3), in_channels=259, out_channels=128, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_131_64 = nn.Conv2d(kernel_size=(3, 3), in_channels=131, out_channels=64, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_132_64 = nn.Conv2d(kernel_size=(3, 3), in_channels=132, out_channels=64, stride=(1, 1),
+ dilation=(1, 1), padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_65_64 = nn.Conv2d(kernel_size=(3, 3), in_channels=65, out_channels=64, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_68_32 = nn.Conv2d(kernel_size=(3, 3), in_channels=68, out_channels=32, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_69_32 = nn.Conv2d(kernel_size=(3, 3), in_channels=69, out_channels=32, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+ self.conv_33_32 = nn.Conv2d(kernel_size=(3, 3), in_channels=33, out_channels=32, stride=(1, 1), dilation=(1, 1),
+ padding=(1, 1, 1, 1), pad_mode='pad', group=1, has_bias=True)
+
+    def __make_layer(self, channels, num):
+        layers = []
+        for _ in range(num):
+            layers.append(nn.Conv2d(in_channels=self.in_channels,
+                                    out_channels=channels,
+                                    kernel_size=3,
+                                    stride=(1, 1),
+                                    dilation=(1, 1),
+                                    padding=(1, 1, 1, 1),
+                                    pad_mode='pad',
+                                    group=1,
+                                    has_bias=False))  # same padding
+            layers.append(nn.ReLU())
+            self.in_channels = channels
+        return nn.SequentialCell(*layers)
+
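+    # Pyramid attention, one method per decoder level: attention maps are
+    # computed from progressively max-pooled copies of the feature map,
+    # upsampled back to the input resolution, softmax-normalized and averaged,
+    # then used to re-weight the features, with a residual Add at the end.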
+ def attention5(self, sal_5):
+ att_5a = self.sigmoid(self.conv1_256(sal_5))
+ att_5a = self.softmax(self.flatten(att_5a))
+ att_5b = self.sigmoid(self.dilation_256(self.max_1(sal_5)))
+ att_5b = self.upSampleLike(att_5b, sal_5)
+ att_5b = self.softmax(self.flatten(att_5b))
+
+ att_5 = (att_5a + att_5b) / 2.0 # (2, 1*14*14)->(256, 2, 1*14*14)
+
+ att_5 = self.Tile(att_5, (256, 1, 1)) # (batchsize,14*14)->(256,batchsize,14*14)
+ att_5 = self.Transpose(att_5, (1, 0, 2)) # (2, 256, 14*14)
+ att_5 = self.Reshape(att_5, (-1, 256, 14, 14))
+ att_5 = self.Mul(att_5, sal_5)
+ sal_5 = self.Add(att_5, sal_5) # (2, 256, 14, 14)
+ return sal_5
+
+ def attention4(self, sal_4): # sal_4:2, 128, 28, 28
+ att_4a = self.sigmoid(self.conv_128_1(sal_4)) # (2, 1, 28, 28)
+ att_4a = self.softmax(self.flatten(att_4a)) # (2, 1*28*28)
+
+ att_4b = self.sigmoid(self.dilation_128(self.max_1(sal_4))) # 2, 1, 14, 14
+ att_4b = self.upSampleLike(att_4b, sal_4) # 2, 1, 28, 28
+ att_4b = self.softmax(self.flatten(att_4b)) # (2, 1*28*28)
+
+ att_4c = self.sigmoid(self.dilation_128(self.max_4(sal_4)))
+ att_4c = self.upSampleLike(att_4c, sal_4)
+ att_4c = self.softmax(self.flatten(att_4c)) # (2, 1*28*28)
+
+ att_4 = (att_4a + att_4b + att_4c) / 3.0
+
+ att_4 = self.Tile(att_4, (128, 1, 1)) # (128, 2, 1*28*28)
+ att_4 = self.Transpose(att_4, (1, 0, 2)) # (2, 128, 1*28*28)
+ att_4 = self.Reshape(att_4, (-1, 128, 28, 28))
+ att_4 = self.Mul(att_4, sal_4)
+ sal_4 = self.Add(att_4, sal_4)
+ return sal_4
+
+ def attention3(self, sal_3): # sal_3:2, 128, 56, 56
+ att_3a = self.sigmoid(self.conv_128_1(sal_3)) # (2, 1, 56, 56)
+ att_3a = self.softmax(self.flatten(att_3a)) # (2, 1*56*56)
+
+ att_3b = self.sigmoid(self.dilation_128(self.max_1(sal_3))) # 2, 1, 28, 28
+ att_3b = self.upSampleLike(att_3b, sal_3) # 2, 1, 56, 56
+ att_3b = self.softmax(self.flatten(att_3b)) # (2, 1*56*56)
+
+ att_3c = self.sigmoid(self.dilation_128(self.max_4(sal_3)))
+ att_3c = self.upSampleLike(att_3c, sal_3)
+ att_3c = self.softmax(self.flatten(att_3c)) # (2, 1*56*56)
+
+ att_3d = self.sigmoid(self.dilation_128(self.max_8(sal_3))) # 2, 1, 7, 7
+ att_3d = self.upSampleLike(att_3d, sal_3)
+ att_3d = self.softmax(self.flatten(att_3d)) # (2, 1*56*56)
+
+ att_3 = (att_3a + att_3b + att_3c + att_3d) / 4.0
+
+ att_3 = self.Tile(att_3, (128, 1, 1)) # (128, 2, 1*56*56)
+ att_3 = self.Transpose(att_3, (1, 0, 2)) # (2, 128, 1*56*56)
+ att_3 = self.Reshape(att_3, (-1, 128, 56, 56))
+ att_3 = self.Mul(att_3, sal_3)
+ sal_3 = self.Add(att_3, sal_3)
+ return sal_3
+
+ def attention2(self, sal_2): # (64, 112, 112)
+ att_2a = self.sigmoid(self.conv_64_1(sal_2)) # (2, 1, 112, 112)
+ att_2a = self.softmax(self.flatten(att_2a)) # (2, 1*112*112)
+
+ att_2b = self.sigmoid(self.dilation_64(self.max_1(sal_2))) # 2, 1, 56, 56
+ att_2b = self.upSampleLike(att_2b, sal_2) # 2, 1, 112, 112
+ att_2b = self.softmax(self.flatten(att_2b)) # (2, 1*112*112)
+
+ att_2c = self.sigmoid(self.dilation_64(self.max_4(sal_2))) # 2, 1, 28, 28
+ att_2c = self.upSampleLike(att_2c, sal_2)
+ att_2c = self.softmax(self.flatten(att_2c)) # (2, 1*112*112)
+
+ att_2d = self.sigmoid(self.dilation_64(self.max_8(sal_2))) # 2, 1, 14, 14
+ att_2d = self.upSampleLike(att_2d, sal_2)
+ att_2d = self.softmax(self.flatten(att_2d)) # (2, 1*112*112)
+
+ att_2e = self.sigmoid(self.dilation_64(self.max_16(sal_2)))
+ att_2e = self.upSampleLike(att_2e, sal_2)
+ att_2e = self.softmax(self.flatten(att_2e))
+
+ att_2 = (att_2a + att_2b + att_2c + att_2d + att_2e) / 5.0
+
+ att_2 = self.Tile(att_2, (64, 1, 1)) # (64, 2, 1*112*112)
+ att_2 = self.Transpose(att_2, (1, 0, 2)) # (2, 64, 1*112*112)
+ att_2 = self.Reshape(att_2, (-1, 64, 112, 112))
+ att_2 = self.Mul(att_2, sal_2)
+ sal_2 = self.Add(att_2, sal_2)
+ return sal_2
+
+ def attention1(self, sal_1): # 2, 32, 224, 224
+ att_1a = self.sigmoid(self.conv_32_1(sal_1)) # (2, 1, 224, 224)
+ att_1a = self.softmax(self.flatten(att_1a)) # (2, 1*224*224)
+
+ att_1b = self.sigmoid(self.dilation_32(self.max_1(sal_1))) # 2, 1, 112, 112
+ att_1b = self.upSampleLike(att_1b, sal_1) # 2, 1, 224, 224
+ att_1b = self.softmax(self.flatten(att_1b)) # (2, 1*224*224)
+
+ att_1c = self.sigmoid(self.dilation_32(self.max_4(sal_1)))
+ att_1c = self.upSampleLike(att_1c, sal_1)
+ att_1c = self.softmax(self.flatten(att_1c)) # (2, 1*112*112)
+
+ att_1d = self.sigmoid(self.dilation_32(self.max_8(sal_1))) # 2, 1, 14, 14
+ att_1d = self.upSampleLike(att_1d, sal_1)
+ att_1d = self.softmax(self.flatten(att_1d)) # (2, 1*112*112)
+
+ att_1e = self.sigmoid(self.dilation_32(self.max_16(sal_1)))
+ att_1e = self.upSampleLike(att_1e, sal_1)
+ att_1e = self.softmax(self.flatten(att_1e))
+
+ att_1f = self.sigmoid(self.dilation_32(self.max_32(sal_1)))
+ att_1f = self.upSampleLike(att_1f, sal_1)
+ att_1f = self.softmax(self.flatten(att_1f))
+
+ att_1 = (att_1a + att_1b + att_1c + att_1d + att_1e + att_1f) / 6.0
+
+ att_1 = self.Tile(att_1, (32, 1, 1)) # (32, 2, 1*224*224)
+ att_1 = self.Transpose(att_1, (1, 0, 2)) # (2, 32, 1*224*224)
+ att_1 = self.Reshape(att_1, (-1, 32, 224, 224))
+ att_1 = self.Mul(att_1, sal_1)
+ sal_1 = self.Add(att_1, sal_1)
+ return sal_1
+
+ def construct(self, x):
+ # vgg16
+ x1 = self.conv3_64(x) # x1:(64, 224, 224)
+ x1_max = self.max_1(x1) # x1_max:(64, 112, 112)
+ x2 = self.conv3_128(x1_max) # x2:(128, 112, 112)
+ x2_max = self.max_1(x2) # x2_max:(128, 56, 56)
+ x3 = self.conv3_256(x2_max) # x3:(256, 56, 56)
+ x3_max = self.max_1(x3) # x3_max:(256, 28, 28)
+ x4 = self.conv3_512a(x3_max) # x4:(512, 28, 28)
+ x4_max = self.max_1(x4) # x4_max:(512, 14, 14)
+ x5 = self.conv3_512b(x4_max) # x5:(512, 14, 14)
+ x6 = self.max_1(x5) # x6:(512, 7, 7)
+
+ # sal_conv
+ sal_6 = self.relu(self.salConv6(x6))
+ sal_6 = self.sigmoid(self.salConv6(sal_6)) # sal_6:(512, 7, 7)
+
+ sal_5 = self.relu(self.salConv5(x5))
+ sal_5 = self.sigmoid(self.salConv5(sal_5)) # sal_5:(512, 14, 14)
+
+ sal_4 = self.relu(self.salConv4(x4))
+ sal_4 = self.sigmoid(self.salConv4_1(sal_4)) # sal_4:(256, 28, 28)
+
+ sal_3 = self.relu(self.salConv3(x3))
+ sal_3 = self.sigmoid(self.salConv3(sal_3)) # sal_3:(256, 56, 56)
+
+ sal_2 = self.relu(self.salConv2(x2))
+ sal_2 = self.sigmoid(self.salConv2(sal_2)) # sal_2:(128, 112, 112)
+
+ sal_1 = self.relu(self.salConv1(x1))
+ sal_1 = self.sigmoid(self.salConv1(sal_1)) # sal_1:(64, 224, 224)
+
+ # edge_conv
+ edg_5 = self.relu(self.edgeConv5(x5))
+ edg_5 = self.sigmoid(self.edgeConv5(edg_5)) # edg_5:(512, 14, 14)
+
+ edg_4 = self.relu(self.edgeConv4(x4))
+ edg_4 = self.sigmoid(self.edgeConv4_1(edg_4)) # edg_4:(256, 28, 28)
+
+ edg_3 = self.relu(self.edgeConv3(x3))
+ edg_3 = self.sigmoid(self.edgeConv3(edg_3)) # edg_3:(256, 56, 56)
+
+ edg_2 = self.relu(self.edgeConv2(x2))
+ edg_2 = self.sigmoid(self.edgeConv2(edg_2)) # edg_2:(128, 112, 112)
+
+ edg_1 = self.relu(self.edgeConv1(x1))
+ edg_1 = self.sigmoid(self.edgeConv1(edg_1)) # edg_1:(64, 224, 224) sigmoid-88
+
+ # saliency from sal_6 sal_6_up
+ saliency6 = self.sigmoid(self.conv1_512(sal_6)) # saliency6_up:sigmoid((1, 7, 7))
+ saliency6_up = self.upSampleLike(saliency6, x1) # saliency6_up:(1, 224, 224)
+
+ # saliency from sal_5 sal_5_up edge5 edge5_up
+ edge5 = self.sigmoid(self.conv1_512(edg_5)) # edge5:(1, 14, 14) sigmoid-92
+ edge5_up = self.upSampleLike(edge5, x1) # edge5_up:(2, 1, 224, 224)
+
+ sal_5 = P.Concat(axis=1)([sal_5, self.upSampleLike(saliency6, sal_5)])
+ sal_5 = self.sigmoid(self.conv3_513(sal_5)) # sal_5: 256, 14, 14 sigmoid-94
+
+ sal_5 = P.Concat(axis=1)([self.attention5(sal_5), edge5]) # sal_5:(257, 14, 14)
+ sal_5 = self.sigmoid(self.conv3_257(sal_5)) # sal_5:(256, 14, 14)
+ saliency5 = self.sigmoid(self.conv1_256(sal_5)) # saliency:(1, 14, 14) sigmoid-107
+ sal_5_up = self.upSampleLike(saliency5, x1) # sal_5_up:(1, 224, 224)
+
+ # saliency from sal_4 sal_4_up edge4 edge4_up
+ edg_4 = P.Concat(axis=1)([edg_4, self.upSampleLike(edge5, sal_4)]) # (257, 28, 28)
+ edg_4 = self.sigmoid(self.conv257_128(edg_4)) # (128, 28, 28) sigmoid-109
+ edge4 = self.sigmoid(self.conv1_128(edg_4)) # edge5:(1, 28, 28) sigmoid-111
+ edge4_up = self.upSampleLike(edge4, x1) # edge5_up:(2, 1, 224, 224)
+
+ sal_4 = P.Concat(axis=1)(
+ [sal_4, self.upSampleLike(saliency6, sal_4), self.upSampleLike(saliency5, sal_4)]) # (258, 28, 28)
+ sal_4 = self.sigmoid(self.conv_258_128(sal_4)) # sigmoid-112
+
+ sal_4 = P.Concat(axis=1)([self.attention4(sal_4), edge4])
+ sal_4 = self.sigmoid(self.conv_129_128(sal_4)) # sigmoid-126
+
+ saliency4 = self.sigmoid(self.conv1_128(sal_4)) # sigmoid-128
+ sal_4_up = self.upSampleLike(saliency4, x1)
+
+ # saliency from sal_3 sal_3_up edge3 edge3_up
+ edg_3 = P.Concat(axis=1)(
+ [edg_3, self.upSampleLike(edge5, sal_3), self.upSampleLike(edge4, sal_3)]) # (258, 56, 56)
+ edg_3 = self.sigmoid(self.conv_258_128(edg_3)) # (128, 56, 56)
+ edge3 = self.sigmoid(self.conv1_128(edg_3))
+ edge3_up = self.upSampleLike(edge3, x1)
+
+ sal_3 = P.Concat(axis=1)([sal_3, self.upSampleLike(saliency6, sal_3), self.upSampleLike(saliency5, sal_3),
+ self.upSampleLike(saliency4, sal_3)])
+ sal_3 = self.sigmoid(self.conv_259_128(sal_3)) # 2, 128, 56, 56
+
+ sal_3 = P.Concat(axis=1)([self.attention3(sal_3), edge3])
+ sal_3 = self.sigmoid(self.conv_129_128(sal_3)) # sigmoid-151
+
+ saliency3 = self.sigmoid(self.conv1_128(sal_3)) # sigmoid-153
+ sal_3_up = self.upSampleLike(saliency3, x1)
+
+ # saliency from sal_2 sal_2_up edge2 edge2_up
+ edg_2 = P.Concat(axis=1)([edg_2, self.upSampleLike(edge5, edg_2), self.upSampleLike(edge4, edg_2),
+ self.upSampleLike(edge3, edg_2)]) # (131, 112, 112)
+ edg_2 = self.sigmoid(self.conv_131_64(edg_2)) # (64, 112, 112)
+
+ edge2 = self.sigmoid(self.conv1_64(edg_2))
+ edge2_up = self.upSampleLike(edge2, x1)
+
+ sal_2 = P.Concat(axis=1)([sal_2, self.upSampleLike(saliency6, sal_2), self.upSampleLike(saliency5, sal_2),
+ self.upSampleLike(saliency4, sal_2), self.upSampleLike(saliency3, sal_2)])
+ sal_2 = self.sigmoid(self.conv_132_64(sal_2)) # (2, 64, 112, 112)
+
+ sal_2 = P.Concat(axis=1)([self.attention2(sal_2), edge2])
+ sal_2 = self.sigmoid(self.conv_65_64(sal_2))
+
+ saliency2 = self.sigmoid(self.conv1_64(sal_2))
+ sal_2_up = self.upSampleLike(saliency2, x1)
+
+ # saliency from sal_1 sal_1_up edge1 edge1_up
+ edg_1 = P.Concat(axis=1)(
+ [edg_1, self.upSampleLike(edge5, sal_1), self.upSampleLike(edge4, sal_1), self.upSampleLike(edge3, sal_1),
+ self.upSampleLike(edge2, sal_1)]) # (68,224,224)
+ edg_1 = self.sigmoid(self.conv_68_32(edg_1)) # (32, 224, 224)
+ edge1 = self.sigmoid(self.conv1_32(edg_1))
+
+ sal_1 = P.Concat(axis=1)([sal_1, self.upSampleLike(saliency6, sal_1), self.upSampleLike(saliency5, sal_1),
+ self.upSampleLike(saliency4, sal_1), self.upSampleLike(saliency3, sal_1),
+ self.upSampleLike(saliency2, sal_1)])
+ sal_1 = self.sigmoid(self.conv_69_32(sal_1)) # 32, 224, 224
+
+ sal_1 = P.Concat(axis=1)([self.attention1(sal_1), edge1])
+ sal_1 = self.sigmoid(self.conv_33_32(sal_1))
+
+ saliency1 = self.sigmoid(self.conv1_32(sal_1)) # (1, 224, 224)
+
+ return [saliency6_up, edge5_up, sal_5_up, edge4_up, sal_4_up, edge3_up, sal_3_up, edge2_up, sal_2_up, saliency1,
+ edge1]
+
+
+if __name__ == "__main__":
+ m = MindsporeModel()
+ print(m)
diff --git a/research/cv/PAGENet/src/train_loss.py b/research/cv/PAGENet/src/train_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..b976d61f663db0e0ecfbf89bd09f1ff12e21de99
--- /dev/null
+++ b/research/cv/PAGENet/src/train_loss.py
@@ -0,0 +1,66 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+
+"""SalEdgeLoss define"""
+
+import mindspore as ms
+from mindspore import nn
+from mindspore import Parameter
+
+
+class total_loss(nn.Cell):
+ def __init__(self):
+ super(total_loss, self).__init__()
+ self.loss_fn1 = nn.MSELoss()
+ self.loss_fn2 = nn.BCELoss(reduction="mean")
+ self.zero = ms.Tensor(0, dtype=ms.float32)
+ # for log
+ self.sal_loss = Parameter(default_input=0.0, requires_grad=False)
+ self.edge_loss = Parameter(default_input=0.0, requires_grad=False)
+ self.total_loss = Parameter(default_input=0.0, requires_grad=False)
+
+ def construct(self, pres, gts, edges):
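+        # pres holds the 11 network outputs: edge maps (pres[1], [3], [5], [7],
+        # [10]) are trained with MSE against the edge labels, and saliency maps
+        # (pres[2], [4], [6], [8], [9]) with BCE against the ground-truth
+        # masks; pres[0], the coarsest saliency map, is not supervised here.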
+ loss_edg_5 = self.loss_fn1(pres[1], edges)
+ loss_sal_5 = self.loss_fn2(pres[2], gts)
+ loss_5 = loss_sal_5 + loss_edg_5
+ loss_4 = self.loss_fn1(pres[3], edges) + self.loss_fn2(pres[4], gts)
+ loss_3 = self.loss_fn1(pres[5], edges) + self.loss_fn2(pres[6], gts)
+ loss_2 = self.loss_fn1(pres[7], edges) + self.loss_fn2(pres[8], gts)
+ loss_1 = self.loss_fn1(pres[10], edges) + self.loss_fn2(pres[9], gts)
+ loss = loss_1 + loss_2 + loss_3 + loss_4 + loss_5
+ return loss
+
+
+class WithLossCell(nn.Cell):
+ """
+ loss cell
+ """
+
+ def __init__(self, backbone, loss_fn):
+ super(WithLossCell, self).__init__(auto_prefix=False)
+ self.backbone = backbone
+ self.loss_fn = loss_fn
+
+ def construct(self, data, gts, edges):
+ """
+ compute loss
+ """
+ pres = self.backbone(data)
+ return self.loss_fn(pres, gts, edges)
+
+ @property
+ def backbone_network(self):
+ return self.backbone
diff --git a/research/cv/PAGENet/train.py b/research/cv/PAGENet/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..23093cc369e6d09fc875c811bf3b8531f538ac23
--- /dev/null
+++ b/research/cv/PAGENet/train.py
@@ -0,0 +1,104 @@
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+import time
+import argparse
+import mindspore
+import mindspore.nn as nn
+from mindspore import context
+from mindspore.communication.management import init, get_rank, get_group_size
+from config import MODE, device_target, train_size, train_img_path, train_edge_path, train_gt_path, batch_size, EPOCH, \
+ LR, WD
+from src.pagenet import MindsporeModel
+from src.train_loss import total_loss, WithLossCell
+from src.mind_dataloader_final import get_train_loader
+
+
+def main(train_mode='single'):
+ context.set_context(mode=MODE,
+ device_target=device_target,
+ reserve_class_name_in_scope=False)
+
+ if train_mode == 'single':
+ # env set
+
+ # dataset
+ train_loader = get_train_loader(train_img_path, train_gt_path, train_edge_path, batchsize=batch_size,
+ trainsize=train_size)
+ train_data_size = train_loader.get_dataset_size()
+ # epoch
+ epoch = EPOCH
+ else:
+ init()
+ rank_id = get_rank()
+ device_num = get_group_size()
+ context.set_auto_parallel_context(device_num=device_num, gradients_mean=True,
+ parallel_mode=context.ParallelMode.DATA_PARALLEL)
+
+ # dataset
+ train_loader = get_train_loader(train_img_path, train_gt_path, train_edge_path, device_num=device_num,
+ rank_id=rank_id, num_parallel_workers=8, batchsize=batch_size,
+ trainsize=train_size)
+
+ train_data_size = train_loader.get_dataset_size()
+ # epoch
+ epoch = EPOCH * 2
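+        # EPOCH is doubled here to match the 8-device setting in the README
+        # performance table (epoch=200)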
+
+ # setup train_parameters
+
+ model = MindsporeModel()
+ # loss function
+ loss_fn = total_loss()
+
+ # learning_rate and optimizer
+
+ optimizer = nn.Adam(model.trainable_params(), learning_rate=LR, weight_decay=WD)
+
+ # train model
+ net_with_loss = WithLossCell(model, loss_fn)
+ train_network = nn.TrainOneStepCell(net_with_loss, optimizer)
+ train_network.set_train()
+
+ data_iterator = train_loader.create_tuple_iterator(num_epochs=epoch)
+
+ start = time.time()
+ for i in range(epoch):
+ total_train_step = 0
+ for imgs, gts, edges in data_iterator:
+
+ loss = train_network(imgs, gts, edges)
+
+ total_train_step = total_train_step + 1
+
+ if total_train_step % 10 == 0:
+ print("epoch: {}, step: {}/{}, loss: {}".format(i, total_train_step, train_data_size, loss))
+
+        if train_mode == 'single':
+            mindspore.save_checkpoint(train_network, "PAGENET" + '.ckpt')
+            print("PAGENET.ckpt has been saved!")
+
+        else:
+            mindspore.save_checkpoint(train_network, "PAGENET" + str(get_rank()) + '.ckpt')
+            print("PAGENET" + str(get_rank()) + ".ckpt has been saved!")
+ end = time.time()
+ total = end - start
+ print("total time is {}h".format(total / 3600))
+ print("step time is {}s".format(total / (train_data_size * epoch)))
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description='manual to this script')
+ parser.add_argument('-m', '--train_mode', type=str)
+ args = parser.parse_args()
+ main(args.train_mode)