!3472 [Lanzhou University][University Contribution][Mindspore][ResNet34] - CPU model transfer learning training + inference submission

Merge pull request !3472 from 罗诺/master

d852fcdd · i-robot · Gitee · 05f17a74 · 777b12a0 · d852fcdd
Unverified Commit d852fcdd authored 2 years ago by i-robot Committed by Gitee 2 years ago
--- a/official/cv/resnet/README_CN.md
+++ b/official/cv/resnet/README_CN.md
@@ -208,6 +208,7 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH]  [CONFIG_PATH]
    ├── resnet18_imagenet2012_config_gpu.yaml
    ├── resnet34_imagenet2012_config.yaml
    ├── resnet50_cifar10_config.yaml
+    ├── resnet34_cpu_config.yaml
    ├── resnet50_imagenet2012_Boost_config.yaml     # High-performance version: over 10% faster with less than 1% accuracy loss
    ├── resnet50_imagenet2012_Ascend_Thor_config.yaml
    ├── resnet50_imagenet2012_config.yaml
@@ -226,6 +227,7 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH]  [CONFIG_PATH]
    ├── run_standalone_train_gpu.sh        # Launch standalone GPU training (single device)
    └── cache_util.sh                      # Helper functions for using the single-node cache
  ├── src
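+    ├── data_split.py                      # Script that splits the transfer-learning dataset (CPU)
    ├── dataset.py                         # Data preprocessing
    ├── eval_callback.py                   # Callback for evaluation during training
    ├── CrossEntropySmooth.py              # Loss definition for the ImageNet2012 dataset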
@@ -236,6 +238,9 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH]  [CONFIG_PATH]
       ├── device_adapter.py               # Device configuration
       ├── local_adapter.py                # Local device configuration
       └── moxing_adapter.py               # ModelArts device configuration
+  ├── fine_tune.py                         # Transfer-learning (fine-tuning) script (CPU)
+  ├── quick_start.py                       # Quick-start demo (CPU)
+  ├── requirements.txt                     # Third-party dependencies
  ├── eval.py                              # Evaluate the network
  └── train.py                             # Train the network
 ```
@@ -475,6 +480,48 @@ bash run_standalone_train_gpu.sh [DATASET_PATH] [CONFIG_PATH] [RUN_EVAL](optiona

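 After training finishes, you can either shut down the cache server or keep it running so that it continues to serve cached data for future inference.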

+## Transfer Learning Process
+
+### Preparing the Transfer Dataset
+
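+[Download the dataset from the provided link](https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz) and extract it. Copy the split script data_split.py (located in src/) into the extracted flower_photos directory and run it from there; it creates a train folder and a test folder holding 3/4 and 1/4 of the images of each class, respectively. Then move the train and test folders into a newly created datasets folder.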
+
+### Obtaining the Pretrained Checkpoint
+
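+[Download the checkpoint from the provided link](https://download.mindspore.cn/models/r1.5/). The checkpoint file is named `resnet34_ascend_v170_imagenet2012_official_cv_top1acc73.61_top5acc91.74.ckpt`; after downloading, place it in the same directory as fine_tune.py (this matches `checkpoint_path` in resnet34_cpu_config.yaml).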
+
+### Usage
+
+You can start training with the Python script:
+
+```shell
+python fine_tune.py --config_path ./config/resnet34_cpu_config.yaml
+```
+
+### Results
+
+- Training ResNet34 on the flower_photos dataset
+
+```text
+# Transfer learning results (CPU)
+epoch: 1 step: 1, loss is 1.5975518
+epoch: 1 step: 2, loss is 1.5453123
+epoch: 1 step: 3, loss is 1.3293151
+epoch: 1 step: 4, loss is 1.491757
+epoch: 1 step: 5, loss is 1.3044931
+...
+```
+
+## Transfer Learning Inference Process
+
+### Usage
+
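+You can run inference with the Python script. First set `checkpoint_file_path` in resnet34_cpu_config.yaml to the path of the best checkpoint (the default `./best_acc.ckpt` is the file fine_tune.py saves):
+
+```shell
+python eval.py --config_path ./config/resnet34_cpu_config.yaml --data_path ./datasets/test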
+```
+
 ## Resume Training Process

 ### Usage
@@ -1039,6 +1086,26 @@ result:{'top_1_accuracy': 0.928385416666666} prune_rate=0.45 ckpt=~/resnet50_cif
 | Fine-tuned checkpoint | 166M (.ckpt file)                                         |
 | Config file                    | [link](https://gitee.com/mindspore/models/tree/master/official/cv/resnet/config) | [link](https://gitee.com/mindspore/models/tree/master/official/cv/resnet/config)|

+#### ResNet34 on flower_photos
+
+| Parameter                  | CPU                                                                           |
+| -------------------------- |-------------------------------------------------------------------------------|
+| Model version              | ResNet34                                                                      |
+| Resource                   | CPU 3.40 GHz, 4 cores; 8 GB RAM; Windows 7                                    |
+| Upload date                | 2022-08-30                                                                    |
+| MindSpore version          | 1.8.1                                                                         |
+| Dataset                    | flower_photos                                                                 |
+| Training parameters        | epoch=10, steps per epoch=85, batch_size=32                                   |
+| Optimizer                  | Momentum                                                                      |
+| Loss function              | Softmax cross-entropy                                                         |
+| Output                     | Probability                                                                   |
+| Loss                       | 0.32727173                                                                    |
+| Speed                      | 6859 ms/step                                                                  |
+| Total time                 | 70 min                                                                        |
+| Parameters (M)             | 20.28                                                                         |
+| Fine-tuned checkpoint      | 81M (.ckpt file)                                                              |
+| Config file                | [link](https://gitee.com/mindspore/models/tree/master/official/cv/resnet/config) |
+
 #### ResNet101 on ImageNet2012

 | Parameter                 | Ascend 910                                                   |   GPU |

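The README's training results show per-step log lines of the form `epoch: E step: S, loss is L`. As a minimal sketch, assuming a saved training log follows exactly that format, the losses could be extracted for plotting or averaging; `parse_loss_log` is a hypothetical helper, not part of the repository.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as "epoch: 1 step: 5, loss is 1.3044931"
# (format assumed from the README's sample output; hypothetical helper).
LOSS_LINE = re.compile(r"epoch:\s*(\d+)\s+step:\s*(\d+),\s*loss is\s*([-+0-9.eE]+)")


def parse_loss_log(lines: Iterable[str]) -> List[Tuple[int, int, float]]:
    """Return (epoch, step, loss) tuples for every matching log line."""
    records = []
    for line in lines:
        match = LOSS_LINE.search(line)
        if match:  # lines such as "..." are skipped
            epoch, step, loss = match.groups()
            records.append((int(epoch), int(step), float(loss)))
    return records


if __name__ == "__main__":
    sample = [
        "epoch: 1 step: 1, loss is 1.5975518",
        "epoch: 1 step: 2, loss is 1.5453123",
        "...",
    ]
    parsed = parse_loss_log(sample)
    print(parsed)
    print("mean loss:", sum(r[2] for r in parsed) / len(parsed))
```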
--- a/official/cv/resnet/config/resnet34_cpu_config.yaml
+++ b/official/cv/resnet/config/resnet34_cpu_config.yaml
+# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+# Url for modelarts
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+# Path for local
+run_distribute: False
+enable_profiling: False
+data_path: "./datasets/"
+output_path: "/cache/train"
+load_path: "/cache/checkpoint_path/"
+device_target: "CPU"
+checkpoint_path: "./resnet34_ascend_v170_imagenet2012_official_cv_top1acc73.61_top5acc91.74.ckpt"
+checkpoint_file_path: "./best_acc.ckpt"
+
+# ==============================================================================
+# Training options
+optimizer: "Momentum"
+infer_label: ""
+class_num: 5
+batch_size: 32
+loss_scale: 1024
+momentum: 0.9
+weight_decay: 0.0001
+epoch_size: 10
+pretrain_epoch_size: 0
+save_checkpoint: True
+save_checkpoint_epochs: 5
+keep_checkpoint_max: 10
+warmup_epochs: 5
+learning_rate: 0.001
+lr_decay_mode: "poly"
+lr_init: 0.01
+lr_end: 0.00001
+lr_max: 0.1
+lars_epsilon: 0.0
+lars_coefficient: 0.001
+
+net_name: "resnet34"
+dataset: "flower_photos"
+device_num: 1
+pre_trained: ""
+run_eval: False
+eval_dataset_path: ""
+parameter_server: False
+filter_weight: False
+save_best_ckpt: True
+eval_start_epoch: 40
+eval_interval: 1
+enable_cache: False
+cache_session_id: ""
+mode_name: "GRAPH"
+boost_mode: "O0"
+conv_init: "XavierUniform"
+dense_init: "TruncatedNormal"
+train_image_size: 224
+eval_image_size: 224
+
+# Export options
+device_id: 0
+width: 224
+height: 224
+file_name: "resnet34"
+file_format: "MINDIR"
+ckpt_file: ""
+network_dataset: "resnet34_flower_photos"
+
+# Retrain options
+save_graphs: False
+save_graphs_path: "./graphs"
+has_trained_epoch: 0
+has_trained_step: 0
+
+# postprocess resnet inference
+result_path: ''
+label_path: ''
+
+---
+# Help description for each configuration
+enable_modelarts: "Whether training on modelarts, default: False"
+data_url: "Dataset url for obs"
+checkpoint_url: "The location of checkpoint for obs"
+data_path: "Dataset path for local"
+output_path: "Training output path for local"
+load_path: "The location of checkpoint for obs"
+device_target: "Target device type, available: [Ascend, GPU, CPU]"
+enable_profiling: "Whether enable profiling while training, default: False"
+class_num: "Number of classes in the dataset"
+batch_size: "Batch size for training and evaluation"
+epoch_size: "Total training epochs."
+checkpoint_path: "The location of the checkpoint file."
+checkpoint_file_path: "The location of the checkpoint file."
+save_graphs: "Whether save graphs during training, default: False."
+save_graphs_path: "Path to save graphs."
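The YAML above holds the defaults, and the README commands override individual keys on the command line (for example `--data_path`). The actual parsing is done by `src/model_utils/config.py`, which is not part of this diff. As a simplified, hypothetical sketch of the idea using plain `argparse` (not the repository's code), every top-level key can become a `--key` flag whose type and default come from the YAML value:

```python
import argparse
from typing import Any, Dict, List, Optional


def _str2bool(value: str) -> bool:
    # argparse's type=bool would treat any non-empty string (even "False") as True.
    return value.lower() in ("true", "1", "yes")


def build_parser(defaults: Dict[str, Any]) -> argparse.ArgumentParser:
    """Hypothetical: create one --key option per config entry, typed after its default."""
    parser = argparse.ArgumentParser(description="config override sketch")
    for key, value in defaults.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            parser.add_argument("--" + key, type=_str2bool, default=value)
        elif isinstance(value, (int, float, str)):
            parser.add_argument("--" + key, type=type(value), default=value)
    return parser


def load_config(defaults: Dict[str, Any], argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical: merge command-line overrides on top of the YAML defaults."""
    args = build_parser(defaults).parse_args(argv)
    return vars(args)


if __name__ == "__main__":
    # A few defaults copied from resnet34_cpu_config.yaml.
    yaml_defaults = {
        "data_path": "./datasets/",
        "device_target": "CPU",
        "class_num": 5,
        "batch_size": 32,
        "learning_rate": 0.001,
        "run_eval": False,
    }
    cfg = load_config(yaml_defaults, ["--data_path", "./datasets/test", "--batch_size", "16"])
    print(cfg["data_path"], cfg["batch_size"], cfg["learning_rate"])
```

Keeping the YAML as the single source of defaults means a command only needs to name the keys it changes.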
--- a/official/cv/resnet/eval.py
+++ b/official/cv/resnet/eval.py
@@ -35,6 +35,7 @@ if config.net_name in ("resnet18", "resnet34", "resnet50", "resnet152"):
        from src.dataset import create_dataset1 as create_dataset
    else:
        from src.dataset import create_dataset2 as create_dataset
+
 elif config.net_name == "resnet101":
    from src.resnet import resnet101 as resnet
    from src.dataset import create_dataset3 as create_dataset
@@ -46,7 +47,6 @@ else:
 def eval_net():
    """eval net"""
    target = config.device_target
-
    # init context
    ms.set_context(mode=ms.GRAPH_MODE, device_target=target, save_graphs=False)
    if target == "Ascend":

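`eval.py` (and `eval_net` in `fine_tune.py`) report `top_1_accuracy` and `top_5_accuracy`. A minimal NumPy sketch of what these metrics measure, assuming logits of shape `(batch, num_classes)` and integer labels; `top_k_accuracy` is a hypothetical helper, not MindSpore's metric implementation. By definition, when `class_num` is 5 (as for flower_photos), every class is in the top 5, so top-5 accuracy is always 1.0 and only top-1 is informative.

```python
import numpy as np


def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Hypothetical helper: fraction of samples whose label is among the k highest logits."""
    # Column indices of the k largest logits in each row (order within the k is irrelevant).
    top_k = np.argsort(logits, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(np.mean(hits))


if __name__ == "__main__":
    logits = np.array([
        [2.0, 0.1, 0.3, 0.2, 0.4],  # predicted class 0
        [0.1, 0.2, 3.0, 0.5, 0.4],  # predicted class 2
        [0.9, 1.0, 0.2, 0.3, 0.1],  # predicted class 1
    ])
    labels = np.array([0, 2, 0])
    print("top-1:", top_k_accuracy(logits, labels, 1))
    print("top-5:", top_k_accuracy(logits, labels, 5))  # all 5 classes are in the top 5
```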
--- a/official/cv/resnet/fine_tune.py
+++ b/official/cv/resnet/fine_tune.py
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""train resnet34."""
+
+import os
+import mindspore as ms
+import mindspore.nn as nn
+
+from mindspore.train.model import Model
+from mindspore.train.callback import TimeMonitor
+from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
+from train import LossCallBack
+from src.resnet import resnet34
+from src.dataset import create_dataset2
+from src.model_utils.config import config
+from src.eval_callback import EvalCallBack
+
+
+ms.set_context(mode=ms.GRAPH_MODE, device_target=config.device_target, save_graphs=False)
+
+
+def import_data():
+    dataset_train = create_dataset2(dataset_path=os.path.join(config.data_path, "train"), do_train=True,
+                                    batch_size=config.batch_size, train_image_size=config.train_image_size,
+                                    eval_image_size=config.eval_image_size, target=config.device_target,
+                                    distribute=False, enable_cache=False, cache_session_id=None)
+    dataset_val = create_dataset2(dataset_path=os.path.join(config.data_path, "test"), do_train=False,
+                                  batch_size=config.batch_size, train_image_size=config.train_image_size,
+                                  eval_image_size=config.eval_image_size, target=config.device_target,
+                                  distribute=False, enable_cache=False, cache_session_id=None)
+    # peek at one training batch to check the image shape and labels
+    data = next(dataset_train.create_dict_iterator())
+    images = data["image"]
+    labels = data["label"]
+    print("Tensor of image", images.shape)
+    print("Labels:", labels)
+
+    return dataset_train, dataset_val
+
+
+# define the head layer
+class DenseHead(nn.Cell):
+    def __init__(self, input_channel, num_classes):
+        super(DenseHead, self).__init__()
+        self.dense = nn.Dense(input_channel, num_classes)
+
+    def construct(self, x):
+        return self.dense(x)
+
+
+def filter_checkpoint_parameter_by_list(origin_dict, param_filter):
+    """remove useless parameters according to filter_list"""
+    for key in list(origin_dict.keys()):
+        for name in param_filter:
+            if name in key:
+                print("Delete parameter from checkpoint: ", key)
+                del origin_dict[key]
+                break
+
+
+def init_weight(net, param_dict):
+    """init_weight"""
+    if param_dict:
+        if config.filter_weight:
+            filter_list = [x.name for x in net.end_point.get_parameters()]
+            filter_checkpoint_parameter_by_list(param_dict, filter_list)
+        ms.load_param_into_net(net, param_dict)
+
+
+def eval_net(net, dataset):
+    """eval net"""
+    net.set_train(False)
+
+    loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
+
+    # define model
+    model = ms.Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
+
+    # eval model
+    res = model.eval(dataset)
+    print("result:", res)
+
+
+def apply_eval(eval_param):
+    eval_model = eval_param["model"]
+    eval_ds = eval_param["dataset"]
+    metrics_name = eval_param["metrics_name"]
+    res = eval_model.eval(eval_ds)
+    print("res:", res)
+    return res[metrics_name]
+
+
+def run_eval(model, ckpt_save_dir, eval_dataset, cb):
+    """run_eval"""
+    eval_param_dict = {"model": model, "dataset": eval_dataset, "metrics_name": "top_1_accuracy"}
+    eval_cb = EvalCallBack(apply_eval, eval_param_dict, interval=1,
+                           eval_start_epoch=0, save_best_ckpt=True,
+                           ckpt_directory=ckpt_save_dir, besk_ckpt_name="best_acc.ckpt",
+                           metrics_name="acc")
+    cb += [eval_cb]
+
+
+def finetune_train():
+    dataset_train, data_val = import_data()
+
+    ckpt_param_dict = ms.load_checkpoint(config.checkpoint_path)
+    net = resnet34(class_num=1001)
+    init_weight(net=net, param_dict=ckpt_param_dict)
+    print("net parameter:")
+    for param in net.get_parameters():
+        print("param:", param)
+
+    # input size of the original fully connected layer
+    src_head = net.end_point
+    in_channels = src_head.in_channels
+    # new head with config.class_num outputs (5 for flower_photos)
+    head = DenseHead(in_channels, config.class_num)
+    # replace the fully connected layer
+    net.end_point = head
+
+    print("net.get_parameters():", list(net.get_parameters()))
+    print("net.trainable_params():", net.trainable_params())
+    # freeze all parameters except the last layer
+    for param in net.get_parameters():
+        if param.name not in ["end_point.dense.weight", "end_point.dense.bias"]:
+            param.requires_grad = False
+        if param.name == "end_point.dense.weight":
+            param.name = "end_point.weight"
+        if param.name == "end_point.dense.bias":
+            param.name = "end_point.bias"
+
+    # define optimizer and loss function
+    opt = nn.Momentum(params=net.trainable_params(), learning_rate=config.learning_rate, momentum=config.momentum)
+    loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
+
+    # instantiating the model
+    model = Model(net, loss, opt, metrics={'top_1_accuracy', 'top_5_accuracy'})
+
+    # define callbacks
+    step_size = dataset_train.get_dataset_size()
+    time_cb = TimeMonitor(data_size=step_size)
+    loss_cb = LossCallBack(config.has_trained_epoch)
+    cb = [time_cb, loss_cb]
+
+    run_eval(model, "./", data_val, cb)
+
+    num_epochs = config.epoch_size
+    model.train(num_epochs, dataset_train, callbacks=cb)
+
+    eval_net(net, data_val)
+
+
+if __name__ == '__main__':
+    finetune_train()
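`finetune_train` replaces `net.end_point` with a new `DenseHead`, freezes every other parameter, and renames the head's `end_point.dense.*` parameters to `end_point.*`, so the saved `best_acc.ckpt` uses the same names as the `end_point` layer of a plain `resnet34(class_num=5)` (which `quick_start.py` loads). The selection and renaming rule can be checked without MindSpore; the sketch below works on parameter names only, and the backbone names in the example are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

# Name inside the new DenseHead -> name used by resnet34's own end_point layer.
HEAD_PARAMS = {
    "end_point.dense.weight": "end_point.weight",
    "end_point.dense.bias": "end_point.bias",
}


def plan_finetune(param_names: List[str]) -> Tuple[Dict[str, bool], List[str]]:
    """Mirror the freeze/rename loop of fine_tune.py on parameter names only.

    Returns a map from final parameter name to requires_grad, plus the list of
    trainable names (the parameters the Momentum optimizer would receive).
    """
    requires_grad = {}
    for name in param_names:
        trainable = name in HEAD_PARAMS           # freeze everything except the new head
        final_name = HEAD_PARAMS.get(name, name)  # rename head params for checkpoint compatibility
        requires_grad[final_name] = trainable
    trainable_names = [n for n, grad in requires_grad.items() if grad]
    return requires_grad, trainable_names


if __name__ == "__main__":
    names = [
        "conv1.weight",           # hypothetical backbone parameter names
        "layer1.0.conv1.weight",
        "end_point.dense.weight",
        "end_point.dense.bias",
    ]
    flags, trainable = plan_finetune(names)
    print(flags)
    print("trainable:", trainable)
```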
--- a/official/cv/resnet/quick_start.py
+++ b/official/cv/resnet/quick_start.py
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""quick start"""
+# This script visualizes the input data, predicts with the fine-tuned model, and visualizes the prediction results.
+import os
+import numpy as np
+import matplotlib.pyplot as plt
+from mindspore import Tensor
+from mindspore.train import Model
+from mindspore import load_checkpoint, load_param_into_net
+from src.dataset import create_dataset2
+from src.resnet import resnet34
+from src.model_utils.config import config
+
+
+# class_name maps each label to its class; labels follow the alphabetical order of the class folder names
+class_name = {0: "daisy", 1: "dandelion", 2: "roses", 3: "sunflowers", 4: "tulips"}
+
+
+# define a function that visualizes the input data:
+def visual_input_data(val_dataset):
+    data = next(val_dataset.create_dict_iterator())
+    images = data["image"]
+    labels = data["label"]
+    print("Tensor of image", images.shape)
+    print("Labels:", labels)
+    plt.figure(figsize=(15, 7))
+    for i in range(len(labels)):
+        # get the image and its corresponding label
+        data_image = images[i].asnumpy()
+        # data_label = labels[i]
+        # process images for display
+        data_image = np.transpose(data_image, (1, 2, 0))
+        mean = np.array([0.485, 0.456, 0.406])
+        std = np.array([0.229, 0.224, 0.225])
+        data_image = std * data_image + mean
+        data_image = np.clip(data_image, 0, 1)
+        # display image
+        plt.subplot(4, 8, i+1)
+        plt.imshow(data_image)
+        plt.title(class_name[int(labels[i].asnumpy())], fontsize=10)
+        plt.axis("off")
+
+    plt.show()
+
+
+# define visualize_model() to visualize the model's predictions
+def visualize_model(best_ckpt_path, val_ds):
+    net = resnet34(class_num=config.class_num)
+    # load model parameters
+    param_dict = load_checkpoint(best_ckpt_path)
+    load_param_into_net(net, param_dict)
+    model = Model(net)
+    # load the data of the validation set for validation
+    data = next(val_ds.create_dict_iterator())
+    images = data["image"].asnumpy()
+    labels = data["label"].asnumpy()
+    flower_class_name = {0: "daisy", 1: "dandelion", 2: "roses", 3: "sunflowers", 4: "tulips"}
+    # prediction image category
+    output = model.predict(Tensor(data['image']))
+    pred = np.argmax(output.asnumpy(), axis=1)
+
+    # display the image and the predicted value of the image
+    plt.figure(figsize=(15, 7))
+    for i in range(len(labels)):
+        plt.subplot(4, 8, i + 1)
+        # if the prediction is correct, it is displayed in blue; if the prediction is wrong, it is displayed in red
+        color = 'blue' if pred[i] == labels[i] else 'red'
+        plt.title('predict:{}'.format(flower_class_name[pred[i]]), color=color)
+        picture_show = np.transpose(images[i], (1, 2, 0))
+        mean = np.array([0.485, 0.456, 0.406])
+        std = np.array([0.229, 0.224, 0.225])
+        picture_show = std * picture_show + mean
+        picture_show = np.clip(picture_show, 0, 1)
+        plt.imshow(picture_show)
+        plt.axis('off')
+
+    plt.show()
+
+
+# the best checkpoint obtained by fine-tuning (best_acc.ckpt, saved by fine_tune.py)
+# is used to predict images from the validation set
+if __name__ == '__main__':
+    # load inference dataset
+    dataset_val = create_dataset2(dataset_path=os.path.join(config.data_path, "test"), do_train=False,
+                                  batch_size=config.batch_size, train_image_size=config.train_image_size,
+                                  eval_image_size=config.eval_image_size, target=config.device_target,
+                                  distribute=False, enable_cache=False, cache_session_id=None)
+
+    visual_input_data(dataset_val)
+    # predict validation images with the best checkpoint saved by fine_tune.py
+    visualize_model('best_acc.ckpt', dataset_val)
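Both plotting functions in `quick_start.py` undo the dataset normalization before `plt.imshow`: transpose CHW to HWC, compute `std * x + mean` with the ImageNet mean/std used in the script, and clip to `[0, 1]`. A small NumPy sketch of that round trip, assuming the dataset pipeline normalized 0-255 pixels per channel with mean `255 * MEAN` and std `255 * STD` (equivalent to `(x / 255 - MEAN) / STD`); both helper functions are hypothetical.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def normalize_chw(image_hwc: np.ndarray) -> np.ndarray:
    """Hypothetical forward transform: pixels in [0, 255], shape (H, W, 3) -> normalized (3, H, W)."""
    scaled = image_hwc.astype(np.float64) / 255.0
    return np.transpose((scaled - MEAN) / STD, (2, 0, 1))


def denormalize_for_display(image_chw: np.ndarray) -> np.ndarray:
    """Inverse used before plt.imshow: CHW -> HWC, undo standardization, clip to [0, 1]."""
    hwc = np.transpose(image_chw, (1, 2, 0))
    return np.clip(STD * hwc + MEAN, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pixels = rng.integers(0, 256, size=(4, 4, 3))
    restored = denormalize_for_display(normalize_chw(pixels))
    # The round trip recovers the original pixels scaled to [0, 1].
    print(np.allclose(restored, pixels / 255.0))
```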
--- a/official/cv/resnet/requirements.txt
+++ b/official/cv/resnet/requirements.txt
 numpy
 scipy
 easydict
+matplotlib
\ No newline at end of file
--- a/official/cv/resnet/src/data_split.py
+++ b/official/cv/resnet/src/data_split.py
+# Copyright 2022 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+Split flower_photos into train/test folders (3/4 train, 1/4 test per class).
+"""
+import os
+import shutil
+
+
+def generate_data():
+    """Copy 3/4 of each class folder into ./train and the remaining 1/4 into ./test."""
+    # class folders are the immediate subdirectories of the current directory
+    # (run the script inside flower_photos); skip train/test in case of a rerun
+    dirs = [d for d in next(os.walk("./"))[1] if d not in ("train", "test")]
+    print(dirs)
+
+    os.makedirs("./train", exist_ok=True)
+    os.makedirs("./test", exist_ok=True)
+
+    for di in dirs:
+        files = os.listdir(di)
+        train_set = files[: int(len(files) * 3 / 4)]
+        test_set = files[int(len(files) * 3 / 4):]
+        for file in train_set:
+            file_path = "./train/" + di + "/"
+            folder = os.path.exists(file_path)
+            if not folder:
+                os.makedirs(file_path)
+            src_file = "./" + di + "/" + file
+            print("src_file:", src_file)
+            dst_file = file_path + file
+            print("dst_file:", dst_file)
+            shutil.copyfile(src_file, dst_file)
+
+        for file in test_set:
+            file_path = "./test/" + di + "/"
+            folder = os.path.exists(file_path)
+            if not folder:
+                os.makedirs(file_path)
+            src_file = "./" + di + "/" + file
+            dst_file = file_path + file
+            shutil.copyfile(src_file, dst_file)
+
+
+if __name__ == '__main__':
+    generate_data()
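`data_split.py` puts the first `int(len(files) * 3 / 4)` files of each class folder into `train/` and the rest into `test/`. The sketch below reproduces that arithmetic and derives the steps per epoch. It assumes the commonly reported per-class image counts of flower_photos (daisy 633, dandelion 898, roses 641, sunflowers 699, tulips 799; 3670 in total) and that `create_dataset2` drops the last incomplete batch; under those assumptions the result agrees with the `steps per epoch=85` in the README table. Both helpers are hypothetical.

```python
from typing import Dict, Tuple


def split_counts(num_files: int) -> Tuple[int, int]:
    """Train/test sizes for one class folder, using the same truncation as data_split.py."""
    n_train = int(num_files * 3 / 4)
    return n_train, num_files - n_train


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of full batches, assuming the incomplete last batch is dropped."""
    return num_samples // batch_size


if __name__ == "__main__":
    # Assumed per-class image counts of flower_photos.
    class_sizes: Dict[str, int] = {
        "daisy": 633, "dandelion": 898, "roses": 641, "sunflowers": 699, "tulips": 799,
    }
    n_train = n_test = 0
    for name, size in class_sizes.items():
        tr, te = split_counts(size)
        n_train += tr
        n_test += te
        print(f"{name}: train={tr} test={te}")
    print("train images:", n_train, "test images:", n_test)
    print("steps per epoch (batch_size=32):", steps_per_epoch(n_train, 32))
```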