Unverified Commit d852fcdd authored by i-robot's avatar i-robot Committed by Gitee

!3472 [Lanzhou University][University Contribution][MindSpore][ResNet34] CPU model transfer training and inference submission

Merge pull request !3472 from 罗诺/master
parents 05f17a74 777b12a0
......@@ -208,6 +208,7 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
├── resnet18_imagenet2012_config_gpu.yaml
├── resnet34_imagenet2012_config.yaml
├── resnet50_cifar10_config.yaml
├── resnet34_cpu_config.yaml
├── resnet50_imagenet2012_Boost_config.yaml # high-performance version: over 10% faster with less than 1% accuracy loss
├── resnet50_imagenet2012_Ascend_Thor_config.yaml
├── resnet50_imagenet2012_config.yaml
......@@ -226,6 +227,7 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
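├── run_standalone_train_gpu.sh # launch standalone GPU training (single device)
└── cache_util.sh # helper functions for the single-node cache
├── src
├── data_split.py # script that splits the transfer-learning dataset (CPU)
├── dataset.py # data preprocessing
├── eval_callback.py # callback that runs evaluation during training
├── CrossEntropySmooth.py # loss definition for the ImageNet2012 dataset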
......@@ -236,6 +238,9 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
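├── device_adapter.py # device configuration
├── local_adapter.py # local device configuration
└── moxing_adapter.py # ModelArts device configuration
├── fine_tune.py # transfer-training script (CPU)
├── quick_start.py # quick start demo (CPU)
├── requirements.txt # third-party dependencies
├── eval.py # evaluate the network
└── train.py # train the network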
```
......@@ -475,6 +480,48 @@ bash run_standalone_train_gpu.sh [DATASET_PATH] [CONFIG_PATH] [RUN_EVAL](optiona
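After training finishes, you can either shut down the cache server or leave it running to serve cached data for later inference.
## Transfer training process
### Preparing the transfer dataset
[Download the dataset from the provided link](https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz). Place the split script data_split.py in the extracted flower_photos directory and run it; it generates a train folder and a test folder. Move both folders into a newly created folder named datasets.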
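The split performed by data_split.py can be sketched on plain file lists. This is a minimal illustration, assuming the 3:1 train/test ratio used by the script; the function name `split_files` and the sample file names are hypothetical and not part of the repository.

```python
def split_files(files, train_ratio=0.75):
    """Return (train, test): the first 3/4 of the list goes to train, the rest to test."""
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]


# Hypothetical file names standing in for one class folder.
daisy = ["img_{}.jpg".format(i) for i in range(8)]
train, test = split_files(daisy)
print(len(train), len(test))
```

As in the script, the files are taken in listing order without shuffling, so the split depends on the order in which the directory is listed.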
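### Obtaining the checkpoint for transfer training
[Download the checkpoint from the provided link](https://download.mindspore.cn/models/r1.5/). The checkpoint file is named "resnet34_ascend_v170_imagenet2012_official_cv_top1acc73.61_top5acc91.74.ckpt"; after downloading, place it in the same directory as fine_tune.py.
### Usage
You can start training with the Python script: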
```shell
python fine_tune.py --config_path ./config/resnet34_cpu_config.yaml
```
### Results
- Training ResNet34 on the flower_photos dataset
```text
# Transfer training results (CPU)
epoch: 1 step: 1, loss is 1.5975518
epoch: 1 step: 2, loss is 1.5453123
epoch: 1 step: 3, loss is 1.3293151
epoch: 1 step: 4, loss is 1.491757
epoch: 1 step: 5, loss is 1.3044931
...
```
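To summarize a log like the one above, the loss values can be extracted with a regular expression. This is a minimal sketch that assumes the log lines keep exactly the `epoch: N step: M, loss is X` format shown; the names `PATTERN` and `parse_losses` are hypothetical.

```python
import re

# The five log lines shown above, copied verbatim.
LOG = """\
epoch: 1 step: 1, loss is 1.5975518
epoch: 1 step: 2, loss is 1.5453123
epoch: 1 step: 3, loss is 1.3293151
epoch: 1 step: 4, loss is 1.491757
epoch: 1 step: 5, loss is 1.3044931
"""

PATTERN = re.compile(r"epoch: (\d+) step: (\d+), loss is ([0-9.]+)")


def parse_losses(text):
    """Return a list of (epoch, step, loss) tuples found in the log text."""
    return [(int(e), int(s), float(loss)) for e, s, loss in PATTERN.findall(text)]


records = parse_losses(LOG)
mean_loss = sum(r[2] for r in records) / len(records)
print("steps parsed:", len(records), "mean loss: %.4f" % mean_loss)
```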
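## Inference after transfer training
### Usage
You can start inference with the Python script (first set ckpt_path in the resnet34_cpu_config.yaml configuration file to the path of the best checkpoint file):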
```shell
python eval.py --config_path ./cpu_default_config.yaml --data_path ./dataset/flower_photos/test
```
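fine_tune.py in this change evaluates with the `top_1_accuracy` and `top_5_accuracy` metrics. The sketch below shows in plain Python how such metrics are computed in general; it is an illustration with made-up scores, not MindSpore's implementation, and `top_k_accuracy` is a hypothetical helper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for three samples over five classes.
scores = [
    [0.1, 0.6, 0.1, 0.1, 0.1],  # highest score: class 1
    [0.5, 0.2, 0.1, 0.1, 0.1],  # highest score: class 0
    [0.2, 0.2, 0.3, 0.2, 0.1],  # highest score: class 2
]
labels = [1, 3, 2]
top1 = top_k_accuracy(scores, labels, 1)
top5 = top_k_accuracy(scores, labels, 5)
print(top1, top5)
```

With only five classes, as in flower_photos, every label is always among the top five, so top-5 accuracy carries no information and top-1 accuracy is the figure to watch.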
## Resuming training
### Usage
......@@ -1039,6 +1086,26 @@ result:{'top_1_accuracy': 0.928385416666666} prune_rate=0.45 ckpt=~/resnet50_cif
| Checkpoint for fine-tuning | 166M (.ckpt file) |
| Configuration file | [Link](https://gitee.com/mindspore/models/tree/master/official/cv/resnet/config) | [Link](https://gitee.com/mindspore/models/tree/master/official/cv/resnet/config)|
#### ResNet34 on flower_photos
| Parameters | CPU |
| -------------------------- |-------------------------------------------------------------------------------|
| Model version | ResNet34 |
| Resource | CPU 3.40GHz, 4 cores; memory 8G; OS win7 |
| Upload date | 2022-08-30 |
| MindSpore version | 1.8.1 |
| Dataset | flower_photos |
| Training parameters | epoch=10, steps per epoch=85, batch_size = 32 |
| Optimizer | Momentum |
| Loss function | Softmax cross entropy |
| Output | Probability |
| Loss | 0.32727173 |
| Speed | 6859 ms/step |
| Total time | 70 minutes |
| Parameters (M) | 20.28 |
| Checkpoint for fine-tuning | 81M (.ckpt file) |
| Configuration file | [Link](https://gitee.com/mindspore/models/tree/master/official/cv/resnet/config) |
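
The figures in the table can be cross-checked with simple arithmetic. The sketch below uses only the values listed above and assumes that every step takes the listed per-step time, which the source does not state.

```python
# Values copied from the table above.
epochs = 10
steps_per_epoch = 85
ms_per_step = 6859

total_steps = epochs * steps_per_epoch
# Assumption: every step takes the listed per-step time.
estimated_minutes = total_steps * ms_per_step / 1000 / 60
print("total steps:", total_steps)
print("estimated minutes: %.1f" % estimated_minutes)
```

The estimate comes out higher than the 70 minutes listed, so the per-step speed and the total time were presumably measured under different conditions; the source does not say how either was obtained.
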
#### ResNet101 on ImageNet2012
| Parameters | Ascend 910 | GPU |
......
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "./datasets/"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "CPU"
checkpoint_path: "./resnet34_ascend_v170_imagenet2012_official_cv_top1acc73.61_top5acc91.74.ckpt"
checkpoint_file_path: "./best_acc.ckpt"
# ==============================================================================
# Training options
optimizer: "Momentum"
infer_label: ""
class_num: 5
batch_size: 32
loss_scale: 1024
momentum: 0.9
weight_decay: 0.0001
epoch_size: 10
pretrain_epoch_size: 0
save_checkpoint: True
save_checkpoint_epochs: 5
keep_checkpoint_max: 10
warmup_epochs: 5
learning_rate: 0.001
lr_decay_mode: "poly"
lr_init: 0.01
lr_end: 0.00001
lr_max: 0.1
lars_epsilon: 0.0
lars_coefficient: 0.001
net_name: "resnet34"
dataset: "flower_photos"
device_num: 1
pre_trained: ""
run_eval: False
eval_dataset_path: ""
parameter_server: False
filter_weight: False
save_best_ckpt: True
eval_start_epoch: 40
eval_interval: 1
enable_cache: False
cache_session_id: ""
mode_name: "GRAPH"
boost_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet34"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet34_flower_photos"
# Retrain options
save_graphs: False
save_graphs_path: "./graphs"
has_trained_epoch: 0
has_trained_step: 0
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
class_num: "Number of classes in the dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the pre-trained checkpoint file loaded for fine-tuning."
checkpoint_file_path: "The location of the checkpoint file used for evaluation."
save_graphs: "Whether save graphs during training, default: False."
save_graphs_path: "Path to save graphs."
......@@ -35,6 +35,7 @@ if config.net_name in ("resnet18", "resnet34", "resnet50", "resnet152"):
from src.dataset import create_dataset1 as create_dataset
else:
from src.dataset import create_dataset2 as create_dataset
elif config.net_name == "resnet101":
from src.resnet import resnet101 as resnet
from src.dataset import create_dataset3 as create_dataset
......@@ -46,7 +47,6 @@ else:
def eval_net():
"""eval net"""
target = config.device_target
# init context
ms.set_context(mode=ms.GRAPH_MODE, device_target=target, save_graphs=False)
if target == "Ascend":
......
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train resnet34."""
import os
import mindspore as ms
import mindspore.nn as nn
from mindspore.train.model import Model
from mindspore.train.callback import TimeMonitor
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from train import LossCallBack
from src.resnet import resnet34
from src.dataset import create_dataset2
from src.model_utils.config import config
from src.eval_callback import EvalCallBack
ms.set_context(mode=ms.GRAPH_MODE, device_target=config.device_target, save_graphs=False)
def import_data():
    """Build the training and validation datasets and print one sample batch."""
    dataset_train = create_dataset2(dataset_path=os.path.join(config.data_path, "train"), do_train=True,
                                    batch_size=config.batch_size, train_image_size=config.train_image_size,
                                    eval_image_size=config.eval_image_size, target=config.device_target,
                                    distribute=False, enable_cache=False, cache_session_id=None)
    dataset_val = create_dataset2(dataset_path=os.path.join(config.data_path, "test"), do_train=True,
                                  batch_size=config.batch_size, train_image_size=config.train_image_size,
                                  eval_image_size=config.eval_image_size, target=config.device_target,
                                  distribute=False, enable_cache=False, cache_session_id=None)
    # peek at one training batch to check the tensor shape and the labels
    data = next(dataset_train.create_dict_iterator())
    images = data["image"]
    labels = data["label"]
    print("Tensor of image", images.shape)
    print("Labels:", labels)
    return dataset_train, dataset_val
# define the head layer
class DenseHead(nn.Cell):
    def __init__(self, input_channel, num_classes):
        super(DenseHead, self).__init__()
        self.dense = nn.Dense(input_channel, num_classes)
    def construct(self, x):
        return self.dense(x)
def filter_checkpoint_parameter_by_list(origin_dict, param_filter):
    """remove useless parameters according to filter_list"""
    for key in list(origin_dict.keys()):
        for name in param_filter:
            if name in key:
                print("Delete parameter from checkpoint: ", key)
                del origin_dict[key]
                break
def init_weight(net, param_dict):
    """init_weight"""
    if param_dict:
        if config.filter_weight:
            filter_list = [x.name for x in net.end_point.get_parameters()]
            filter_checkpoint_parameter_by_list(param_dict, filter_list)
        ms.load_param_into_net(net, param_dict)
def eval_net(net, dataset):
    """eval net"""
    net.set_train(False)
    loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    # define model
    model = ms.Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
    # eval model
    res = model.eval(dataset)
    print("result:", res)
def apply_eval(eval_param):
    eval_model = eval_param["model"]
    eval_ds = eval_param["dataset"]
    metrics_name = eval_param["metrics_name"]
    res = eval_model.eval(eval_ds)
    print("res:", res)
    return res[metrics_name]
def run_eval(model, ckpt_save_dir, eval_dataset, cb):
    """run_eval"""
    eval_param_dict = {"model": model, "dataset": eval_dataset, "metrics_name": "top_1_accuracy"}
    eval_cb = EvalCallBack(apply_eval, eval_param_dict, interval=1,
                           eval_start_epoch=0, save_best_ckpt=True,
                           ckpt_directory=ckpt_save_dir, besk_ckpt_name="best_acc.ckpt",
                           metrics_name="acc")
    cb += [eval_cb]
def finetune_train():
    """Fine-tune ResNet34 on the transfer dataset, training only the new head layer."""
    dataset_train, data_val = import_data()
    ckpt_param_dict = ms.load_checkpoint(config.checkpoint_path)
    net = resnet34(class_num=1001)
    init_weight(net=net, param_dict=ckpt_param_dict)
    print("net parameter:")
    for param in net.get_parameters():
        print("param:", param)
    # input size of the original fully connected layer
    src_head = net.end_point
    in_channels = src_head.in_channels
    # the number of output channels is config.class_num (5 for flower_photos)
    head = DenseHead(in_channels, config.class_num)
    # reset the fully connected layer
    net.end_point = head
    print("net.get_parameters():", net.get_parameters())
    print("net.trainable_params():", net.trainable_params())
    # freeze all parameters except the last layer
    for param in net.get_parameters():
        if param.name not in ["end_point.dense.weight", "end_point.dense.bias"]:
            param.requires_grad = False
        if param.name == "end_point.dense.weight":
            param.name = "end_point.weight"
        if param.name == "end_point.dense.bias":
            param.name = "end_point.bias"
    # define optimizer and loss function
    opt = nn.Momentum(params=net.trainable_params(), learning_rate=config.learning_rate, momentum=config.momentum)
    loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    # instantiating the model
    model = Model(net, loss, opt, metrics={'top_1_accuracy', 'top_5_accuracy'})
    # define callbacks
    step_size = dataset_train.get_dataset_size()
    time_cb = TimeMonitor(data_size=step_size)
    loss_cb = LossCallBack(config.has_trained_epoch)
    cb = [time_cb, loss_cb]
    run_eval(model, "./", data_val, cb)
    num_epochs = config.epoch_size
    model.train(num_epochs, dataset_train, callbacks=cb)
    eval_net(net, data_val)
if __name__ == '__main__':
    finetune_train()
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""quick start"""
# This script visualizes the input data, runs the model on it, and visualizes the prediction results.
import os
import numpy as np
import matplotlib.pyplot as plt
from mindspore import Tensor
from mindspore.train import Model
from mindspore import load_checkpoint, load_param_into_net
from src.dataset import create_dataset2
from src.resnet import resnet34
from src.model_utils.config import config
# class_name corresponds to label,and labels are marked in the order of folders
class_name = {0: "daisy", 1: "dandelion", 2: "roses", 3: "sunflowers", 4: "tulips"}
# define the function that visualizes the input data:
def visual_input_data(val_dataset):
    data = next(val_dataset.create_dict_iterator())
    images = data["image"]
    labels = data["label"]
    print("Tensor of image", images.shape)
    print("Labels:", labels)
    plt.figure(figsize=(15, 7))
    for i in range(len(labels)):
        # get the image and its corresponding label
        data_image = images[i].asnumpy()
        # data_label = labels[i]
        # process images for display
        data_image = np.transpose(data_image, (1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        data_image = std * data_image + mean
        data_image = np.clip(data_image, 0, 1)
        # display image
        plt.subplot(4, 8, i+1)
        plt.imshow(data_image)
        plt.title(class_name[int(labels[i].asnumpy())], fontsize=10)
        plt.axis("off")
    plt.show()
# define visualize_model(), which visualizes the model predictions
def visualize_model(best_ckpt_path, val_ds):
    net = resnet34(class_num=config.class_num)
    # load model parameters
    param_dict = load_checkpoint(best_ckpt_path)
    load_param_into_net(net, param_dict)
    model = Model(net)
    # load the data of the validation set for validation
    data = next(val_ds.create_dict_iterator())
    images = data["image"].asnumpy()
    labels = data["label"].asnumpy()
    flower_class_name = {0: "daisy", 1: "dandelion", 2: "roses", 3: "sunflowers", 4: "tulips"}
    # predict the image category
    output = model.predict(Tensor(data['image']))
    pred = np.argmax(output.asnumpy(), axis=1)
    # display the image and the predicted value of the image
    plt.figure(figsize=(15, 7))
    for i in range(len(labels)):
        plt.subplot(4, 8, i + 1)
        # if the prediction is correct, it is displayed in blue; if the prediction is wrong, it is displayed in red
        color = 'blue' if pred[i] == labels[i] else 'red'
        plt.title('predict:{}'.format(flower_class_name[pred[i]]), color=color)
        picture_show = np.transpose(images[i], (1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')
    plt.show()
if __name__ == '__main__':
    # load inference dataset
    dataset_val = create_dataset2(dataset_path=os.path.join(config.data_path, "flower_photos/"), do_train=True,
                                  batch_size=config.batch_size, train_image_size=config.train_image_size,
                                  eval_image_size=config.eval_image_size, target=config.device_target,
                                  distribute=False, enable_cache=False, cache_session_id=None)
    visual_input_data(dataset_val)
    # the best ckpt file obtained by model tuning is used to predict the images of the validation set
    # (first set ckpt_path in resnet34_cpu_config.yaml to the path of the best ckpt file)
    visualize_model('best_acc.ckpt', dataset_val)
numpy
scipy
easydict
matplotlib
\ No newline at end of file
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
cpu_cut_data.
"""
import os
import shutil
def generate_data():
    """Copy 3/4 of each class folder into ./train and the remaining 1/4 into ./test."""
    dirs = []
    path = "./"
    abs_path = None
    # collect the sub-directory lists; dirs[0] holds the class folders directly under path
    for abs_path, j, _ in os.walk(path):
        print("abs_path:", abs_path)
        if j:
            dirs.append(j)
    print(dirs)
    train_folder = os.path.exists("./train")
    if not train_folder:
        os.makedirs("./train")
    test_folder = os.path.exists("./test")
    if not test_folder:
        os.makedirs("./test")
    for di in dirs[0]:
        files = os.listdir(di)
        train_set = files[: int(len(files) * 3 / 4)]
        test_set = files[int(len(files) * 3 / 4):]
        for file in train_set:
            file_path = "./train/" + di + "/"
            folder = os.path.exists(file_path)
            if not folder:
                os.makedirs(file_path)
            src_file = "./" + di + "/" + file
            print("src_file:", src_file)
            dst_file = file_path + file
            print("dst_file:", dst_file)
            shutil.copyfile(src_file, dst_file)
        for file in test_set:
            file_path = "./test/" + di + "/"
            folder = os.path.exists(file_path)
            if not folder:
                os.makedirs(file_path)
            src_file = "./" + di + "/" + file
            dst_file = file_path + file
            shutil.copyfile(src_file, dst_file)
if __name__ == '__main__':
    generate_data()