Commit e4502e6b authored by i-robot, committed by Gitee

!3370 Ascend Zhongzhi - Xidian University - MindSpore ONNX - resnet3d

Merge pull request !3370 from cfreshgirl/resnet3d_onnx
parents 76bd0d1f 924f83dc
@@ -14,6 +14,7 @@
- [Script parameters](#脚本参数)
- [Training process](#训练过程)
- [Evaluation process](#评估过程)
- [ONNX evaluation](#ONNX评估)
- [Export process](#导出过程)
- [Export](#导出)
- [Inference process](#推理过程)
@@ -54,9 +55,11 @@ The overall network architecture of resnet3d is as follows:
- [MIT](http://moments.csail.mit.edu/)
    - Moments in Time, a new million-scale video understanding dataset released by the MIT-IBM Watson AI Lab, with 1,000,000 videos in total. Used for pre-training.
- [hmdb51](https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/#Downloads)
    - A small video action recognition dataset with 51 action classes and 6,849 videos in total; each class contains at least 51 videos. Used for fine-tuning; the Stabilized HMDB51 release is used here.
    - Labels: http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar
- [UCF101](https://www.crcv.ucf.edu/data/UCF101/UCF101.rar)
    - An action recognition dataset of realistic action videos collected from YouTube, with 101 action classes and 13,320 videos in total. Used for fine-tuning.
    - Labels: https://www.crcv.ucf.edu/data/UCF101/UCF101TrainTestSplits-RecognitionTask.zip
Pre-trained model download address:
[Link](https://github.com/kenshohara/3D-ResNets-PyTorch)
@@ -68,7 +71,7 @@ python pth_to_ckpt.py --pth_path=./pretrained.pth --ckpt_path=./pretrained.ckpt
Special notes:
Create the directory structure shown below. Extract the downloaded hmdb51_sta.rar and put the extracted folders into the videos directory. Extract the dataset's labels archive and move the extracted txt files into the labels directory.
```text
.
@@ -87,18 +90,18 @@ python pth_to_ckpt.py --pth_path=./pretrained.pth --ckpt_path=./pretrained.ckpt
└──json
```
Use src/generate_video_jpgs.py to convert the avi video files into jpg image files.
```text
cd ~/src
python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_path ~/dataset/hmdb51/jpg/
```
Use src/generate_hmdb51_json.py to generate the annotation file in json format.
```text
cd ~/src
python3 generate_hmdb51_json.py --dir_path ~/dataset/hmdb51/labels/ --video_path ~/dataset/hmdb51/jpg/ --dst_dir_path ~/dataset/hmdb51/json
```
# Features
@@ -115,7 +118,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
- [MindSpore](https://www.mindspore.cn/install/en)
- For more details, see the following resources:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start
@@ -146,6 +149,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
├── run_distribute_train.sh  # Launch distributed training on Ascend (8 devices)
├── run_eval.sh  # Launch evaluation on Ascend
├── run_standalone_train.sh  # Launch standalone training on Ascend (single device)
├── run_eval_onnx.sh  # Shell script for ONNX evaluation
├── src
├── __init__.py
├── config.py  # yaml file parsing
@@ -165,6 +169,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
└── videodataset_multiclips.py  # Custom dataset loading
├── pth_to_ckpt.py  # Convert the pre-trained model from pth to ckpt format
├── eval.py  # Evaluate the network
├── eval_onnx.py  # ONNX evaluation script
├── train.py  # Train the network
├── hmdb51_config.yaml  # Parameter configuration
└── ucf101_config.yaml  # Parameter configuration
@@ -180,6 +185,7 @@ python3 generate_video_jpgs.py --video_path ~/dataset/hmdb51/videos/ --target_pa
'result_path': './results/ucf101',  # Path for training and inference results
'pretrain_path': '~/your_path/pretrained.ckpt',  # Path of the pre-trained model file
'inference_ckpt_path': "~/your_path/results/ucf101/result.ckpt",  # Path of the model file used for inference
'onnx_path': "~/your_path/results/result-3d.onnx",  # Path of the ONNX model file used for inference
'n_classes': 101,  # Number of dataset classes
'sample_size': 112,  # Image resolution
'sample_duration': 16,  # Video clip length, in frames
@@ -303,6 +309,48 @@ clip: 66.5% top-1: 69.7% top-5: 93.8%
clip: 88.8% top-1: 92.7% top-5: 99.3%
```
## ONNX evaluation
### Export the ONNX model
```bash
python export.py --ckpt_file=/path/best.ckpt --file_format=ONNX --n_classes=51 --batch_size=1 --device_target=GPU
```
- `ckpt_file` Path of the ckpt file
- `file_format` Export format; ONNX here
- `n_classes` Number of classes in the dataset: 51 for hmdb51, 101 for ucf101
- `batch_size` Batch size, fixed at 1
- `device_target` Currently only GPU and CPU are supported; a quick sanity check of the exported model is sketched below
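Before running the full evaluation, the exported file can be sanity-checked directly with onnxruntime. The sketch below is illustrative only; it assumes the exported graph takes a single clip laid out as (batch, channel, frames, height, width) = (1, 3, 16, 112, 112), which follows sample_duration=16 and sample_size=112 above. Check `session.get_inputs()[0].shape` if your export differs.

```python
# Hedged sanity check: load the exported ONNX file and run one random clip.
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("resnet-3d.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_clip = np.random.rand(1, 3, 16, 112, 112).astype(np.float32)  # assumed input layout
logits = session.run(None, {input_name: dummy_clip})[0]
print(input_name, logits.shape)  # expect (1, 51) for hmdb51 or (1, 101) for ucf101
```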
### Run the ONNX model evaluation
```bash
Usage:   bash run_eval_onnx.sh [ucf101|hmdb51] [VIDEO_PATH] [ANNOTATION_PATH] [ONNX_PATH]
Example: bash run_eval_onnx.sh ucf101 /path/ucf101/jpg /path/ucf101/json/ucf101_01.json /path/resnet-3d.onnx
```
- `[ucf101|hmdb51]` Dataset to use
- `[VIDEO_PATH]` Path of the video frames
- `[ANNOTATION_PATH]` Path of the annotation file
- `[ONNX_PATH]` Path of the ONNX model
### Results
The evaluation results are saved to a log file; run_eval_onnx.sh writes it as eval_$DATASET.log (for example, eval_ucf101.log). You can find results such as the following in that log; the sketch after the result blocks illustrates how these metrics are computed:
- Evaluating resnet3d on the hmdb51 dataset
```text
clip: 66.5% top-1: 69.7% top-5: 93.8%
```
- Evaluating resnet3d on the ucf101 dataset
```text
clip: 88.8% top-1: 92.7% top-5: 99.3%
```
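For reference, the three numbers are produced as follows: clip accuracy scores every 16-frame clip independently, while top-1 and top-5 first average the clip scores of each video and then check whether the ground-truth class is among the 1 or 5 highest-scoring classes, mirroring what eval_onnx.py does with `average_scores` and `topk_`. The sketch below is illustrative only; the scores are random and the variable names are not taken from the repository.

```python
# Illustrative computation of clip / top-1 / top-5 accuracy for a single video.
import numpy as np

def topk_hit(scores, true_class, k):
    """Return 1 if true_class is among the k largest scores, else 0."""
    return int(true_class in np.argsort(scores)[::-1][:k])

clip_scores = np.random.rand(10, 51)     # 10 clips of one video, 51 classes (hmdb51)
true_class = 3
video_scores = clip_scores.mean(axis=0)  # video-level scores = mean over clip scores
clip_acc = np.mean([topk_hit(s, true_class, 1) for s in clip_scores])
print(clip_acc, topk_hit(video_scores, true_class, 1), topk_hit(video_scores, true_class, 5))
```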
## Export process
### Export
@@ -310,7 +358,7 @@ clip: 88.8% top-1: 92.7% top-5: 99.3%
When exporting, set n_classes to 51 for the hmdb51 dataset and to 101 for the ucf101 dataset; batch_size can only be set to 1.
```shell
python export.py --ckpt_file=./saved_model/best.ckpt --file_format=MINDIR --n_classes=51 --batch_size=1
```
## Inference process
...
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Eval.
"""
import time
import random
import json
from collections import defaultdict
import numpy as np
import onnxruntime
from mindspore import dataset as de
from mindspore.common import set_seed
from src.config import config as args_opt
from src.dataset import create_eval_dataset
from src.inference import (topk_, get_video_results, load_ground_truth, load_result,
                           remove_nonexistent_ground_truth, calculate_clip_acc)
from src.videodataset_multiclips import get_target_path
random.seed(1)
np.random.seed(1)
de.config.set_seed(1)
set_seed(1)
if __name__ == '__main__':
    t1_ = time.time()
    cfg = args_opt
    print(cfg)
    target = args_opt.device_target
    # Pick the ONNX Runtime execution provider that matches the requested device.
    if target == 'GPU':
        providers = ['CUDAExecutionProvider']
    elif target == 'CPU':
        providers = ['CPUExecutionProvider']
    else:
        raise ValueError(
            f'Unsupported target device {target}, '
            f'Expected one of: "CPU", "GPU"'
        )
    session = onnxruntime.InferenceSession(args_opt.onnx_path, providers=providers)
    predict_data = create_eval_dataset(
        cfg.video_path, cfg.annotation_path, cfg)
    size = predict_data.get_dataset_size()
    total_target_path = get_target_path(cfg.annotation_path)
    with total_target_path.open('r') as f:
        total_target_data = json.load(f)
    results = {'results': defaultdict(list)}
    count = 0
    for data in predict_data.create_dict_iterator(output_numpy=True):
        t1 = time.time()
        x, label = data['data'][0], data['label'].tolist()
        video_ids, segments = zip(
            *total_target_data['targets'][str(label[0])])
        # The model was exported with batch_size=1, so feed the clips one at a time.
        x_list = np.split(x, x.shape[0], axis=0)
        outputs = []
        for x in x_list:
            inputs = {session.get_inputs()[0].name: x}
            output = session.run(None, inputs)[0]
            outputs.append(output)
        outputs = np.concatenate(outputs, axis=0)
        _, locs = topk_(outputs, K=1)
        locs = locs.reshape(1, -1)
        t2 = time.time()
        print("[{} / {}] Net time: {} ms".format(count, size, (t2 - t1) * 1000))
        # Collect the per-clip scores of each video.
        for j in range(0, outputs.shape[0]):
            results['results'][video_ids[j]].append({
                'segment': segments[j],
                'output': outputs[j]
            })
        count += 1
    class_names = total_target_data['class_names']
    inference_results = {'results': {}}
    clips_inference_results = {'results': {}}
    # Average the clip scores of each video to obtain video-level predictions.
    for video_id, video_results in results['results'].items():
        video_outputs = [
            segment_result['output'] for segment_result in video_results
        ]
        video_outputs = np.stack(video_outputs, axis=0)
        average_scores = np.mean(video_outputs, axis=0)
        clips_inference_results['results'][video_id] = get_video_results(
            average_scores, class_names, 5)
        inference_results['results'][video_id] = []
        for segment_result in video_results:
            segment = segment_result['segment']
            result = get_video_results(segment_result['output'],
                                       class_names, 5)
            inference_results['results'][video_id].append({
                'segment': segment,
                'result': result
            })
    print('load ground truth')
    ground_truth, class_labels_map = load_ground_truth(
        cfg.annotation_path, "validation")
    print('number of ground truth: {}'.format(len(ground_truth)))
    n_ground_truth_top_1 = len(ground_truth)
    n_ground_truth_top_5 = len(ground_truth)
    result_top1, result_top5 = load_result(
        clips_inference_results, class_labels_map)
    ground_truth_top1 = remove_nonexistent_ground_truth(
        ground_truth, result_top1)
    ground_truth_top5 = remove_nonexistent_ground_truth(
        ground_truth, result_top5)
    if cfg.ignore:
        n_ground_truth_top_1 = len(ground_truth_top1)
        n_ground_truth_top_5 = len(ground_truth_top5)
    correct_top1 = [1 if line[1] in result_top1[line[0]]
                    else 0 for line in ground_truth_top1]
    correct_top5 = [1 if line[1] in result_top5[line[0]]
                    else 0 for line in ground_truth_top5]
    clip_acc = calculate_clip_acc(
        inference_results, ground_truth, class_labels_map)
    print(sum(correct_top1))
    print(n_ground_truth_top_1)
    print(sum(correct_top5))
    print(n_ground_truth_top_5)
    accuracy_top1 = float(sum(correct_top1)) / float(n_ground_truth_top_1)
    accuracy_top5 = float(sum(correct_top5)) / float(n_ground_truth_top_5)
    print('==================Accuracy=================\n'
          ' clip-acc : {} \ttop-1 : {} \ttop-5: {}'.format(clip_acc, accuracy_top1, accuracy_top5))
    t2_ = time.time()
    print("Total time : {} s".format(t2_ - t1_))
@@ -31,7 +31,7 @@ parser.add_argument('--ckpt_file', type=str, required=True,
parser.add_argument('--file_name', type=str,
                    default='resnet-3d', help='Output file name.')
parser.add_argument('--file_format', type=str,
                    choices=['AIR', 'MINDIR', 'ONNX'], default='MINDIR', help='File format.')
parser.add_argument('--device_target', type=str, choices=['Ascend', 'CPU', 'GPU'], default='Ascend',
                    help='Device target')
parser.add_argument('--sample_duration', type=int, default=16)
...
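For context, the added 'ONNX' choice is typically passed straight to MindSpore's export call. The sketch below is a hedged illustration only: `build_resnet3d` is a hypothetical stand-in for however this repository constructs the network, and the real export.py may differ in its details.

```python
# Hedged sketch of an ONNX export path; build_resnet3d is a hypothetical helper.
import numpy as np
import mindspore as ms

net = build_resnet3d(n_classes=51)                      # hypothetical network builder
ms.load_checkpoint("./saved_model/best.ckpt", net=net)  # load fine-tuned weights
dummy = ms.Tensor(np.zeros((1, 3, 16, 112, 112), np.float32))
ms.export(net, dummy, file_name="resnet-3d", file_format="ONNX")  # writes resnet-3d.onnx
```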
@@ -14,6 +14,7 @@ annotation_path: ""
result_path: "" result_path: ""
pretrain_path: "" pretrain_path: ""
inference_ckpt_path: "" inference_ckpt_path: ""
onnx_path: ""
n_classes: 51
sample_size: 112
sample_duration: 16
...
numpy 1.21.6
onnxruntime-gpu 1.11.1
pyyaml 6.0
Pillow 9.2.0
\ No newline at end of file
#!/bin/bash
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval_onnx.sh [ucf101|hmdb51] [VIDEO_PATH] [ANNOTATION_PATH] [ONNX_PATH]"
echo "For example:
bash run_eval_onnx.sh 0 ucf101 \\
/path/ucf101/jpg/ \\
/path/ucf101/json/ucf101_01.json \\
/path/resnet-3d.onnx"
echo "It is better to use the ABSOLUTE path."
echo "=============================================================================================================="
set -e
if [ $# != 4 ]
then
echo "Usage: bash run_eval_onnx.sh [ucf101|hmdb51] [VIDEO_PATH] [ANNOTATION_PATH] [ONNX_PATH]"
exit 1
fi
DATASET=$1
VIDEO_PATH=$2
ANNOTATION_PATH=$3
ONNX_PATH=$4
EXEC_PATH=$(pwd)
echo "$EXEC_PATH"
cd ..
env > env.log
echo "Eval begin"
python eval_onnx.py --is_modelarts False --config_path ./${DATASET}_config.yaml --video_path $VIDEO_PATH \
--annotation_path $ANNOTATION_PATH --onnx_path $ONNX_PATH --device_target GPU > eval_$DATASET.log 2>&1 &
echo "Evaling. Check it at eval_$DATASET.log"
@@ -33,7 +33,7 @@ class PILTrans:
            ratio=(opt.train_crop_min_ratio, 1.0 / opt.train_crop_min_ratio))
        self.random_horizontal_flip = vision.RandomHorizontalFlip(prob=0.5)
        self.color = vision.RandomColorAdjust(0.4, 0.4, 0.4, 0.1)
        self.normalize = vision.Normalize(mean=mean, std=std)
        self.to_tensor = vision.ToTensor()
        self.resize = vision.Resize(opt.sample_size)
        self.center_crop = vision.CenterCrop(opt.sample_size)
@@ -75,7 +75,7 @@ class EvalPILTrans:
        self.to_pil = vision.ToPIL()
        self.resize = vision.Resize(opt.sample_size)
        self.center_crop = vision.CenterCrop(opt.sample_size)
        self.normalize = vision.Normalize(mean=mean, std=std)
        self.to_tensor = vision.ToTensor()

    def __call__(self, data, labels, batchInfo):
...
@@ -14,6 +14,7 @@ annotation_path: ""
result_path: "" result_path: ""
pretrain_path: "" pretrain_path: ""
inference_ckpt_path: "" inference_ckpt_path: ""
onnx_path: ""
n_classes: 101
sample_size: 112
sample_duration: 16
...